astarte-platform / astarte

Core Astarte Repository
https://docs.astarte-platform.org/
Apache License 2.0

Message unacked when Trigger Engine fails to connect to the target #936

Open Pavinati opened 2 months ago

Pavinati commented 2 months ago

First observed in Astarte 1.0.4; a similar behavior is still present in Astarte 1.1.1.

It was observed by installing a trigger with an HTTP action targeting a local Kubernetes DNS service without specifying the port:

    "action": {
        "http_url": "http://my-service.my-namespace.svc.cluster.local/triggers",
        "http_method": "post",
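
The URL above carries no explicit port, so an HTTP client falls back to the scheme default (80 for `http`). A minimal sketch of that fallback using Elixir's standard `URI` module; the port `4000` in the second URL is a hypothetical value chosen only for illustration, not the real port of the service in this report:

```elixir
# Without an explicit port, URI.parse/1 fills in the scheme default.
without_port = URI.parse("http://my-service.my-namespace.svc.cluster.local/triggers")
IO.inspect(without_port.port, label: "port without explicit value")

# Hypothetical port 4000, used here only to show the difference.
with_port = URI.parse("http://my-service.my-namespace.svc.cluster.local:4000/triggers")
IO.inspect(with_port.port, label: "port with explicit value")
```

If the Kubernetes service only exposes a different port, the request to port 80 would presumably never be answered, which is consistent with the `:timeout` reported by HTTPoison in the log.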

Without the port, the service could not be reached, which made Trigger Engine crash. The following log is from Astarte 1.1.1:

May  6 14:03:02 |WARN| Error while processing the request: %HTTPoison.Error{reason: :timeout, id: nil}. Payload: "{\"device_id\":\"_T90lEhyVYuUT6irpoDEFA\",\"event\":{\"interface\":\"my.Interface\",\"path\":\"/foo\",\"type\":\"incoming_data\",\"value\":\"bar \"},\"timestamp\":\"2024-05-06T14:02:54.640Z\"}", headers: ["Astarte-Realm": "test", "Content-Type": "application/json"], action: %{"http_method" => "post", "http_static_headers" => %{"Astarte-Trigger-Token" => "e87gydbfTPvXZZRaNwbJy3701ypdgigbXQ=="}, "http_url" => "http://my-backend.test.svc.cluster.local/triggers", "ignore_ssl_errors" => false} function=execute_action/3 module=Astarte.TriggerEngine.EventsConsumer
May  6 14:03:02 |ERRO| GenServer {Registry.PolicyRegistry, {"test", "@default"}} terminating
** (CaseClauseError) no case clause matching: {:error, :connection_error}
    (astarte_trigger_engine 1.1.1) lib/astarte_trigger_engine/policy/policy.ex:72: Astarte.TriggerEngine.Policy.handle_cast/2
    (stdlib 4.3.1) gen_server.erl:1123: :gen_server.try_dispatch/4
    (stdlib 4.3.1) gen_server.erl:1200: :gen_server.handle_msg/6
    (stdlib 4.3.1) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
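
The `CaseClauseError` in the trace means a `case` expression received a value that none of its clauses match; inside a GenServer callback the uncaught exception terminates the process. The following is a minimal sketch of that failure mode, not the actual code in `policy.ex`: the module `PolicySketch`, its functions, and the atoms `:ack`, `:retry`, `:discard` and `:http_error` are all hypothetical names, and the fix shown is only one possible approach, not necessarily what the project adopted.

```elixir
defmodule PolicySketch do
  # Hypothetical stand-in for a callback that only expects some results.
  def strict(result) do
    case result do
      :ok -> :ack
      {:error, :http_error} -> :retry
      # No clause for {:error, :connection_error}: raises CaseClauseError.
    end
  end

  # Same decision, but any error reason is matched, so nothing is raised.
  def tolerant(result) do
    case result do
      :ok -> :ack
      {:error, :http_error} -> :retry
      {:error, reason} -> {:discard, reason}
    end
  end
end

outcome =
  try do
    PolicySketch.strict({:error, :connection_error})
  rescue
    e in CaseClauseError -> {:crashed, e.term}
  end

IO.inspect(outcome, label: "strict")
IO.inspect(PolicySketch.tolerant({:error, :connection_error}), label: "tolerant")
```

In the strict version the exception would propagate out of `handle_cast/2` and take the policy process down, as in the log above; the tolerant version returns a value the caller can act on.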

eddbbt commented 1 month ago

The connection error for 1.1.1 is handled in https://github.com/astarte-platform/astarte/pull/947. I was unable to reproduce the behaviour for 1.0.4 (tested on Astarte 1.0.6 rc and 1.0.4); maybe it has already been fixed? According to the RabbitMQ dashboard, messages are acked even when they are not delivered because of a connection error, which follows the Astarte specs, and Trigger Engine does not crash.