gausby / tortoise

A MQTT Client written in Elixir
Apache License 2.0
312 stars 54 forks source link

errors when restarting broker #121

Open heinsaris opened 4 years ago

heinsaris commented 4 years ago

Hi,

I get some strange behaviour when I stop and restart my broker while there is an active Tortoise connection. I use Tortoise 0.9.4 on a MBP running Macos 10.15.3 (Catalina) and Elixir 1.10.0 on top of Erlang 22.2.4 (both installed via brew).

This is how the connection is set up (As a child of the app supervisor:

{Tortoise.Connection, [ client_id: "hein-1", server: { Tortoise.Transport.Tcp, host: "localhost", port: 1883 }, handler: {Tortoise.Handler.Logger, []} ]}

I run a producer and a consumer in separate processes. The producer adds messages to a queue (BlockingQueue) , and the consumer takes the messages from the queue and publishes them via publish_sync. I tried with publish and a receive..after, but I got even worse problems (some of my messages were lost).

Here is what happens:

When the broker is stopped, I immediately get a warning from the logger:

14:51:38.193 [warn] Connection has been dropped

I can keep publishing (for minutes even) , but as soon as I restart the broker, I get an error:

14:51:50.926 [error] GenStateMachine {Tortoise.Registry, {Tortoise.Connection.Inflight, "hein-1"}} terminating ** (exit) {:protocol_violation, :unknown_identifier} (stdlib 3.11.1) gen_statem.erl:1298: :gen_statem.loop_state_callback_result/11 (stdlib 3.11.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3 State: {{:connected, {Tortoise.Transport.Tcp, #Port<0.16>}}, %Tortoise.Connection.Inflight{client_id: "hein-1", order: [], pending: %{}}} Callback mode: :handle_event_function

After the timeout on publish_sync I get this:

14:52:50.914 [info] answer = {:error, :timeout}

And I try to publish the message again:

14:52:50.914 [info] publish message Message 4

But get another error:

14:52:50.922 [error] GenStateMachine {Tortoise.Registry, {Tortoise.Connection.Inflight, "hein-1"}} terminating ** (CaseClauseError) no case clause matching: {:error, :closed} (tortoise 0.9.4) lib/tortoise/connection/inflight.ex:292: Tortoise.Connection.Inflight.handle_event/4 (stdlib 3.11.1) gen_statem.erl:1161: :gen_statem.loop_state_callback/11 (stdlib 3.11.1) proc_lib.erl:249: :proc_lib.init_p_do_apply/3

Then I get this warning:

14:52:55.912 [warn] Client has been terminated with reason: {:function_clause, [{Tortoise.Handler.Logger, :connection, [:terminated, %Tortoise.Handler.Logger{}], [file: 'lib/tortoise/handler/logger.ex', line: 16]}, {Tortoise.Handler, :execute, 2, [file: 'lib/tortoise/handler.ex', line: 314]}, {Tortoise.Connection.Controller, :handle_info, 2, [file: 'lib/tortoise/connection/controller.ex', line: 215]}, {:gen_server, :try_dispatch, 4, [file: 'gen_server.erl', line: 637]}, {:gen_server, :handle_msg, 6, [file: 'gen_server.erl', line: 711]}, {:proc_lib, :init_p_do_apply, 3, [file: 'proc_lib.erl', line: 249]}]}

And this error:

14:52:55.920 [error] GenServer {Tortoise.Registry, {Tortoise.Connection.Controller, "hein-1"}} terminating ** (FunctionClauseError) no function clause matching in Tortoise.Handler.Logger.connection/2 (tortoise 0.9.4) lib/tortoise/handler/logger.ex:16: Tortoise.Handler.Logger.connection(:terminated, %Tortoise.Handler.Logger{}) (tortoise 0.9.4) lib/tortoise/handler.ex:314: Tortoise.Handler.execute/2 (tortoise 0.9.4) lib/tortoise/connection/controller.ex:215: Tortoise.Connection.Controller.handle_info/2

Then comes a ping timeout:

14:52:55.922 [error] GenServer {Tortoise.Registry, {Tortoise.Connection, "hein-1"}} terminating ** (stop) :ping_timeout Last message: :ping State: %Tortoise.Connection{backoff: %Tortoise.Connection.Backoff{max_interval: 30000, min_interval: 100, value: nil}, client_id: "hein-1", connect: %Tortoise.Package.Connect{META: %Tortoise.Package.Meta{flags: 0, opcode: 1}, clean_session: false, client

After 4 errors, the Tortoise client is terminated (by or via the supervisor), and a new one is created, after which everything is fine again.

I have lost no messages, but the messages that was being published at the time the broker is restarted gets published more than once.

The question is whether this is a bug in Tortoise, or am I doing something wrong? Or maybe this is what is supposed to happen and is a good example of the let it fail philosophy?

gausby commented 4 years ago

Looks like the server send a package with an identifier unknown to the tortoise instance…what broker are you using?

heinsaris commented 4 years ago

Hello Martin,

Thank you for your fast response.

I am running mosquitto on my Mac. It is the latest version, installed via brew.

Kind regards,

Hein On 31 January 2020 at 18:04:52, Martin Gausby (notifications@github.com) wrote:

Looks like the server send a package with an identifier unknown to the tortoise instance…what broker are you using?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub, or unsubscribe.