AtherEnergy / rumqtt

Pure Rust MQTT client
The Unlicense

Ping logic disconnections with high volume of received messages and no sent messages #58

Closed. ThorburnT closed this issue 6 years ago.

ThorburnT commented 7 years ago

`ERROR:rumqtt::connection: PING error PingTimeout` is generated when receiving high volumes of MQTT messages (>> 10/s), at an interval approximately equal to the timeout. `ERROR:rumqtt::connection: At line 214 = Error in receiving packet Error { repr: Custom(Custom { kind: Other, error: StringError("unexpected EOF") }) }` is generated at a more unpredictable interval.

Steps to reproduce:

  1. Use rumqtt = "0.10.1"
  2. Receive a very high volume of messages without sending any
  3. Observe ping timeout and unexpected EOF errors periodically

My best guess, going from the source code comments, is that ping packets are not sent or handled in a timely manner. This causes rumqtt to disconnect when it believes the broker has gone away, and now and then the broker kills the TCP connection because it has not received a ping or message from us for too long, which produces an EOF error on the client side when we try to read the stream.
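For context, the broker-side half of this is the MQTT keep-alive rule (MQTT 3.1.1, section 3.1.2.10): if the broker sees no control packet from the client within 1.5x the keep-alive interval, it closes the network connection, which the client then observes as an unexpected EOF on its next read. A small illustrative snippet of that rule (not rumqtt or broker code):

```rust
use std::time::{Duration, Instant};

// Illustration of the broker-side keep-alive rule behind the EOF: no control
// packet from the client for more than 1.5x the keep-alive interval means the
// broker drops the connection. The type below is purely illustrative.
struct KeepAliveState {
    keep_alive: Duration,
    last_packet_from_client: Instant,
}

impl KeepAliveState {
    fn broker_would_disconnect(&self, now: Instant) -> bool {
        now.duration_since(self.last_packet_from_client) > self.keep_alive * 3 / 2
    }
}

fn main() {
    let state = KeepAliveState {
        keep_alive: Duration::from_secs(10),
        last_packet_from_client: Instant::now() - Duration::from_secs(20),
    };
    // A client that sent nothing for 20s on a 10s keep-alive is past the 1.5x limit.
    assert!(state.broker_would_disconnect(Instant::now()));
}
```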

Possible solutions:

Separate pinging into its own thread: forward received pings over a dedicated mpsc channel to that thread, so the ping logic does not have to compete with any other work that might eat up time and cause a timeout.
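Here is a minimal sketch of this proposal, assuming a reader thread that forwards PINGRESP packets over a channel and a writer thread that sends a PINGREQ when asked; the types, channel wiring, and function names are illustrative, not rumqtt internals (and the next comment explains why the write side still constrains this):

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// Event the reader thread would forward whenever it sees a PINGRESP.
enum PingEvent {
    PingRespReceived,
}

// Dedicated ping thread: owns the keep-alive timer, asks the writer (via
// pingreq_tx) to send a PINGREQ whenever the keep-alive interval elapses, and
// treats a missing PINGRESP as a ping timeout.
fn spawn_ping_thread(
    ping_rx: mpsc::Receiver<PingEvent>,
    pingreq_tx: mpsc::Sender<()>,
    keep_alive: Duration,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        let mut last_pingreq = Instant::now();
        let mut awaiting_pingresp = false;
        loop {
            // Wake up when a PINGRESP arrives or the keep-alive interval elapses.
            match ping_rx.recv_timeout(keep_alive) {
                Ok(PingEvent::PingRespReceived) => awaiting_pingresp = false,
                Err(mpsc::RecvTimeoutError::Timeout) => {}
                Err(mpsc::RecvTimeoutError::Disconnected) => break,
            }
            if last_pingreq.elapsed() >= keep_alive {
                if awaiting_pingresp {
                    // No PINGRESP since the last PINGREQ: treat as a ping timeout.
                    break;
                }
                if pingreq_tx.send(()).is_err() {
                    break;
                }
                awaiting_pingresp = true;
                last_pingreq = Instant::now();
            }
        }
    })
}
```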

tekjar commented 7 years ago

@ThorburnT Thanks for the report. You are right. This might be due to the broker disconnecting the client when it didn't receive a PINGREQ in time.

In the current design, writes happen during read timeouts (which is not great) because of TLS restrictions: when TLS is enabled, reads and writes have to happen on the same thread.

Maintaining a separate thread and sending ping messages on a channel wouldn't work, because the ping writes might still not happen while high-frequency incoming messages are arriving (as discussed above, writes only happen during recv timeouts).
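A rough sketch of the single-threaded pattern described here, assuming a blocking read with a timeout where the PINGREQ is only written in the timeout branch; this is illustrative code, not rumqtt's actual event loop:

```rust
use std::io::{self, Read, Write};
use std::net::TcpStream;
use std::time::{Duration, Instant};

// Rough sketch of the design described above: the loop blocks on a read with a
// timeout, and a PINGREQ is only written when that read times out. With a steady
// flood of incoming publishes the timeout branch is never reached, so the
// keep-alive deadline gets missed.
fn event_loop(mut stream: TcpStream, keep_alive: Duration) -> io::Result<()> {
    stream.set_read_timeout(Some(keep_alive))?;
    let mut last_outgoing = Instant::now();
    let mut buf = [0u8; 4096];
    loop {
        match stream.read(&mut buf) {
            Ok(0) => {
                // Broker closed the socket; surfaces as the "unexpected EOF" error.
                return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "unexpected EOF"));
            }
            Ok(_n) => {
                // Decode and handle incoming packets here. Under high message volume
                // this branch is taken continuously and the timeout arm below starves.
            }
            Err(ref e)
                if e.kind() == io::ErrorKind::WouldBlock
                    || e.kind() == io::ErrorKind::TimedOut =>
            {
                // Only on a read timeout do we get a chance to write the PINGREQ.
                if last_outgoing.elapsed() >= keep_alive {
                    stream.write_all(&[0xC0, 0x00])?; // PINGREQ fixed header
                    last_outgoing = Instant::now();
                }
            }
            Err(e) => return Err(e),
        }
    }
}
```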

Tokio can solve these issues, but I'm not sure whether I should wait for async/await or Tokio v2.

tekjar commented 7 years ago

@ThorburnT Is this happening with qos1 incoming publishes?

ThorburnT commented 7 years ago

@tekjar This was on QoS 0 incoming publishes. What would be the best way for me to locally fix the pinging if I don't need TLS? The Tokio v2 vs async problem is a tough one, but either one will probably work well. Also, thank you for this crate.

tekjar commented 7 years ago

@ThorburnT A quick way would be to perform the ping logic after every recv, or to use QoS 1. The client will send a PUBACK for every incoming QoS 1 publish, so the broker will know that the client is alive.
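A minimal sketch of the first workaround, reusing the read-loop pattern from the earlier sketch; maybe_ping and its wiring are hypothetical, not part of rumqtt's API:

```rust
use std::io::{self, Write};
use std::net::TcpStream;
use std::time::{Duration, Instant};

// Hypothetical helper for the local workaround: run the keep-alive check after
// every received packet as well, not only when the read times out, so a flood of
// incoming publishes can no longer starve the PINGREQ.
fn maybe_ping(
    stream: &mut TcpStream,
    last_outgoing: &mut Instant,
    keep_alive: Duration,
) -> io::Result<()> {
    if last_outgoing.elapsed() >= keep_alive {
        stream.write_all(&[0xC0, 0x00])?; // PINGREQ fixed header
        *last_outgoing = Instant::now();
    }
    Ok(())
}
```

Calling this in the packet-received branch of the read loop (in addition to the existing timeout branch) keeps the keep-alive deadline honoured. Alternatively, subscribing at QoS 1 means the client sends a PUBACK for every publish, which also counts as client activity for the broker's keep-alive check.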

tekjar commented 6 years ago

This should be fixed in the tokio2 branch.