dotnet / MQTTnet

MQTTnet is a high-performance .NET library for MQTT-based communication. It provides an MQTT client and an MQTT server (broker). The implementation is based on the documentation from http://mqtt.org/.
MIT License

MQTTnet Server losing messages #507

Closed dealproc closed 5 years ago

dealproc commented 5 years ago

As part of https://github.com/chkr1011/MQTTnet/issues/506, I am trying to diagnose why I am losing messages within the server component after a period of inactivity. I've monitored logging from the Orleans infrastructure and have ruled out Orleans as the cause. Can someone please help me understand why the queue within the ClientSessionManager is a blocking collection rather than a ConcurrentQueue? Can this blocking collection somehow drop items? Are there facilities to report errors from within the server framework?

chkr1011 commented 5 years ago

It is a blocking collection because there is a dedicated thread dispatching the messages. It must be a single thread to ensure message order. A plain queue is not suitable because it will not block when there are no messages. A delay could be introduced between polls, but that delay would not be interrupted when a message arrives. The blocking collection signals the worker thread immediately.
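The pattern described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not MQTTnet's code: `queue.Queue` stands in for .NET's `BlockingCollection`, and the names `dispatch_in_order` and `_STOP` are hypothetical. It shows one worker thread blocking on an empty queue and waking as soon as a producer enqueues a message, with FIFO order preserved because there is only one consumer.

```python
import queue
import threading

_STOP = object()  # hypothetical sentinel that tells the worker to exit


def dispatch_in_order(messages):
    """Push messages through a blocking queue drained by one worker thread.

    Returns the messages in the order the worker handled them.
    """
    pending = queue.Queue()  # get() blocks while the queue is empty
    handled = []

    def worker():
        while True:
            item = pending.get()  # sleeps here until a producer calls put()
            if item is _STOP:
                break
            handled.append(item)  # single consumer, so FIFO order is kept

    thread = threading.Thread(target=worker)
    thread.start()
    for message in messages:
        pending.put(message)  # wakes the blocked worker without polling
    pending.put(_STOP)
    thread.join()
    return handled
```

A non-blocking queue would force the worker to poll in a loop, either spinning or sleeping for a fixed interval that a newly arrived message cannot cut short. That is the trade-off the comment above describes.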

I have never seen messages dropped from such a collection. Are you able to reproduce this issue in a (new) unit test?

dealproc commented 5 years ago

Working on it. As I learn more about the MQTT protocol, I'm refactoring toward the proper way of using it. The "lost messages" may be a case where the device itself was disconnected at the MQTT protocol level while the underlying TCP/WebSocket connection was still alive and well.

I had been depending on the TCP/WebSocket disconnect event to tell me when the device was online/offline, but apparently the Last Will and Testament (LWT) is the best way to detect this.

With that said, I need to refactor the device to use the LWT message to update the UI presence indicator for online/offline, instead of relying on the events of the managed MQTT client.

dealproc commented 5 years ago

I think I may have discovered my root issue. I've noticed that when the Android device is plugged in, I'm not losing messages. When the device is running on battery, everything server-side works as expected, but the devices themselves are not receiving the messages from the server.

I am testing with both a partial wake lock and a Wi-Fi lock in the application, and preliminary results look promising. I will know more in a few hours, but I think we need to add an article on Android development to let folks know that, when configuring the managed and/or plain MQTT client, they need to acquire both a partial wake lock and a Wi-Fi lock so that their application can continue to communicate with the server.

dealproc commented 5 years ago

I'm curious about one more thing. I am using WebSockets with MQTT for both client and server, and have multiple back-end server instances running. Could a root cause be that we are not doing some sort of session affinity with a proper cookie? While reading the code for the WebSocket client configuration, I noticed that no CookieContainer is being set. I know that, at least with SignalR, the expected use case is sticky sessions using a cookie, and I would presume we want the same for MQTT over WebSockets.

There are two things I do not know: 1) When should the client's cookie container be flushed? 2) What happens when the back-end server is no longer available?

dealproc commented 5 years ago

I think that if we just initialize this parameter, I can get past my issue. I'm looking through the v2.8.5 codebase to see whether there is a way to initialize this property from my side:

https://github.com/chkr1011/MQTTnet/blob/ba62ca73064093fa4868a10c19c256bb344920c9/Source/MQTTnet/Client/MqttClientWebSocketOptions.cs#L14

SeppPenner commented 5 years ago

Is this still relevant @dealproc?

SeppPenner commented 5 years ago

I'm closing this due to inactivity. Feel free to re-open this if needed.