eclipse / paho.mqtt.c

An Eclipse Paho C client library for MQTT for Windows, Linux and MacOS. API documentation: https://eclipse.github.io/paho.mqtt.c/
https://eclipse.org/paho
Other
1.95k stars 1.09k forks source link

MQTTAsync automatic reconnect can succeed after `MQTTAsync_disconnect()` is called #1399

Open chillipino opened 1 year ago

chillipino commented 1 year ago

when automaticReconnect is enabled and MQTTAsync_disconnect() is called after an unexpected disconnect, but before a reconnect succeeds, reconnect attempts will still happen in the background. this can be verified by connecting, sleeping for a few seconds to give you time to manually stop the broker, disconnecting after the sleep, and manually starting the broker again, then waiting until the reconnect succeeds. i suspect this isn't an issue for many usage patterns because their client lifetime is similar to their connection lifetime. i ran into this because i'm keeping a long-lived client around and connecting/disconnecting as necessary; i don't need to structure it this way, but it would be nice if this pattern worked.

i've locally stopped the retries in MQTTAsync_disconnect1() by setting m->retrying to 0 in the if (!internal) block. even with this patch, a connect attempt that starts just before disconnecting can still succeed after the disconnect call - adding synchronization wouldn't help this case. what's the best way to respond to this? i'm somewhat new to MQTT + paho, so i'm not sure if there would be undesirable changes to the server's persisted session state from the original unexpected disconnect if we were to respond with a "normal" DISCONNECT when the client doesn't think it's connected.

this is on windows using current head of master at the time of writing ( https://github.com/eclipse/paho.mqtt.c/tree/7db21329301b1f527c925dff789442db3ca3c1e7 ). i've added a few of my own patches and i'm using a custom build system, but i don't think any of that is causing the issue given the logic i've traced through. i'm still implementing my first non-toy paho integration and saw this behavior while testing how automatic reconnect works. in case it matters, i'm using MQTT 5.0 only with the paho.mqtt.testing broker for now with the intent to switch to mosquitto at some point.

icraggs commented 11 months ago

Calling disconnect set shouldBeConnected to 0, so a new reconnect attempt won't happen, but if a reconnect attempt is already in process, then it could succeed, is my analysis.

Now I've looked a bit further, it looks like MQTTAsync_checkTimeouts doesn't check shouldBeConnected, and so the next reconnect attempt will be started. That could address your situation.

chillipino commented 11 months ago

thanks for looking at this - yes, in my patch i chose to set m->retrying, which MQTTAsync_checkTimeouts() was already checking, rather than add a check for m->shouldBeConnected.

i've confirmed that the change you suggested does stop retries after the one that's in flight (this is from my fork which has other changes; line numbers may have shifted):

diff --git a/src/MQTTAsyncUtils.c b/src/MQTTAsyncUtils.c
index ca84168..3afabda 100644
--- a/src/MQTTAsyncUtils.c
+++ b/src/MQTTAsyncUtils.c
@@ -1711,7 +1711,7 @@ static void MQTTAsync_checkTimeouts(void)
         * In any case, that section was disabled when automatic reconnect was implemented.
         */

-       if (m->automaticReconnect && m->retrying)
+       if (m->automaticReconnect && m->shouldBeConnected && m->retrying)
        {
            if (m->reconnectNow || MQTTTime_elapsed(m->lastConnectionFailedTime) > (ELAPSED_TIME_TYPE)(m->currentInterval * 1000))
            {

in the meantime i've traced the reconnect logic a bit more; if MQTTAsync_connect() was called with cleanstart enabled, the automatic reconnect will re-use this value and stomp the session. would it be possible to add cleanstart to MQTTAsync_connectData so it can be switched off in the MQTTAsync_updateConnectOptions() callback? i have not looked at how cleansession works with automatic reconnects in detail (i'm only using MQTT 5.0), but i'd guess that would need to be involved somehow as well. this change would be useful beyond the "unexpected connect" edge case. this is something i noticed and not something i'm planning to write myself, so it could go on a backlog or a "projects for contributors" list.