eclipse-paho / paho.mqtt.python

paho.mqtt.python
Other
2.21k stars 722 forks source link

on_disconnect Callback Triggered Twice When Connection is Lost #870

Open sfczsolti opened 3 days ago

sfczsolti commented 3 days ago

When testing the example code client_session_present.py (from Paho MQTT Python GitHub Repository), I encountered an issue where the on_disconnect callback is triggered twice when the network connection is lost. This behavior seems unexpected, as it could cause redundancy and potentially unwanted behavior in production applications.

I modified the example minimally to include on_disconnect and on_log callbacks to monitor connection status.

import paho.mqtt.client as mqtt

def on_connect(mqttc, obj, flags, reason_code, properties):
    print("Session present: " + str(flags.session_present))
    print("Connection result: " + str(reason_code))

def on_disconnect(mqttc, obj, flags, reason_code, properties):
    print('Disconnected')

def on_log(mqttc, obj, level, string):
    print(string)

mqttc = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id="asdfj", clean_session=False)
mqttc.on_connect = on_connect
mqttc.on_disconnect = on_disconnect
# Uncomment to enable debug messages
# mqttc.on_log = on_log
mqttc.user_data_set(0)
mqttc.connect("mqtt.eclipseprojects.io", 1883, 10)

mqttc.loop_forever()

When I simulate a network disconnection, the following output is generated:

Session present: True
Connection result: Success
Disconnected
Disconnected
Session present: True
Connection result: Success

As shown above, Disconnected is printed twice, indicating that the on_disconnect callback is called twice.

Environment:

Paho MQTT Python Client Version: 2.1.0 Python Version: 3.10

Request:

Could you please investigate why on_disconnect is being triggered twice? Any insights or fixes would be greatly appreciated.

JamesParrott commented 3 days ago

These are the two routes on_disconnect is being called from:

Disconnected, called by: [('on_disconnect', 15), ('_do_on_disconnect', 4383), ('_check_keepalive', 3294), ('loop_misc', 2149), ('_loop', 1707), ('loop_forever', 2297), ('<module>', 30)]
Disconnected, called by: [('on_disconnect', 15), ('_do_on_disconnect', 4383), ('loop_misc', 2163), ('_loop', 1707), ('loop_forever', 2297), ('<module>', 30)]

The first call is on this line. the second one is 14 lines below: https://github.com/eclipse-paho/paho.mqtt.python/blob/d45de3737879cfe7a6acc361631fa5cb1ef584bb/src/paho/mqtt/client.py#L2149

I'm not sure what the best fix is. _check_keepalive could return something more useful than None, the value of which loop_misc could use to either return early, or test and avoid the second call to _do_on_disconnect. But I don't know what else depends on the current flow or if other users' code depends on the double call, e.g. if there are different reason codes in other situations than a time out from switching my wifi off.

Adjusted test script:

import paho.mqtt.client as mqtt

import inspect

def on_connect(mqttc, obj, flags, reason_code, properties):
    print("Session present: " + str(flags.session_present))
    print("Connection result: " + str(reason_code))

def on_disconnect(mqttc, obj, flags, reason_code, properties):
    caller_names_and_lines = [(finfo.frame.f_code.co_name, finfo.frame.f_lineno) 
                               for finfo in inspect.stack()] 
    print(f'Disconnected, called by: {caller_names_and_lines}')

def on_log(mqttc, obj, level, string):
    print(string)

mqttc = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id="asdfj", clean_session=False)
mqttc.on_connect = on_connect
mqttc.on_disconnect = on_disconnect
# Uncomment to enable debug messages
# mqttc.on_log = on_log
mqttc.user_data_set(0)
mqttc.connect("mqtt.eclipseprojects.io", 1883, 10)

mqttc.loop_forever()
sfczsolti commented 3 days ago

Thank you for your response!

In my development environment (currently on Windows), I indeed tested by manually disconnecting WiFi. Without other options, this seemed like the simplest approach—though I suppose I could have unplugged the LAN cable from the router as well. I’ll make sure to run the suggested code snippet next time I test to better understand what may be causing the double on_disconnect calls.

In the final production environment, this code will run on Raspberry Pi devices with Debian OS, where connection issues are likely to happen frequently due to weak cellular signals.

If I understand correctly, in the current implementation, the on_disconnect callback is triggered twice when a disconnection occurs. (I hope I’m interpreting this correctly.)

My questions are:

1. Could the duplicate on_disconnect calls cause issues? Is it possible that this could lead to unexpected behavior ?Or I can use it like this, it won't cause an error. perhaps a not very elegant solution:

def on_connect(self, client, userdata, flags, reason_code, properties=None):
        print("Connected to MQTT broker")
        self.disconnect_called = False  # Reset flag on successful connect

    def on_disconnect(self, client, userdata, reason_code, properties=None):
        if not self.disconnect_called:
            self.disconnect_called = True  # Prevent future on_disconnect calls
            print("Disconnected from MQTT broker")

2. Is there a more official or reliable way to detect disconnections? Since disconnections will likely happen often in the production environment, it’s important for me to have a clear way to detect this state and halt unnecessary publish calls.

Any further guidance would be greatly appreciated!

JamesParrott commented 2 days ago

I don't know for certain they won't cause issues, and I can't speak officially on behalf of anyone.

But if this is a concern for you or anyone else, the problem can be solved far more easily and quickly than by waiting for a fix in paho.mqtt, with just a couple of extra lines of Python:

make the client code in on_disconnect idempotent.