eclipse / paho.mqtt.python

paho.mqtt.python
Other
2.19k stars 722 forks source link

Fix loop_stop() not clearing ._thread when called from same thread #809

Closed jiachin1995 closed 6 months ago

jiachin1995 commented 8 months ago

When client.loop_stop() is called from within on_message() callback, client ._thread will not be set to None. This will cause subsequent loop_start() to return MQTT_ERR_INVAL error.

jiachin1995 commented 8 months ago

Context: Our use case was for PahoMQTT to simulate an IoT device. The device can be remotely rebooted/shutdown. If we try to call loop_stop() from within the PahoMQTT thread. It stops the thread but subsequent loop_start() stops working as client._thread is not None.

For shutdown, this pull request is necessary to address the issue directly without creating a workaround.

For restart, we have to call reboot from a different thread. For example:

# mydevice.reboot
def reboot(self):
    self.paho_client.loop_stop()
    time.sleep(5)
    self.paho_client.loop_start()

# on_message callback
def on_message(client):
    . . .
    # restart cmd received 
    from threading import Thread

    Thread(target=mydevice.reboot).start()
PierreF commented 8 months ago

Hello, thanks for your contribution. For a contribution to be merged, you will need to sign the Eclipse ECA for your contribution to be merged. (https://accounts.eclipse.org/user/eca)

That being said, if I understand correctly your use-case, you do not want to (just) call loop_stop/loop_start. This will mostly do nothing, it do not disconnect from MQTT (so it don't simulate a device reboot). loop_stop only stop the network processing thread, and if you call loop_start quickly, you will only produce a short delay in network processing.

I'm not sure if could be valid to call loop_stop & loop_start within a callback. Because to call loop_start I believe it's required to have the thread stopped (or else it will be too complex regarding possible race-condition), and the thread won't stop until the callback returned.

Actually if you want to simulate de device reboot, you should also discard the Client state (like the possibly queued message), for that the best solution would be to re-create the Client(). For this the best solution IMO is doing something like:


def on_message(...):
   if should_simulate_device_reboot():
      # calling disconnect() will *cleanly* disconnect - so it don't simulate a device lost, but device clean reboot
      # but calling disconnect() will stop the loop_forever()
      client.disconnect() 

while simulator_should_run:
   # we re-create the mqtt.Client() each time loop_forever exit
   mqttc = mqtt.Client([...])
   mqttc.connect[...]
   mqttc.loop_forever()

Does this solve your problem ?

jiachin1995 commented 8 months ago

Hi Pierre, I have signed the ECA.

For restart, I have already assessed and agree that there is no meaningful way to implement it in PahoMQTT's on_message() callback. I simply wanted to provide some context for the error we faced. We are fine with creating a separate thread.

For shutdown, I believe it should be valid for loop_stop() to be called from on_message() callback. But there is currently an error without this pull request.

PierreF commented 6 months ago

Thank for your patience. I do have fear that your solution will introduce new bugs, mostly because the thread will still be running with self._thread == None. This might break some assumption in current code.

If I get correctly the aim of this fix (being able to call loop_stop() from on_message() and later calling loop_start()), shouldn't we prefer to set self._thread = None in a try/finally of _thread_main() ? This should fix the issue AND ensure the thread isn't running with self._thread == None ?

This should also make the "later" to call loop_start() possible to determine. It's when loop_start() don't fail with MQTT_ERR_INVAL. With your proposition I don't see a way to be sure that isn't two threads running.

jiachin1995 commented 6 months ago

Hi Pierre. I have updated the code as suggested. However, I am unsure where to insert a try/finally block.

Do let me know if any other changes are required