Fix: MQTTAbstractionImpl: flush/sync MQTT Connect message immediately

EVerest / everest-framework

Apache License 2.0

21 stars 19 forks source link

Fix: MQTTAbstractionImpl: flush/sync MQTT Connect message immediately #208

Closed barsnick closed 4 weeks ago

barsnick commented 1 month ago

Previously, a socket was opened, but the MQTT Connect command message was not flushed to the broker until init() was finished. This could be critical if a module's init() phase waits for some reason, as the MQTT broker's default socket timeout might disconnect the client.

This behavior was observed with the mosquitto broker, where unregistered sockets were consistently disconnected after 94 seconds.

Insert an additional mqtt_sync() directly after mqtt_connect(), while the mqtt_sync() thread is not yet started.

Co-authored-by: Fabian Hartung fabian.hartung@chargebyte.com

mhei commented 1 month ago

Is calling the on_mqtt_disconnect() callback reasonable in the error path at this point? I'm wondering about the API contract regarding the callbacks call sequence, i.e. I would expect that on_mqtt_disconnect() is only called once on_mqtt_connect() was called before. Disclaimer: I did not dig deep into the whole code base so this comment might be stupid.

FaHaGit commented 1 month ago

We also stumbled across the point of whether we should call on_mqtt_disconnect(). Now the same error handling is implemented as in the spawn_main_loop_thread() method, so it should behave the same as before. But I admit a disconnect without a connect first is a bit confusing.

FaHaGit commented 1 month ago

I have now removed the disconnect() call, as no disconnection should occur unless an on_connect() call has been made. We also considered calling spawn_main_loop_thread() before the callback of the init() method, but that would be a bigger change and may influence the current behaviour. We see a higher risk here that MQTT messages could be lost if a module publishes messages within the init() method.

mhei commented 1 month ago

LGTM now

a-w50 commented 1 month ago

I have now removed the disconnect() call, as no disconnection should occur unless an on_connect() call has been made. We also considered calling spawn_main_loop_thread() before the callback of the init() method, but that would be a bigger change and may influence the current behaviour. We see a higher risk here that MQTT messages could be lost if a module publishes messages within the init() method.

Don't be afraid of bigger changes, otherwise they grow even bigger :)

FaHaGit commented 1 month ago

Thank you for the feedback @a-w50! I have adjusted the code accordingly. If it is ok we would stay with the initial solution for now. We can also prepare a separate pull request to try out the other (more cleaner) solution.

a-w50 commented 1 month ago

Thank you for the feedback @a-w50! I have adjusted the code accordingly. If it is ok we would stay with the initial solution for now. We can also prepare a separate pull request to try out the other (more cleaner) solution.

I think it is ok for now. Feel free to create an issue for the follow up work. Both connectBroker function overloads should be refactored anyway, as they contain quite some code duplication.

FaHaGit commented 1 month ago

I have created the issues 213 regarding the open points. Should I squash & merge the pull request now, or will you take over when the time is right regarding the release process? Thank you