Drolla / WavePlus_Bridge

Airthings Wave Plus Bridge to Wifi/LAN and MQTT
MIT License
21 stars 9 forks source link

Messages published to the LWT topic should be retained #15

Closed rkoshak closed 1 year ago

rkoshak commented 1 year ago

I'm very happy to see the LWT topic being utilized. However, the messages are not retained. Consequently, if the subscribers and other MQTT clients who want to know if the WavePlus_Bridge is online or not needs to actively be online and subscribed to the status topic. If not, when they connect they won't know the current status.

https://github.com/rkoshak/sensorReporter/blob/main/mqtt/mqtt_conn.py#L106 shows how to configure the LWT message to be retained.

The fact that it publishes Online repeatedly helps as the client will know when it's Online, but it won't be certain that it's Offline if that's the case.

In truth, because you publish the update_time and everything is a sensor reading, it wouldn't be awful if all the messages published are retained. As a rule to thumb, I always set sensor readings and statuses to retained and commands as not retained. That way the clients don't have to wait minutes before they can start working. And if they care about how old the data is, they can check the timestamp.

This isn't a huge deal but it would make it work a little bit better on the client's side.

Drolla commented 1 year ago

Thanks for educating me regarding the 'retain' attribute of the MQTT message, @rkoshak!

The fix for the request to retain the LWT has been implemented in branch "15-lwt-topic-should-be-retained". This branch implements also your recommendation to retain all the sensor data and timestamps.

However, the topic 'status' is not retained if it is 'online'. It is only retained if it is switched to 'offline' when the application ends, or if it is set to 'connection lost' set by the LWT. Does this makes sense to you, @rkoshak . Or would you also always retain the 'status' topic?

rkoshak commented 1 year ago

I could see it either way given the current implementation.

In my systems typically the ONLINE message is only published once, after successfully connecting to the broker and it's published as retained (I publish it in the on_connect callback function registered with the client Object). The OFFLINE message is published by the broker as the LWT, again retained. I personally like that approach because it minimizes the message traffic and any client will know immediately if the bridge is ONLINE or OFFLINE.

Without a retained ONLINE message, the client will have to wait up to the refresh rate before it knows one way or the other if the service is working or not. That's not the worst thing in the world but it might cause some extra work on the clients to have to wait a bit before freaking out that the bridge is down. Or when they first reconnect to the broker they will erroneously think the bridge is down and need to take remedial actions.

The "connection lost" message is something I'm still pondering. If I understand correctly, that is published if connection between the bridge and one(?) of the AirThings is lost (e.g. one of the AirThings ran out of battery). If that's the case, I'd recommend moving that to be a status under the device instead of a global level status. Just because you've lost connection to one device doesn't mean the bridge is malfunctioning nor that all devices have lost the connection. And if one device is down and another one is up and reporting, that status doesn't really tell the client which one is down, just that one or more is down.

So I would create a status or connectionStaus topic under each device and publish ONLINE/OFFLINE (or the messages of your choice) there. Then you can have the global level status just reflect the MQTT connection status, or only publish "connection lost" there if the connection to all the AirThings is lost, though even here I might choose a different topic and leave the status topic strictly reflect the MQTT connection status.

In my case my client is openHAB and I'm able to configure it to monitor the LWT topic and mark the devices as OFFLINE when it sees the OFFLINE message. But it's currently all or nothing because of the shared status topic. If each device had it's own status, I could use that instead to get down to the specific device and only mark that one malfunctioning one as offline. Since battery level isn't reported that can help me more quickly track down which device ran out. Then the bridge and other WavePlus devices would remain online, providing a lot more information.

Drolla commented 1 year ago

In branch "15-lwt-topic-should-be-retained", the status topic has been implemented in the following way:

I like your suggestion that we have multiple status topics, one for the bridge, and one for each connected Airthings devices, as also the way you handle the main status topic. So, this would be the the topic structure and status behavior:

What do you think about these specs, @rkoshak ?

rkoshak commented 1 year ago

So, this would be the the topic structure and status behavior:

I like this approach a lot. It lets us get more detailed information about what's online/offline. I assume all the messages are retained.

It would be a little more work, but looking at the logs the script recognizes when one of the Wave+ devices connections fails and it retries three times (I think that's configurable). Does the script retain that it successfully connected on the previous poll or does each poll start over anew with no prior knowledge of what happened before?

If the former, maybe only publish the online message that first time after connecting/reconnecting successfully. But I suspect the latter in which case publishing ONLINE every time is a nice compromise between adding a bunch of bookkeeping logic to the polling to keep track of when the last time a successful connection was made just so it can know when it needs to publish the online message.

Drolla commented 1 year ago

Each poll is performed independently. The connection status is not carried over from one poll to the next. So the Airthings device status is set each poll iteration. If the connection could be established and sensor data read, the the status is set to 'Online', otherwise to 'Offline'. The bridge status is just set once the bridge connects to the broker. Any messages (status, sensor data, and publish/update time) are retained. The code has been committed/pushed, using still the branch "15-lwt-topic-should-be-retained".

Drolla commented 1 year ago

The branch with the changes "15-lwt-topic-should-be-retained" has been merged into the master branch. Considering this improvement as completed.

rkoshak commented 1 year ago

Sorry for not getting a chance to test it out before merging it. I've been ill and of course work piled up. Thanks for making the changes! This will make using it much easier going forward.

Drolla commented 1 year ago

Don't worry - I wish you a rapid recovery! I appreciated your support and hope the changes are helpful for some users of this application.