eclipse-archived / smarthome

Eclipse SmartHome™ project
https://www.eclipse.org/smarthome/
Eclipse Public License 2.0
862 stars 783 forks source link

[mqtt-things] Homie devices go to offline with the "Device did not send heartbeat in time" message #6495

Closed bodiroga closed 5 years ago

bodiroga commented 5 years ago

Hi again @davidgraeff!

I think that I have found a new issue in the Homie convention implementation, because my devices go to Offline with the "Device did not send heartbeat in time" message even if they are working as they should (at least according to the Homie 3.0.1 specification).

Configuration of my Homie device (it is a Python script):

The tests have been done with openHAB 2.4.0-SNAPSHOT Build #1419, Eclipse SmartHome MQTT Thing Binding (0.10.0.201811100937) and the Eclipse SmartHome Embedded MQTT Broker (0.10.0.201811100937), everything running in my Laptop with Ubuntu 16.04.

You can find the latest version of my script here:

import random
import time
import paho.mqtt.client as mqtt

mqtt_broker_host = "localhost"
mqtt_broker_port = 1883
mqtt_client_id = "homie_3.0_temperature_sensor"
mqtt_client_keepalive = 30

homie_prefix = "homie"
homie_version = "3.0.0"
homie_device_id = "homie-3-temperature-sensor"
homie_device_name = "Homie 3.0 Temperature sensor"
homie_device_ip = "192.168.1.150"
homie_device_mac = "DE:AD:BE:EF:FE:ED"
homie_device_nodes = ["temperature"]
homie_device_state = None

def publish_homie_version(client):
    topic = "{}/{}/{}".format(homie_prefix, homie_device_id, "$homie")
    client.publish(topic, homie_version, 1, True)

def advertize_device():
    client.publish("homie/homie-3-temperature-sensor/$state", "init", 1, True)
    publish_device_info()
    publish_node_info()
    client.publish("homie/homie-3-temperature-sensor/$state", "ready", 1, True)

def publish_device_info():
    client.publish("homie/homie-3-temperature-sensor/$homie", homie_version, 1, True)
    client.publish("homie/homie-3-temperature-sensor/$name", homie_device_name, 1, True)
    client.publish("homie/homie-3-temperature-sensor/$localip", homie_device_ip, 1, True)
    client.publish("homie/homie-3-temperature-sensor/$mac", homie_device_mac, 1, True)
    client.publish("homie/homie-3-temperature-sensor/$fw/name", "sensor-firmware", 1, True)
    client.publish("homie/homie-3-temperature-sensor/$fw/version", "1.0.0", 1, True)
    client.publish("homie/homie-3-temperature-sensor/$nodes", "temperature", 1, True)
    client.publish("homie/homie-3-temperature-sensor/$implementation", "homie-python", 1, True)
    client.publish("homie/homie-3-temperature-sensor/$stats", "uptime", 1, True)
    client.publish("homie/homie-3-temperature-sensor/$stats/interval", "60", 1, True)

def publish_node_info():
    client.publish("homie/homie-3-temperature-sensor/temperature/$name", "Indoor temperature", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/$type", "Temperature", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/$properties", "temperature,thermostat,mode,button", 1, True)

    client.publish("homie/homie-3-temperature-sensor/temperature/temperature", 23, 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/temperature/$name", "Indoor temperature", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/temperature/$settable", "false", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/temperature/$unit", "ºC", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/temperature/$datatype", "float", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/temperature/$format", "-20:120", 1, True)

    client.publish("homie/homie-3-temperature-sensor/temperature/thermostat", 23, 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/thermostat/$name", "Indoor Thermostat", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/thermostat/$settable", "true", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/thermostat/$unit", "ºC", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/thermostat/$datatype", "float", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/thermostat/$format", "-20:120", 1, True)

    client.publish("homie/homie-3-temperature-sensor/temperature/mode", "NORMAL", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/mode/$name", "Thermostat Mode", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/mode/$settable", "true", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/mode/$datatype", "enum", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/mode/$format", "NORMAL,COLD,HEAT", 1, True)

    client.publish("homie/homie-3-temperature-sensor/temperature/button/$name", "Thermostat Button", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/button/$retained", "false", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/button/$datatype", "enum", 1, True)
    client.publish("homie/homie-3-temperature-sensor/temperature/button/$format", "SHORT_PRESSED,DOUBLE_PRESSED,LONG_PRESSED", 1, True)

def on_connect(client, userdata, flags, rc):
    print("Connected with result code {}".format(rc))
    advertize_device()
    client.subscribe("homie/homie-3-temperature-sensor/temperature/mode/set")
    client.subscribe("homie/homie-3-temperature-sensor/temperature/thermostat/set")

def on_message(client, userdata, msg):
    print("{} -> {}".format(msg.topic, msg.payload))
    if "temperature/mode/set" in msg.topic:
        print("Changing thermostat mode to: '{}'".format(msg.payload))
        client.publish("homie/homie-3-temperature-sensor/temperature/mode", msg.payload, 1, True)
    elif "temperature/thermostat/set" in msg.topic:
        print("Changing thermostat temperature to: {} ºC".format(msg.payload))
        client.publish("homie/homie-3-temperature-sensor/temperature/thermostat", msg.payload, 1, True)

def loop():
    last_temperature_time = 0
    last_button_time = 0
    last_stats_time = 0
    uptime = 0
    while True:
        if (time.time() - last_temperature_time) > 5:
            temperature_value = random.uniform(20,25)
            print("Sending tempertaure value: {0:.2f}ºC".format(temperature_value))
            client.publish("homie/homie-3-temperature-sensor/temperature/temperature", "{0:.2f}".format(temperature_value), 1, True)
            last_temperature_time = time.time()
        if (time.time() - last_button_time) > 30:
            client.publish("homie/homie-3-temperature-sensor/temperature/button", "SHORT_PRESSED")
            last_button_time = time.time()
        if (time.time() - last_stats_time) > 10:
            uptime += 10
            client.publish("homie/homie-3-temperature-sensor/$stats/uptime", uptime)
            last_stats_time = time.time()
        time.sleep(1)

client = mqtt.Client(mqtt_client_id)
client.on_connect = on_connect
client.on_message = on_message

client.will_set("homie/homie-3-temperature-sensor/$state", "lost", 1, True)

client.connect(mqtt_broker_host, mqtt_broker_port, mqtt_client_keepalive)

client.loop_start()

try:
    loop()
except KeyboardInterrupt:
    print("Stopping client")
    client.publish("homie/homie-3-temperature-sensor/$state", "disconnected", 1, True)
    client.disconnect()
    client.loop_stop()

We are very close to having a perfect implementation, thanks for your effort! :+1:

Best regards,

Aitor

davidgraeff commented 5 years ago

Not valid anymore. The $stats/interval value cannot pull a device OFFLINE anymore in the latest MQTT PR.

The spec said: Every value under $stats must be updated. Update $stats/interval itself within time, and that will work with your version of the binding.

Please check again in the next snapshot, thanks.

bodiroga commented 5 years ago

Hi @davidgraeff!

Not valid anymore. The $stats/interval value cannot pull a device OFFLINE anymore in the latest MQTT PR.

Yeah, that seems to be the case, but I don't understand how that could happen if my bundle version id is 201811100937, which means that the bundle was compiled today (2018-11-10) at 09:37. At the same time, your last PR (#6489) was merged yesterday (2018-11-09) at 21:00. It doesn't make sense to me :confused:

The spec said: Every value under $stats must be updated.

Ok, I understand what you are saying, although non of the implementation that I know send the "$stats/interval" value every "interval" interval. The "$stats/interval" topic is not listed in the device statistics section of the documentation (https://homieiot.github.io/spec-core-v3_0_1/#device-statistics), but the specification also states that "Interval in seconds at which the device refreshes its $stats/+". Anyway, this doesn't matter, because you have removed this check in the binding and the "$stats" node is in a rework process for Homie 4.0.

Please check again in the next snapshot, thanks.

I will, although I though that I had the latest version of the binding :cry:

PS: I'm going to open a new issue regarding the trigger channels in the Homie implementation (when the property states that "$retained" is false), because they don't work as expected.

davidgraeff commented 5 years ago

I looked it up the fix I mentioned is of a different nature. The default for interval was 60 and interval was a required attribute. That led to an offline think if interval was not published. That is fixed.

But yeah I assume $Stats will change so I don't want to spend much time on it.

bodiroga commented 5 years ago

I looked it up the fix I mentioned is of a different nature. The default for interval was 60 and interval was a required attribute. That led to an offline think if interval was not published. That is fixed.

I'm not sure if I understood what you mean. The "$stats/interval" topic has been a required attribute since the first versions of Homie, so I'm glad that you have fix that. But publishing in the "$stats/interval" topic periodically (with "interval frequency") is required now? homie-esp8266 and homie-python don't follow this criteria, so all Homie devices developed using this libraries will go Offline after the first minute :cry:

davidgraeff commented 5 years ago

$stats/interval has maybe been required per spec but not in this controller implementation, because I thing $stats is utterly flawed. It was a bug in my code that it was required accidentally.

And yes, if you take the wording of the spec literately then $stats/interval is required to be refreshed as well. But yeah flawed design as I said.

bodiroga commented 5 years ago

Sorry David, but I feel very stupid, because I don't understand if you have answered my question :confused: The simple question is:

With the last pull request (merged in the current ESH code base), will a Homie device go offline if it doesn't publish to "$stats/interval" periodically?

Thanks!

davidgraeff commented 5 years ago

$stats/interval not published: No affect, doesn't go offline automatically $stats/interval published, but 0: No affect, doesn't go offline automatically $stats/interval > 0: Goes offline automatically if not receiving a new published value within given seconds

bodiroga commented 5 years ago

Ok, know everything is clear, thank you! :+1:

I will try to update the homie-python and homie-esp8266 libraries to follow your interpretation of the wording of the spec, because otherwise all Homie devices will go offline after the first 'interval' period (both libraries set the stats interval to 60 by default, so they are in the third case). Let's see if the rest of the developers agree with you :crossed_fingers:

bodiroga commented 5 years ago

Hi @esdeboer!

Yeah, you are right, the binding is very restrictive and time sensitive to the heartbeat interval, it should allow a 10-15% time deviation before changing the thing status to offline.

Anyway, yesterday I pushed a new commit to the homie-python PR, because the library was not following the specification (it didn't update the 'interval' attribute every seconds). It doesn't solve the problem completely, because from time to time the thing goes to offline, but I don't want to introduce an artificial '-1 second' time to the publish pattern. Can you pull the latest version of the code and test it?

Best regards,

Aitor

esdeboer commented 5 years ago

Hi @bodiroga

Thanks for the latest version, I only saw your commit after I posted my comment. So far it works, my homie python thing is is still online after a couple of hours.

davidgraeff commented 5 years ago

If you don't publish it at all, the binding will not consider it for the online status. MQTT has a build-in heartbeat mechanism.

bodiroga commented 5 years ago

If you don't publish it at all, the binding will not consider it for the online status. MQTT has a build-in heartbeat mechanism.

Yeah, sure @davidgraeff, the thing is that, even if the publish is off for a few milliseconds, the binding sets the thing status to offline. Have a look at the next log lines:

2019-01-04 18:01:21.771 [hingStatusInfoChangedEvent] - 'mqtt:homie300:embedded-mqtt-broker:marantz-controller' changed from ONLINE to OFFLINE (GONE): Device did not send heartbeat in time
2019-01-04 18:01:21.781 [hingStatusInfoChangedEvent] - 'mqtt:homie300:embedded-mqtt-broker:marantz-controller' changed from OFFLINE (GONE): Device did not send heartbeat in time to ONLINE

10 milliseconds off, this is too aggressive, isn't it? That's why we propose the 5-10-15% deviation (for a 60 seconds interval, this should be a 3-6-9 seconds delay). Anyway, I really don't understand the motivation behind implementing this mechanism, sorry, IMHO the $state topic (with the will topic message) should be the only way to detect if the device is online or offline. The $stats interval information could be used to add more detailed information (a "ONLINE: Device not healthy" status, for example), but no more, because the device is clearly online as the "will" message has not been sent.

If you agree, I could try to implement the changes, but my free time will be very limited for the next weeks :sob:

Thanks for your help and best regards,

Aitor

davidgraeff commented 5 years ago

Because I want to make the binding Homie 4 compliant, I will remove the entire $stats party anyway