dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License
1.9k stars 502 forks source link

Proposal: deprecate `reachable` attributes #2590

Closed ebaauw closed 1 month ago

ebaauw commented 4 years ago

I propose to deprecate and eventually remove the state.reachable attribute from /lights resources and the config.reachable attribute from /sensors resources. These attributes are utterly useless. They do not indicate whether a device is, in fact, reachable. They certainly don't indicate whether a device is powered. They cause a lot of confusion and fruitless discussions.

As indicated earlier, I think the REST API should expose a lastseen attribute instead, indicating the last time a message from the device was received.

Background ZigBee doesn't know the concept of connections; it only deals with messages. The deCONZ GUI is like a city map: it displays roads, not actual traffic. The GUI draws lines for neighbour table entries, as reported by the devices. Just like a city map doesn't show whether a road is actually being used, the GUI doesn't show whether there's any messages using these routes.

Technically, end devices are unreachable almost all of the time. They turn off their radio, to preserve (battery) power. They can only be reached through their parent router, who acts like a post office holding poste restante (general delivery) mail. ZigBee routers, however only keep the message addressed to the end device for like 7 seconds. Light sleepers, end-devices that wake up once every 5 seconds to check whether their parent has any messages for them, can be reached "normally", albeit with a slight delay. Deep sleepers, that only poll their parent like once an hour, can only be reached for 10 to 100 milliseconds per hour.

The REST API plugin sets the reachable attribute when it receives a message from the corresponding device. So far, so good. However, clearing the reachable attribute is far more elaborate. For devices that are polled by the gateway reachable is cleared after not receiving a response to a message for a couple of times in succession. Depending on the size of your network (devices take turns being polled), this might take a couple of seconds to minutes. Devices that support attribute reporting aren't polled, so reachable cleared after missing a couple of expected reports in a row. For deep sleepers, you'd only expect a report once per hour, so it takes several hours for reachable to be cleared.

MattL0 commented 4 years ago

Thanks !!

ebaauw commented 4 years ago

Did some testing and managed to add lastseen (with millisecond resolution, see #1687) as top-level attribute to /lights and /sensors resources, based on the lastRx() value of the RestNodeBase superclass.

That value is kept in local time, so I needed to convert it to UTC. I daren't change the lastRx() value itself to UTC - it has too many uses that I don't fully oversee.

I didn't (yet?) create a proper ResourceItem for lastseen, so it's only accessible doing a GET or the /lights or /sensors resource. Not sure if we'd want web socket notifications for this attribute - with millisecond resolution, it might change a lot. Appreciate any thoughts.

Will look at lastannounced next (which I think should be a proper ResourceItem with web socket notifications). Also want to see if the REST API plugin can re-read the Basic attributes and descriptors on announce (so they would be updated after a firmware upgrade).

SwoopX commented 4 years ago

To be honsest, I really like the idea. I never had any real use for it based on the mechanism behind it as it was not precise enough for my needs. Also, it didn't help to determine if any of the devices may have any issues (emty battery or whatever). In the end, I created custom functions in my middleware to check if certain boundaries have been exceeded to derive it a device is having issues and that partially required for timestamps to be created and tracked.

If we'd use something like last seen with the value of a timestamp, I could directly use if and spare some code elsewhere to get a clean indication if everything's in order. For my taste, we don't need miliseconds here, that's a bit over the top. Btw, it would also be more beneficial for ZGP-Devices to go for a timestamp, as they don't reqularly report afaik. So "reachable" would be even worse.

ebaauw commented 4 years ago

Thanks for you reaction, @SwoopX. Would you need push notifications when lastseen changes, or would you be happy query that value using a GET?

SwoopX commented 4 years ago

Actually, I'd prefer to have it pushed automatically upon change.

Appelg commented 4 years ago

I would not call reachable completely useless, as it still manages to figure out which IKEA lights that are powered or not.

I have not read up on zigbee & deconz, but some sort of ping or periodic data is sent, at least from/to Ikea lights.

Without it, I have no idea how to display physically powered off lights as ”unreachable” in Apple Homekit.

E.g: Reachable is still able to figure out that physically powered, but turned off lights are reachable, but physically powered off lights are not.

With that info, I can display the below in Homekit. The two top lights are electrically powered, but has been turned off for at least 12 hrs. So even tho they havent been toggeled on/off for 12+hrs, deconz still figures out that they are reachable.

For the ones below, someone has powered off the physical wallswitch. It seem to take about 15-30 mins before they go unreachable. 8AD9380B-D5A7-4A8E-8AAA-E20B79D26F74

I have been using this ”config” for 6+ months with 100% correctness.

jan666 commented 4 years ago

I would not call reachable completely useless, as it still manages to figure out which IKEA lights that are powered or not.

reachable is very inaccurate. Sometimes it takes up to hours on my ikea lights after the switch was turned off.

Without it, I have no idea how to display physically powered off lights as ”unreachable” in Apple Homekit.

why not calc yourself? e.g. lastseen is older than x minutes

Appelg commented 4 years ago

I would not call reachable completely useless, as it still manages to figure out which IKEA lights that are powered or not.

reachable is very inaccurate. Sometimes it takes up to hours on my ikea lights after the switch was turned off.

Without it, I have no idea how to display physically powered off lights as ”unreachable” in Apple Homekit.

why not calc yourself? e.g. lastseen is older than x minutes

Yeah, sure. It's an improvement, but it will still have the same underlying constraints as reachable that the OP talks about, just a bit more configurable. I.e, as long as I don't do extensive testing and figure out how often each type of device sends a message, this timeout would be just as useless. I'm still "blocked" by how often a device sends its messages, and I also need a far more advanced config since IKEA lights maybe sends every X mins, HUE lights every Y mins, and so on.

I'm not against this PR, I'm just saying that some of us have found a more or less "useful" usage of reachable.

Edit: @manup below just read my mind. It's a way too big step for every enduser to figure out the thresholds on their own.

1) Deprecate reachable 2) Add lastseen 3) Add alive or operational as @manup described, and keep a well known (configurable/overridable) model for it.

manup commented 4 years ago

IMHO state.reachable is a misleading word for what it represents, perhaps alive or operational is a better fit. I don't think the attribute is useless and would suggest it becomes a defined and documented behavior. While lastseen is useful it leaves the burden of deciding that a device is operational to the REST-API user which has many device specific implications.

why not calc yourself? e.g. lastseen is older than x minutes

I'd suggest we model the value of the attribute based on the knowledge that we have about devices:

ebaauw commented 4 years ago

I'm still "blocked" by how often a device sends its messages, and I also need a far more advanced config since IKEA lights maybe sends every X mins, HUE lights every Y mins, and so on.

As I tried to explain above, that depends on many factors. For devices (like IKEA lights) that support attribute reporting, it depends on the reporting configuration. This should be configured by the REST API plugin, but might be changed by the user in the GUI. For devices that are only updated from polling, it depends on the size of your network. I my experience polling works rather unpredictably; some devices seem to be polled more frequently than others and I haven't been able to understand why.

I'd suggest we model the value of the attribute based on the knowledge that we have about devices:

In my experience, that doesn't work as there's dependencies on local settings and on the local network.

While lastseen is useful it leaves the burden of deciding that a device is operational to the REST-API user which has many device specific implications.

What about a config attribute to specify the "ping" interval? The REST API could populate it from the device knowledge, but users could change it to reflect their local situation. reachable would then be cleared if now - lastseen > interval and set when lastseen is updated.

Appelg commented 4 years ago

What about a config attribute to specify the "ping" interval? The REST API could populate it from the device knowledge, but users could change it to reflect their local situation. reachable would then be cleared if now - lastseen > interval and set when lastseen is updated.

I think that sounds like a good idea. I can help to test this feature since I already today rely heavily on reachable for my Homekit setup.

SwoopX commented 4 years ago

Wouldn't lastRx() address what @Appelg is referring to? As I only have 2 lights, I cannot judge because I simply do not care in that sense. However, I'd expect them to send something (regularly) even when powered off? So "lastseen" would be more beneficial than "reachable" IMHO.

While lastseen is useful it leaves the burden of deciding that a device is operational to the REST-API user which has many device specific implications.

@manup I believe it would pay off, also giving more flexibility. I personally love to stay in control.

Anyway, I'd also be happy with the ability to define some device specific timeouts as well. Downside I see here is that I wouldn't introduce this in the code keeping API v2 in mind. I'd rather vouche for introducing device specific config files then or extending the database model (which also could use some overhauling ;) ).

Lastly, also happy to contribute in testing and trying out various approaches in my test network.

ebaauw commented 4 years ago

Wouldn't lastRx() address what @Appelg is referring to?

In my current testing, I based lastseen on lastRx(), indeed. Limitation is that it doesn't use the Event mechanism, so no web socket notifications. I'm trying to find a better solution than added a seconds line of code everywhere rx() is called.

However, I'd expect them to send something (regularly) even when powered off?

How would they do that, when there's no power to drive the radio? Of course, when turned off but still powered, the lights continue to send reports and respond to requests from the gateway.

SwoopX commented 4 years ago

I meant indeed turned off, not powered off.

ebaauw commented 4 years ago

And this is why we did it!

$ ph get /rules/3
{
  "actions": [
    {
      "address": "/lights/10/state",
      "body": {
        "on": false
      },
      "method": "PUT"
    }
  ],
  "conditions": [
    {
      "address": "/lights/10/attr/lastannounced",
      "operator": "dx"
    }
  ],
  "created": "2020-06-12T12:42:28",
  "etag": "4e434a61d34d66bef94274cea964ee6d",
  "lasttriggered": "2020-06-12T12:43:22",
  "name": "Test",
  "owner": "01D323F3DF",
  "periodic": 0,
  "status": "enabled",
  "timestriggered": 2
}

Not too happy with the need to include attr in the condition's address, but the IKEA light (that doesn't support PoweronOnOff) now switches off shortly after powering on.

$ ph get /lights/10
{
  "etag": "aa4a927197be187e88da74ba7464342a",
  "hascolor": true,
  "lastannounced": "2020-06-12T12:43:22Z",
  "lastseen": "2020-06-12T12:46:15Z",
  "manufacturername": "IKEA of Sweden",
  "modelid": "TRADFRI bulb E27 CWS opal 600lm",
  "name": "Trådfri RGB",
  "state": {
    "alert": "none",
    "bri": 1,
    "colormode": "xy",
    "effect": "none",
    "hue": 0,
    "on": false,
    "reachable": true,
    "sat": 0,
    "xy": [
      0.5053,
      0.4153
    ]
  },
  "swversion": "1.3.009",
  "type": "Color light",
  "uniqueid": "00:0b:57:ff:fe:9a:46:ab-01"
}
ebaauw commented 4 years ago

Only update lastseen when old value is at least a minute old. This should fix the web socket notification storm. Also changed the resolution from seconds to minutes (yyyy-MM-ddThh:mmZ).

morfei1 commented 4 years ago

reachable (true) is very important for me, my IKEA lights and all my light automations to trigger a light or lights at specific CT and brightness when reachable is true.

For example: if a light/s are reachable (true) and the time is between 6:30 - 22:00 it triggers to switch them to CT 250, brightness: 212. If the same light/s are "reachable" (true) between 22:30 - 6:30 it triggers them to switch to CT 370, brightness: 127. The automations triggers only on state change. So, if the automation was triggered ones the state should become false, to be able to re-trigger again, when become true.

My main lights are controled most of the time from the old wall switchs and the reachable (true) attributes is very reliable to switch the lights at specific color and brightness.

Reachable (false) is not so relaiable but I actually dont use it for anything but just to to orientate myself, when the lights were powerd off +-15/30mins. As was mentioned earlier my Ikea lights needs about 15-30mins to report them as reachable (false), when they are powered off. Comparing them to my Aqara CT lights which only needs about 5-6min. to report them as reachable (false).

The real problem right now is that if a light is reachable (false), its "on" attributes stays (true), it does not update to false. if the state is not changing its a problem for automations.

in the example below you can see what I mean. The light is actually powerd off from the wall switch, and since it was preaviously reachable (true), also on (true). it updates reachable to (false) but on stays always (true), if I power off from the wall switch.

 "9": {
    "ctmax": 65535,
    "ctmin": 0,
    "etag": "452509519286e62cc3961",
    "hascolor": true,
    "lastannounced": "2020-08-15T09:59:47Z",
    "lastseen": "2020-08-15T16:59:47Z",
    "manufacturername": "IKEA of Sweden",
    "modelid": "TRADFRI bulb E27 WS opal 980lm",
    "name": "BR Bed Light",
    "state": {
      "alert": "none",
      "bri": 127,
      "colormode": "ct",
      "ct": 370,
      "on": true,
      "reachable": false 

For a light that is always reachable (true) (for example my night lights), on true/false is important because I control the light base on that attributs, reachable is useless at that case.

IMHO, to consider reachable is useless or not is very dependent on the way a light is used.

hbkoenig commented 4 years ago

Hi Erik,

in commits 61bfd6df 2d71369c 131815da you've changed the time stamp formats for 'lastseen' and refer to this issue, and since it's still open this is why I'm adding my comments purely about time stamp formats here. feel free to open a new issue and redirect...

I got hit by your changes when updating to deconz 2.05.80 and 2.05.81 as my own code using REST didn't accept the new format anymore, using format "%Y-%m-%dT%H:%M:%S.%N" (with and without trailing ".N") with strftime(3) and date(1), without any hints in release notes or similar that the format in REST interface changed (likely I'd not have read it anyway before automatic update;)

if I'm interested in less resolution, I always can reduce at the client level, no need to not provide exact times via REST (and I understand very well how "exact" those values are;)

now looking at the current code, there still 4 different formats for time stamps, two with "Z" and two without:

deconz-rest-plugin$ grep -r 'yyyy-MM-ddTHH' . | tr '"' '\n' | grep  yyyy-MM-ddTHH | sort  | uniq -c
     36 yyyy-MM-ddTHH:mm:ss
      8 yyyy-MM-ddTHH:mm:ssZ
      3 yyyy-MM-ddTHH:mm:ss.zzz
      3 yyyy-MM-ddTHH:mmZ

for now I very much prefer always having "ss.zzz" with fractions of seconds to use some heuristics to determine if 3 measurements for Xiaomi Aqara temperature/humidity/pressure multi_sensor node are from one measurements (e.g. within 0.5 secs, this is getting worse to useless when having resolution of seconds or even minutes).

so for testing I've changed all 43 time stamps except those 7 labeled ISO 8601 formats to "yyyy-MM-ddTHH:mm:ss.zzz" for my setup (mostly only sensor monitoring: 30+ routers, 50+ sensors, some extra stuff, and many more sensor nodes to come). let me see how this works...

please always consider if such REST interface changes should be upward compatible and transparent, to avoid surprises in unknown use cases!

comments very welcome! Harald

ebaauw commented 4 years ago

without any hints in release notes or similar that the format in REST interface changed

What part of “Changed API” don’t you understand? https://github.com/dresden-elektronik/deconz-rest-plugin/releases/tag/V2_05_80:

Changed API: Report lastseen with a resolution of minutes, and issue at most one web socket notification per minute per device for lastseen, see #2590.

hbkoenig commented 4 years ago

Hi Erik,

On 9/29/20 6:19 PM, Erik Baauw wrote:

without any hints in release notes or similar that the format in
REST interface changed

What part of “Changed API” don’t you understand? https://github.com/dresden-elektronik/deconz-rest-plugin/releases/tag/V2_05_80:

*Changed API*: Report |lastseen| with a resolution of minutes, and
issue at most one web socket notification per minute per device
for |lastseen|, see #2590
<https://github.com/dresden-elektronik/deconz-rest-plugin/issues/2590>.

oh sorry!

this rant wasn't about your releases of deconz-rest-plugin, but about the "deconz" package from dresden-elektronik. right now I'm only a user of the deconz raspian package,  and after upgrade I got hit by this incompatible change. and looking through their web pages (which I did not before upgrading, shame on me too;) I still can't find hints about any changes between deconz 2.05.79 and 2.05.80/81) -- and their changelog stopped at 2.05.77:

    https://phoscon.de/en/conbee2/software#deconz     https://dresden-elektronik.github.io/deconz-rest-doc/     https://phoscon.de/en/changelog

but maybe, breaking the API shouldn't be kept "Under the hood" ? ;-))

anyway, thanks a lot for your all your work!!

I still wonder if a single (and stable;) format for all time stamps wouldn't be better... ?!

Mit freundlichen Grüßen / Best regards Harald König Engineering Optical Systems (BST/EOS2) Bosch Sensortec GmbH | Gerhard-Kindler-Straße 9 | 72770 Reutlingen | GERMANY | www.bosch-sensortec.com http://www.bosch-sensortec.com Tel. +49 7121 35-38606 | Fax +49 711 8115140583 | Harald.Koenig2@bosch-sensortec.com mailto:Harald.Koenig2@bosch-sensortec.com Sitz: Kusterdingen, Registergericht: Stuttgart HRB 382674, Ust.IdNr. DE 183276693 - Steuer-Nr. 99012/08040 Geschäftsführung: Stefan Finkbeiner, Jens-Knut Fabrowsky

ebaauw commented 4 years ago

I still wonder if a single (and stable;) format for all time stamps wouldn't be better... ?!

The format follows ISO8601, which defines the milliseconds, seconds, and even minutes as optional, according to https://en.wikipedia.org/wiki/ISO_8601.

Except for localtime, all timestamps exposed by the API are in UTC, and should end in Z. I added that when introducing lastseen and lastannounced, but didn’t update existing UTC timestamps as not to introduce breaking changes. lastseen was still under development, so cutting the seconds part from that seemed less of a deal.

Most timestamps have a resolution of one second. We changed lastupdated to milliseconds, and lastseen to minutes at requests by API users. lastupdated for handling multiple button events in the same second. lastseen because we don’t want to issue too many notifications.

popy2k14 commented 4 years ago

@ebaauw is reachable now deprecated with v80 or newer? I dont see the new "lastseen" reading in my fhem installation.

I have used the reachable in the past for my fire/smoke sensors and dont want to loose this functionality.

ebaauw commented 4 years ago

is reachable now deprecated with v80 or newer?

It's still there. You need to ask @manup for the official status.

I dont see the new "lastseen" reading in my fhem installation.

I don't know fhem. lastseen should be in each /lights or /sensor resource corresponding to a Zigbee device.

SwoopX commented 4 years ago

FHEM simply doesn't expose lastseen as well as some other useful stuff as it is not part of the corresponding module. Was in touch with the colleague a few months back taking care of it.

I have to see that I follow up on this...

popy2k14 commented 4 years ago

Thx, please keep us updated

popy2k14 commented 4 years ago

Tomorrows fhem update should add the lastseen as stated here: https://forum.fhem.de/index.php/topic,95288.msg1090054.html#msg1090054

thx