dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License

Mesh problems with IKEA devices #195

Closed donnib closed 5 years ago

donnib commented 6 years ago

I see problems with IKEA meshing (I presume that's the problem) together with devices from other makers. For example, I see the following cases quite often:

  1. I can't control my OSRAM lightpole lights even though they are very close to my IKEA lights, which they mesh through. If I click on one and then press 0, they usually come back to life.
  2. My Hue dimmer switch lights up red when I click on the buttons. Clicking 0 in deCONZ doesn't help; it usually comes back by itself after quite some time.

@manup do you have a suggestion for how I could troubleshoot this further to find the root cause?

staraxis commented 6 years ago

I think I have a similar problem but only with my Osram lights. They seem to drop out of the mesh and behave erratically such as turning on less than a minute after being toggled off. This has been happening since updating to version 2.04.77. I have Wemo lights controlled by deconz and they still operate as normal.

If I restart the raspberry pi running deconz with the raspbee and conduct a power cycle of the lights, they will connect to the network but then drop out and not operate properly again after a short period.

donnib commented 6 years ago

@staraxis yes, there seem to be problems with the mesh; it's unreliable at the moment. Hopefully @manup will nail it down if he can find out what's going on. I can't go live until I trust the system is stable enough for my wife to use, otherwise I'll get into trouble.

Also, in my 50-node setup the RaspBee gateway sometimes hangs and I need to power cycle it to get it running again (@manup mentioned something about a queue that fills up for some reason I don't know).

manup commented 6 years ago

The filled up queues will be improved soon, by ensuring that no more than one request to a non-responding device is on air at a time (mainly sleeping end-devices). For the IKEA and OSRAM mesh problems I need more data to analyze, so any sniffer logs from problematic networks are very welcome.

donnib commented 6 years ago

so any sniffer logs from problematic networks are very welcome.

If/when I get the sniffer running, can I set it up to capture all traffic and save it, so that you can filter through it yourself?

staraxis commented 6 years ago

@donnib No worries thanks for the response. My wife isn't too upset (yet), as it's only 2 lights that are behaving a little crazy so I can wear that for a little while. As to the fix, I'm happy to contribute just not sure what I require for sniffing zigbee packets.

ebaauw commented 6 years ago

The filled up queues will be improved soon

Yes, please. I'm getting quite desperate, see https://github.com/dresden-elektronik/deconz-rest-plugin/issues/33#issuecomment-330633123.

I powered off one of my IKEA lights, and the network seems to be stable for over 24h now. I "lost" one Hue lamp (showing red in deCONZ GUI; API returning error "resource, /lights/xx, not available", when PUTting the state). Even pressing 0 wouldn't wake it, but cycling power did the trick. I also "lost" the one remaining IKEA light. Oddly, if I send a command to the group it's in, the light reacts, and even sends a report attribute command to the gateway, but it remains red. The web socket sends a change event for the light, but not for the group. GETting the group shows state.any_on is still false; GETting the light shows state.on is true.
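A "lost" light like this can be spotted from the API reply itself. The following is only a minimal sketch: it assumes the gateway answers a failed PUT with a JSON list of error objects carrying `address` and `description` fields. That response shape, the helper name `unavailable_resources`, and the sample light id are assumptions for illustration, not taken from the thread.

```python
import json


def unavailable_resources(response_body):
    """Return the resource addresses reported as 'not available'.

    Assumes a JSON list of {"error": {"address": ..., "description": ...}}
    objects; this shape is an assumption for illustration.
    """
    addresses = []
    for item in json.loads(response_body):
        error = item.get("error")
        if error and error.get("description", "").endswith("not available"):
            addresses.append(error.get("address"))
    return addresses


# Hypothetical reply for a light that the gateway shows in red.
sample = (
    '[{"error": {"type": 3, "address": "/lights/7", '
    '"description": "resource, /lights/7, not available"}}]'
)
print(unavailable_resources(sample))
```

An automation could use a check like this to log which lights dropped out and when, which would help line the failures up with sniffer captures.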

donnib commented 6 years ago

Yeah, I have all sorts of problems:

  1. Devices are shown in deCONZ in yellow but I can't control them. I can see websocket events coming from them, but they don't react. If I go into deCONZ and click 0 multiple times, they may come alive.
  2. My Hue dimmer switch keeps dropping out and coming back when it suits itself.
  3. As mentioned earlier, sometimes the whole system grinds to a halt (the "filled up queues"), so a reboot or a reset of the RaspBee gateway is needed.

Regarding 1: I don't know if this is caused by somebody mistakenly turning off something like 10 IKEA lights at once, where other devices may have been meshing through them. Even if I turn them back on, the system, or at least some of the lights, is still in a weird state.

ebaauw commented 6 years ago

@donnib, did you ever try and connect a large number of IKEA lights to a Hue bridge? I would sort of expect it to choke as well.

manup commented 6 years ago

Yes, please. I'm getting quite desperate, see #33 (comment).

Version 2.04.78 now has prevention to fill up queues, it basically delays APS requests to a node if there are already unconfirmed requests in the queue. This is a very simple approach which might cause unexpected delays. If it turns out to be too restrictive I can raise the number of allowed in-flight requests per node (currently 1).
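To make the idea concrete, here is a minimal sketch of a per-node in-flight limit. It is not the deCONZ implementation; the class name `ApsRequestGate` and its methods are hypothetical, and only the limit of 1 comes from the comment above.

```python
from collections import defaultdict, deque

MAX_IN_FLIGHT_PER_NODE = 1  # the comment above says the limit is currently 1


class ApsRequestGate:
    """Hold back requests to a node while earlier ones are unconfirmed."""

    def __init__(self, max_in_flight=MAX_IN_FLIGHT_PER_NODE):
        self.max_in_flight = max_in_flight
        self.in_flight = defaultdict(int)  # node address -> unconfirmed count
        self.delayed = defaultdict(deque)  # node address -> waiting requests

    def submit(self, node, request):
        """Return True if the request may be sent now, False if delayed."""
        if self.in_flight[node] < self.max_in_flight:
            self.in_flight[node] += 1
            return True
        self.delayed[node].append(request)
        return False

    def confirm(self, node):
        """Record a confirmation and return the next delayed request, if any."""
        if self.in_flight[node] > 0:
            self.in_flight[node] -= 1
        if self.delayed[node] and self.in_flight[node] < self.max_in_flight:
            self.in_flight[node] += 1
            return self.delayed[node].popleft()
        return None


gate = ApsRequestGate()
print(gate.submit(0x3CB0, "on"))   # first request goes out
print(gate.submit(0x3CB0, "off"))  # second one waits for a confirmation
print(gate.confirm(0x3CB0))        # confirmation releases the delayed request
```

With a limit of 1, a node that never confirms blocks only its own requests instead of filling a shared queue, at the cost of the extra delays mentioned above.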

http://www.dresden-elektronik.de/rpi/deconz/beta/deconz-2.04.78-qt5.deb

donnib commented 6 years ago

@ebaauw why would it choke ? IKEA issue or ?

ebaauw commented 6 years ago

why would it choke ? IKEA issue or ?

Yes, that's what I'd expect. If not, it might be interesting to sniff how the Hue bridge deals with discovering and polling the IKEA lights.

Version 2.04.78 now has prevention to fill up queues, it basically delays APS requests to a node if there are already unconfirmed requests in the queue.

I only have two IKEA lights, and found that my network has been running stable for four days now that I've powered off one. Feeling lucky, I powered it on again for 2.04.78.

donnib commented 6 years ago

Imagine having 50 IKEA nodes on ;)

Ok so you are saying this is an issue with IKEA. I guess if sniffing is possible we could sniff the ikea gateway however i can't move to it since all nodes are paired with the raspbeegw.

I'll give 2.04.78 a try.

donnib commented 6 years ago

I am running 2.04.78 and at the moment there are no improvements for me.

I still experience 1 and 2 very often, and as I mentioned I have a feeling it's getting worse or is caused by power cycling the IKEA lights. In my head I see something like the mesh network getting ruined, and then either deCONZ doesn't get the lights back online or IKEA does something weird. I don't know, but it's very annoying.

markbeee commented 6 years ago

Did you check whether the IKEA lights have the newest firmware installed? I had some problems with IKEA lights meshing and old firmware revisions. Anyway I have only 15 nodes of which IKEA are 8 of them.

manup commented 6 years ago

Hey donni, we really need some sniffer logs to analyze what's happening here.

donnib commented 6 years ago

@manup sure but i need to know what you want as i asked above https://github.com/dresden-elektronik/deconz-rest-plugin/issues/195#issuecomment-331092444

donnib commented 6 years ago

@markbeee yes, all (most) IKEA nodes have the newest firmware. I just added 5 more nodes, which probably do not have the newest firmware.

manup commented 6 years ago

Sorry I missed that one.

If/when I get the sniffer running, can I set it up to capture all traffic and save it, so that you can filter through it yourself?

Yes the sniffer can save the logs in a file and we can import and analyze them.

@markbeee yes, all (most) IKEA nodes have the newest firmware. I just added 5 more nodes, which probably do not have the newest firmware.

Would be interesting to test if the problems also happen when the network only contains newest firmware lights.

markbeee commented 6 years ago

I (luckily) have no latency problems with my 5 IKEA lights, 1 remote and 2 motion sensors - all updated to the newest firmware. But I do have latency problems with my three Philips Hue lights - but this might be due to the polling technique of them. That is a clear argument to buy only IKEA/ Osram (where I have a plug running w/o problems) in the future.

ebaauw commented 6 years ago

That is a clear argument to buy only IKEA/ Osram (where I have a plug running w/o problems) in the future.

Despite ZigBee advertising itself as a standard, not all lights are created equal. Indeed the OSRAM and IKEA lights support attribute reporting, which is a big plus. Also, the OSRAM lights allow setting the power-on default. One of my OSRAM lights even supports measuring power consumption, but it looks like that’s been removed in the later firmware (somehow, this light won’t update, even on the OSRAM gateway). And the OSRAM, IKEA, and innr lights allow setting the on level and default on/off transition time (which are hard-coded on the Philips lights). Then again, the Philips lights support a broader range of colour temperatures, and can act as ZGP proxy (needed for the Hue tap). And they’ve announced the low latency entertainment option. I have no experience with non-Philips colour lights, but I’m pretty sure each manufacturer supports different colour gamuts (Philips actually have three different gamuts for different types of lights). Then, there’s a difference in how many groups and scenes the light can store, whether their firmware can be updated over the air (innr cannot), and whether they publish their firmware so we can use deCONZ for the update. Last, but certainly not least, there’s the price. I’ve been thinking about setting up a buyer’s guide kind of Wiki to document these differences.

I have one OSRAM plug running without any issues as well, but the other one died on me. I read that @manup also had a plug dying on him - could be coincidence, but still makes me think...

manup commented 6 years ago

I’ve been thinking about setting up a buyer’s guide kind of Wiki to document these differences.

There is one at reddit, but it seems not to be maintained anymore, maybe this one can be picked up?

https://www.reddit.com/r/Hue/wiki/compatibility_chart

iConnectHue provides a nice overview as well:

http://iconnecthue.com/supported-devices/

I have one OSRAM plug running without any issues as well, but the other one died on me. I read that @manup also had a plug dying on him - could be coincidence, but still makes me think...

I reckon mine died because I carried it around between work and home daily :)

grover commented 6 years ago

Two things: A buyers guide would be great with the added benefit of also being able to track RaspBee compatibility.

Before I do more smart light purchases: Are there known issues with the OSRAM GU10 spots? Is this really an IKEA only issue?

donnib commented 6 years ago

@manup here are two logs,

logs.zip

log.dcf was captured without touching any lights or switches. For log1.dcf I turned lights off and on a couple of times, and everything seemed normal.

I don't know if you can see anything unusual in these. Now that I have the sniffer up and running, expect more when I see issues. Of course, when I had the issues I didn't have the sniffer running.

manup commented 6 years ago

@donnib thanks I'll look into them, but I'm afraid we more importantly need logs while issues happen, with the info which devices have the problems (by MAC or NWK address).

donnib commented 6 years ago

@manup yes of course, I'll provide them as soon as I see problems

manup commented 6 years ago

Cool, thanks in advance, I'm really curious what is causing the issues.

donnib commented 6 years ago

I can only say that over the last two days the whole network was brought to its knees. The lights were reacting maybe 30 minutes after I asked them to do something, or never reacted at all.

donnib commented 6 years ago

@manup I have reverted to 77, since 78 was causing many problems with delays. They were unbearable, as you anticipated. I'll report back how it goes.

manup commented 6 years ago

Version 2.04.79 addresses issues with sleeping end devices causing queues to get stuck with lots of tasks added from the REST API plugin, also discovery of the network was optimized. My network of 85 lights gets discovered in under 3 minutes (no OSRAM and IKEA lights though).

http://www.dresden-elektronik.de/rpi/deconz/beta/deconz-2.04.79-qt5.deb

The version might help in @holli73 OSRAM network too: https://github.com/dresden-elektronik/deconz-rest-plugin/issues/208

donnib commented 6 years ago

@manup So for me here is the status. My network is VERY unstable. I am running with 2.04.79. As mentioned before i have 1 OSRAM, 3 Philips (2 hue dim and 1 motion sensor) and 50 IKEA nodes (remotes, bulbs, motion sensors).

Here is a log file made from last night till today. In this session the OSRAM light should have been off during the night (I have an automation that calls the REST API), but in the morning it was not. The OSRAM light that should be off is NWK 0x3cb0. Another issue is that I pushed the IKEA remote NWK 0x0601 in the morning about 10 times, which should have turned off the lights 0x0197 and 0x1fa4, but that didn't happen. Usually problems are resolved with a power cycle, after which some of the pushes I did before the power cycle are executed, so the light may turn off and on a few times before it settles.

Right now my network runs fine one moment and the next moment it may not work, e.g. extremely slow reactions. I presume all problems at the moment are queue related. Every time I decide to run the sniffer when a problem occurs, the network works, so that's why I let the sniffer run so long; it's hard to catch the problem in the sniffer.

Sorry i can't give you more at the moment but i do hope you can see a pattern in the log file.

log2.dcf.zip

manup commented 6 years ago

The logs are very weird: your lights jump between PAN IDs (networks) very often and sometimes change their NWK address. Normally they should stay on one network with the assigned NWK address.

The log shows around 50 networks on the same channel.

I don't know why they do it, but as a quick guess this looks like a light firmware issue.

For example, the light with MAC address 0x000B57FFFE3A6BB9 slipped to 5 different networks. Looking at the PAN IDs, you'll notice they look similar and 4 of them end with 5C; this might be a memory corruption bug in the bulb firmware.

image

Is this particular light at latest firmware version?

Lights which changed network (incomplete list):

You may check them for firmware version.

donnib commented 6 years ago

@manup the bb9 lights is an E27 bulb opal 980lumen model running 1.2.217

manup commented 6 years ago

That's the latest version, hmm, bad. Right now it looks like a bug in the firmware, or at least I can't explain why the lights are behaving like this. I have no contacts among the IKEA devs, but the issue should be forwarded to them.

donnib commented 6 years ago

@manup I'll try sending it to them and see if they will look into it.

Could the jumping be caused by somebody frequently power cycling the lights (powering them off at the wall)? I did power cycle them when they didn't work, so maybe that's what's causing this, but I have no idea how ZigBee exactly works, so it's just a wild guess. Is the NWK address something the light comes with and can't change, or is that only the MAC, while the NWK address can change?

manup commented 6 years ago

Could the jumping be caused by somebody frequently power cycling the lights (powering them off at the wall)?

It shouldn't; the PAN ID should stay fixed, also after a power cycle. The NWK address can change in rare conditions when address conflicts are detected. The address is randomly assigned by each device on initial network join and should then stay fixed, also after power cycles.

It therefore looks like a bug, memory corruption due to a stack overflow or something like that. You can test whether the power cycling is related to the problem by sniffing the traffic for some hours during which you don't power cycle the lights.

donnib commented 6 years ago

@manup I have sent the information to my contact at IKEA; they will have a look at it. In the meantime I tried your suggestion, with all nodes up (56 of them) and all yellow in deCONZ. I let the sniffer run for 1h without controlling the lights in any way or doing any power cycles, so the log should be as clean as possible. Please have a look to see if you see the same behaviour. Prior to starting the sniffer I turned on the lights which had been powered off.

log_without_power_cycle.dcf.zip

donnib commented 6 years ago

@manup did you have a chance to look at this? I am curious whether this shows the same behavior. I do realize I sent it yesterday ;)

manup commented 6 years ago

I'm also curious; I haven't had the time yet but will look into it later on. Stay tuned :)

manup commented 6 years ago

This log shows the same behavior, albeit with fewer networks.

I have seen two routing hiccups which were caused by the RaspBee firmware. I'm not sure if they are related to the greater issue, but the next firmware version will prevent them. The gateway needs to send an "I'm here" broadcast periodically so devices know their routes are OK; currently this is done every 2 minutes. If the IKEA lights miss that, or the gateway doesn't send the broadcast, the lights will try to discover a route to the gateway via expensive broadcasts. And they did ... all at once, which is what I describe as a hiccup.

The OSRAM gateway sends this broadcast every 30 seconds, which is quite fast for large networks. I'll lower the interval from 120 seconds to 60 in the next version. I need to check how often the IKEA gateway does it.
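As a rough illustration of the trade-off between these intervals, the arithmetic can be sketched as follows. The function name `broadcasts_per_hour` is hypothetical; the intervals are the ones mentioned above.

```python
def broadcasts_per_hour(interval_seconds):
    """Number of periodic gateway broadcasts sent in one hour."""
    return 3600 // interval_seconds


intervals = [
    ("RaspBee, current", 120),
    ("RaspBee, planned", 60),
    ("OSRAM gateway", 30),
]

for label, interval in intervals:
    # A light that misses one broadcast waits up to one more interval
    # before the next one can refresh its route to the gateway.
    print(f"{label}: every {interval}s, {broadcasts_per_hour(interval)} broadcasts/hour")
```

Halving the interval doubles the broadcast traffic but also halves the time a light has to wait for the next broadcast after missing one.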

Other than that I just saw a few lights which use an old nwkUpdateId of 0x01 instead of 0x04 (deCONZ network settings), but I don't think it's a problem (not sure though).

donnib commented 6 years ago

@manup maybe the hiccups could be related, I don't know. My contact at IKEA development told me that they always test with big networks such as mine and they have not seen this behaviour.

Other than that I just saw a few lights which use an old nwkUpdateId of 0x01 instead of 0x04 (deCONZ network settings), but I don't think it's a problem (not sure though).

So do i have to do something or have i done something wrong ?

manup commented 6 years ago

@manup maybe the hiccups could be related, I don't know. My contact at IKEA development told me that they always test with big networks such as mine and they have not seen this behaviour.

Maybe. I've connected an IKEA light to their gateway now, and it seems that they also use a 2-minute interval. But lowering it a bit should be fine for RaspBee.

I also see that the IKEA gateway doesn't query the lights apart from the initial query; everything works via attribute reporting, as OSRAM does it. This is the main difference I see: deCONZ does various queries periodically, like reading neighbor tables. I think this can be optimized so that after initial network setup queries will slow down and eventually stop, and only lights without reporting will be queried.

So do i have to do something or have i done something wrong ?

I think the lights will figure it out on their own over time (mostly after power cycle).

donnib commented 6 years ago

I think this can be optimized so that after initial network setup queries will slow down and eventually stop, and only lights without reporting will be queried.

Hmm, either IKEA is not testing and there is indeed a bug, or we use the lights in a different way with deCONZ, I guess. I don't think my network is particularly big or small; it's probably what you would expect for a normal house. I would say it's on the small side, since I have more to install, but until I get this stable there is no point in doing so. I guess my network would grow to about 60 nodes if I install the rest of the lights I have.

ebaauw commented 6 years ago

I doubt whether it’s related to the size of the network. I only have 40 lights, 70 nodes incl. sensors and switches. When both my Trådfri bulbs are powered on, the queues eventually fill up, stopping the gateway from issuing any ZigBee commands, even in 2.04.79. When I power off the white spectrum bulb, the queues don’t fill up. It smells like superstition, but I’ve been running steady now for 4 days with the bulb powered off. Typically the hang happens anywhere between a couple of hours to two days after resetting the RaspBee.

I’m thinking maybe deCONZ should whitelist lights (manufacturers?) that support attribute reporting and those that don’t. Then, only include the lights that don’t support attribute reporting in the polling, and only set up attribute reporting for lights that support it. After restart, I see a lot of log messages that deCONZ tries to set up attribute reporting on my Hue lights.

Note that the IKEA lights store the reporting configuration in volatile memory; it needs to be setup after each power-cycle.

manup commented 6 years ago

Yes, network size might not be the problem; it could be anything. To proceed, various problems are under investigation, and differences to the IKEA gateway in regard to polling and reporting will be minimized to exclude that as a problem source.

The recently added poll manager is related to that, see: c41a60ed93665762fb0f5682ace3d3fe26042791 and a20b7a0bdfcf1e44327d2c8f6495a18c35d0c646

I’m thinking maybe deCONZ should whitelist lights (manufacturers?) that support attribute reporting and those that don’t. Then, only include the lights that don’t support attribute reporting in the polling, and only set up attribute reporting for lights that support it. After restart, I see a lot of log messages that deCONZ tries to set up attribute reporting on my Hue lights.

The poll manager is a generic approach to that: it won't poll attributes for which reports are received in a timely manner. The scenes and color reporting must be improved though.
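The rule "don't poll what is already being reported" can be sketched as below. This is not the code from the commits referenced above; the class `PollManager`, its method names, and the 300 second threshold are assumptions chosen for illustration.

```python
import time


class PollManager:
    """Skip polling attributes that were reported recently enough."""

    def __init__(self, max_report_age=300.0, clock=time.monotonic):
        # max_report_age is a hypothetical threshold in seconds.
        self.max_report_age = max_report_age
        self.clock = clock
        self.last_report = {}  # (node, attribute) -> time of last report

    def on_report(self, node, attribute):
        """Remember when an attribute report arrived."""
        self.last_report[(node, attribute)] = self.clock()

    def needs_poll(self, node, attribute):
        """Poll only if no report was seen, or the last one is too old."""
        seen = self.last_report.get((node, attribute))
        if seen is None:
            return True
        return self.clock() - seen > self.max_report_age


now = [0.0]  # fake clock so the example runs instantly
manager = PollManager(clock=lambda: now[0])
manager.on_report("ikea-bulb", "on")
print(manager.needs_poll("ikea-bulb", "on"))  # recently reported
print(manager.needs_poll("hue-bulb", "on"))   # never reported
```

Because the decision is made per attribute rather than per manufacturer, no whitelist is needed: lights that never report simply keep being polled.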

Further the following will be optimized to lower requests:

Normally polling of these should be no problem even at high frequency, but there seem to be issues in the OSRAM and IKEA firmware.

Note that the IKEA lights store the reporting configuration in volatile memory; it needs to be setup after each power-cycle.

deCONZ should take care of that and reconfigure on power cycle or when reports are not received in a certain time.

ebaauw commented 6 years ago

It smells like superstition, but I’ve been running steady now for 4 days with the bulb powered off.

And that jinxed it. Had a hang last night.

manup commented 6 years ago

Just came back from IKEA and now extending my network with 4 GU10 and 1 shiny new color light. I hope to track down IKEA related issues better.

... opening these plastic packages drives me crazy.

wvuyk commented 6 years ago

It did cut my hand badly once... don’t understand why they package it this way

donnib commented 6 years ago

Just came back from IKEA and now extending my network with 4 GU10 and 1 shiny new color light. I hope to track down IKEA related issues better.

That sounds great, hope you get more acquainted with them. What kind of shiny new color light did you buy ? I guess you mean a color spectrum light like cold to warm ? IKEA does not have any RGB color lights afaik. I went to ikea yesterday and also ended up buying two things, the wireless dimmer and a E14 bulb of the chandelier type.

... opening these plastic packages drives me crazy.

Yeah i fully agree, i had to open over 50 packages and the best way i ended up doing that was with a scalpel.

manup commented 6 years ago

IKEA does not have any RGB color lights afaik.

It's new, I haven't connected it to deCONZ yet. It's paired to the remote from the kit and shows, orange, blue and white via the side buttons.

http://www.ikea.com/us/en/catalog/products/20353289/

manup commented 6 years ago

Update: the color bulb works very well, with vivid colors and smooth transitions. However, it only supports CIE xy color commands; color temperature (which can be derived from xy) and hue and saturation commands are ignored.
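Since the bulb ignores hue and saturation commands, a controller would have to convert them to CIE xy first. The following is a minimal sketch of one way to do that. It assumes the sRGB primaries with a D65 white point, which is not necessarily the bulb's actual gamut, and the function name `hue_sat_to_xy` is hypothetical.

```python
import colorsys


def _srgb_to_linear(c):
    """Undo the sRGB gamma curve for one channel in the range 0..1."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def hue_sat_to_xy(hue_degrees, saturation):
    """Convert hue (degrees) and saturation (0..1) to CIE 1931 xy.

    Assumes sRGB primaries (D65); a real bulb may use a different gamut.
    """
    # Full brightness, so the colour is never black and X+Y+Z is non-zero.
    r, g, b = colorsys.hsv_to_rgb((hue_degrees % 360) / 360.0, saturation, 1.0)
    r, g, b = (_srgb_to_linear(c) for c in (r, g, b))
    # Linear sRGB to CIE XYZ (D65).
    X = 0.4124 * r + 0.3576 * g + 0.1805 * b
    Y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    Z = 0.0193 * r + 0.1192 * g + 0.9505 * b
    total = X + Y + Z
    return (X / total, Y / total)


print(hue_sat_to_xy(0, 1.0))    # saturated red
print(hue_sat_to_xy(240, 1.0))  # saturated blue
print(hue_sat_to_xy(0, 0.0))    # white point
```

Brightness is left out on purpose: xy only describes chromaticity, so the level would still be sent as a separate command.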