dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License

All devices not reachable after reboot / power loss #2273

Closed robertlandes closed 4 years ago

robertlandes commented 4 years ago

Environment

[Screenshots: 2020-01-02_000205-Phoscon App, 2020-01-02_000206-Phoscon App, 2020-01-02_000207-Phoscon App]

Situation

I recently set up the complete environment from scratch, mainly to do some lighting automation. I started with deCONZ add-on version 5, so I didn't have any of the problems of migrating from older versions of the HA add-on that are mentioned here and in the HA forums.

After I finished everything, I took a complete backup (HASS.io snapshot) and left my parents' home after Christmas (which is a 600 km drive from my current location, so repairing everything would be a super bad situation for me). After a power outage, the Raspberry Pi rebooted and I ended up with all devices being unavailable. Restoring backups etc. didn't help, and after investigating logs etc., I am out of options for where to look.

Some lights seem to be online and are shown, but if you toggle them, nothing happens, although deCONZ / HA show a change of state.

When I start the add-on with VNC enabled the weird thing is the network looks like this:

[Screenshot: 2020-01-02_000203-mama home robertlandes com (Hass io - deCONZ) – VNC Viewer]

and my devices look like this:

[Screenshots: 2020-01-02_000208-Phoscon App, 2020-01-02_000209-Phoscon App, 2020-01-02_000210-Phoscon App]

Having the network open for quite some time didn't change anything. The only thing I saw is that one motion sensor seems to connect: [Screenshot: 2020-01-02_000211-mama home robertlandes com (Hass io - deCONZ) – VNC Viewer]

SwoopX commented 4 years ago

Hey, I faced a comparable situation two days ago. Around noon, my whole network (which is a bit larger) went dead. I checked the network settings in the deconz GUI and what was presented to me felt fishy. For whatever reason, I had a different PAN-ID, network key and Zigbee channel. Luckily, I was able to sniff the traffic, which allowed me to see my device communication (after changing the channel back from 15 to 25; I still had the network key included in Wireshark). However, I had no success restoring the initial settings via the deconz GUI; it simply didn't work. Then I recalled something I'd read in one of the release notes which might solve it, and it did the trick indeed.

Created a wiki page for it so it doesn't keep on dwelling in the depths: Network lost issues. Maybe you want to give it a shot anyway.

@manup you may want to pin the wiki article or a comparable issue. That feature was a great help to me and it might aid in resolving other people's network issues. There were quite a few over the past couple of months...

robertlandes commented 4 years ago

@SwoopX Thanks for the info, but that doesn't apply in my case. As you can see in my screenshot above I only have a single network config. And as I started the complete setup with the version no. mentioned above, I think I don't suffer from any past issues from old firmware/software issues.

I read through most of the older issues here, on HA issues and forums and I think my situation is not affected by most of the problems as they are upgrade related in some way.

As I said, this is a brand new install with everything fresh from the beginning.

I also have almost the same setup running at my place (only difference is a Raspberry 4 instead of 3b+) with more than 100 devices (Ikea, Hue, Innr, Xiaomi, etc) running perfectly fine. That is why I decided to create a new issue and didn't "reuse" one of the existing ones.

SwoopX commented 4 years ago

And as I started the complete setup with the version no. mentioned above, I think I don't suffer from any past issues from old firmware/software issues.

Well, it doesn't necessarily have to be due to some legacy config stuff. I had been running my config for several months and it happened all of a sudden. The last update I made was 3 months ago or so.

The point I wanted to make here is that the configuration looked OK but it did not work. Applying the configuration from the "secret" Phoscon page resolved it. Why not give it a try, it cannot get any worse :neutral_face:

robertlandes commented 4 years ago

@SwoopX I should have mentioned that before, but I already tried loading that one config I have available there without any success.

SwoopX commented 4 years ago

I see, too bad. Somehow, it feels familiar. Have you checked #2245 already?

robertlandes commented 4 years ago

@SwoopX yes, I did search and have a look at most of the related issues including this one.

I did a reboot right now and disabled autostart of the deCONZ container and then started it manually when everything was up in HA. I noticed the following on startup:

starting version 232
[21:42:04] INFO: Waiting for device...
[21:42:05] INFO: Starting VNC server...
[21:42:09] INFO: Starting the deCONZ gateway...
libEGL warning: DRI2: failed to open swrast (search paths /usr/lib/arm-linux-gnueabihf/dri:${ORIGIN}/dri:/usr/lib/dri)
libEGL warning: DRI2: failed to open swrast (search paths /usr/lib/arm-linux-gnueabihf/dri:${ORIGIN}/dri:/usr/lib/dri)
libpng warning: iCCP: known incorrect sRGB profile
[21:42:11] INFO: Starting Nginx...
[21:42:11] INFO: Running Hass.io discovery task...
[21:42:11] INFO: Running the deCONZ OTA updater...
[21:42:11] INFO: Running the IKEA OTA updater...
[21:42:11] INFO: deCONZ is set up and running!
2020/01/02 21:42:11 [notice] 390#390: using the "epoll" event method
2020/01/02 21:42:11 [notice] 390#390: nginx/1.10.3
2020/01/02 21:42:11 [notice] 390#390: OS: Linux 4.19.88-v7
2020/01/02 21:42:11 [notice] 390#390: getrlimit(RLIMIT_NOFILE): 1048576:1048576
2020/01/02 21:42:11 [notice] 390#390: start worker processes
2020/01/02 21:42:11 [notice] 390#390: start worker process 419
21:42:11:410 HTTP Server listen on address 0.0.0.0, port: 40850, root: /usr/share/deCONZ/webapp/
21:42:11:434 CTRL. 3.16.221:42:11:636 dev /dev/ttyAMA0
21:42:11:636 COM: /dev/ttyACM1 / serialno: DE2132210
21:42:11:636 COM: --dev: /dev/ttyACM1 (ConBee II)
21:42:11:636 ZCLDB init file /data/.local/share/dresden-elektronik/deCONZ/zcldb.txt
21:42:12:057 parent process bash
21:42:12:057 gw run mode: docker/hassio
21:42:12:057 GW sd-card image version file does not exist: /data/.local/share/dresden-elektronik/deCONZ/gw-version
21:42:12:058 sd-card cid: 035344535033324780ffffffff0127d1
21:42:12:098 DB sqlite version 3.16.2
21:42:12:101 DB PRAGMA page_count: 35
21:42:12:101 DB PRAGMA page_size: 4096
21:42:12:101 DB PRAGMA freelist_count: 2
21:42:12:101 DB file size 143360 bytes, free pages 2
21:42:12:102 DB PRAGMA user_version: 6
21:42:12:102 DB cleanup
21:42:12:105 DB create temporary views
21:42:12:166 don't close database yet, keep open for 900 seconds
21:42:12:167 started websocket server at port 8081
21:42:12:182 found node plugin: libde_rest_plugin.so - REST API Plugin
21:42:12:189 found node plugin: libde_signal_plugin.so - Signal Monitor Plugin
21:42:15:573 found node plugin: libstd_otau_plugin.so - STD OTAU Plugin
21:42:15:607 dev /dev/ttyAMA0
21:42:15:607 COM: /dev/ttyACM1 / serialno: DE2132210
21:42:15:607 COM: --dev: /dev/ttyACM1 (ConBee II)
PROTO: CRC error
PROTO: CRC error
21:42:15:679 dev /dev/ttyAMA0
21:42:15:679 COM: /dev/ttyACM1 / serialno: DE2132210
21:42:15:679 COM: --dev: /dev/ttyACM1 (ConBee II)
21:42:15:715 DEV config changed event
21:42:15:787 Device firmware version 0x264A0700
21:42:15:794 unlocked max nodes: 200
21:42:15:875 Device protocol version: 0x010B
21:42:15:895 new node - ext: 0x00212effff050392, nwk: 0x0000
21:42:16:006 don't close database yet, keep open for 900 seconds
21:42:16:006 LightNode 5: Wohnzimmer Tischlampe added
21:42:16:023 don't close database yet, keep open for 900 seconds
21:42:16:024 LightNode 6: Wohnzimmer Schrank added
21:42:16:047 don't close database yet, keep open for 900 seconds
21:42:16:048 LightNode 7: Schlafzimmer Decke added
21:42:16:064 SensorNode 2 set node 0xccccccfffea6f966
21:42:16:082 SensorNode 3 set node 0xccccccfffe3eeefc
21:42:16:100 SensorNode 4 set node 0xec1bbdfffe23ab3c
21:42:16:119 SensorNode 5 set node 0xccccccfffea7a8db
21:42:16:135 SensorNode 6 set node 0xccccccfffe3d9446
21:42:16:152 SensorNode 7 set node 0xccccccfffee589dc
21:42:16:169 SensorNode 8 set node 0xccccccfffee2e978
21:42:16:185 SensorNode 9 set node 0x14b457fffed4253d
21:42:16:202 SensorNode 10 set node 0xccccccfffe54452d
21:42:16:278 ZDP node descriptor request to 0x00212EFFFF050392
21:42:16:278 APS-DATA.request id: 4, addrmode: 0x02, addr: 0x0000, profile: 0x0000, cluster: 0x0002, ep: 0x00 -> 0x00 queue: 0 len: 3 tx.options 0x04
21:42:16:278 ZDP send request id: 0x03 to 0x00212effff050392
21:42:16:329 Current channel 25
21:42:16:350 CTRL ANT_CTRL 0x03
21:42:16:382 Device protocol version: 0x010B
21:42:16:440 Current channel 25
21:42:16:460 CTRL ANT_CTRL 0x03
21:42:16:490 Device protocol version: 0x010B
21:42:16:549 Current channel 25
21:42:16:569 CTRL ANT_CTRL 0x03
21:42:16:600 Device protocol version: 0x010B
21:42:16:659 Current channel 25
21:42:16:679 CTRL ANT_CTRL 0x03
21:42:16:753 APS-DATA.confirm id: 4, status: 0x00 SUCCESS
21:42:16:753 APS-DATA.confirm request id: 4 -> confirmed, timeout 38155024
21:42:16:758 APS-DATA.indication srcAddr: 0x0000, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8002, lqi: 241, rssi: 19
21:42:16:758 APS-DATA.indication request id: 4 -> finished
21:42:16:758 APS-DATA.request id: 4 erase from queue
21:42:16:758 ZDP status = 0x00 -> SUCCESS
21:42:16:758 ZDP Node_Descriptor_rsp 0x00212EFFFF050392 - 0x0000
21:42:16:837 Mgmt_Lqi_req zdpSeq: 1 to 0xEC1BBDFFFE33B91E start index 0
21:42:16:837 APS-DATA.request id: 10, addrmode: 0x03, addr: 0xec1bbdfffe33b91e, profile: 0x0000, cluster: 0x0031, ep: 0x00 -> 0x00 queue: 0 len: 2 tx.options 0x00
21:42:17:367 dev /dev/ttyAMA0
21:42:17:371 GW update firmware found: /usr/share/deCONZ/firmware/deCONZ_ConBeeII

I was wondering about the following lines in my log, which appear after initializing the Conbee II: PROTO: CRC error. Searching for similar log lines, I found #1996, which seems to be the exact problem I am experiencing.
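For anyone comparing their own startup output with the log above, here is a minimal Python sketch that counts the `PROTO: CRC error` lines and pulls out the serial device, firmware version and channel. The helper name `summarise_log` and the regular expressions are assumptions based only on the line formats visible in this log; other deCONZ versions may print them differently.

```python
import re

# Patterns modelled on the log lines quoted in this thread (assumption:
# other deCONZ versions may format these lines differently).
CRC_MARKER = "PROTO: CRC error"
COM_RE = re.compile(r"COM: (/dev/\S+) / serialno: (\S+)")
FW_RE = re.compile(r"Device firmware version (0x[0-9A-Fa-f]+)")
CHANNEL_RE = re.compile(r"Current channel (\d+)")


def summarise_log(text):
    """Hypothetical helper: summarise a pasted deCONZ startup log."""
    summary = {"crc_errors": 0, "devices": set(), "firmware": None, "channel": None}
    for line in text.splitlines():
        if CRC_MARKER in line:
            summary["crc_errors"] += 1
        match = COM_RE.search(line)
        if match:
            summary["devices"].add(match.group(1))
        match = FW_RE.search(line)
        if match:
            summary["firmware"] = match.group(1)
        match = CHANNEL_RE.search(line)
        if match:
            summary["channel"] = int(match.group(1))
    return summary


# A few lines copied from the log above.
sample = """\
21:42:15:607 COM: /dev/ttyACM1 / serialno: DE2132210
PROTO: CRC error
PROTO: CRC error
21:42:15:787 Device firmware version 0x264A0700
21:42:16:329 Current channel 25
"""
print(summarise_log(sample))
```

A non-zero `crc_errors` count only shows that the marker appeared; it does not by itself say what caused it.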

easybeat commented 4 years ago

Hi, since tonight I'm facing exactly the same situation. None of my devices react. Absolute nightmare!

My whole house is unusable now. And I thought I was on the right side with deconz!

Tried to restore the network thing but it didn't help.

Any inputs what I could do?

Thanks for any help! Beat

easybeat commented 4 years ago

Hi, OK, after last night's total crash, I restored my latest backup this morning. As expected, still nothing worked. Then I reset every single light (I have 32 of different brands) with Touchlink and ran the add light process. The lights came back one after another. But unfortunately all the scenes were deleted.

If I'm not the only one having this issue then I think this is a massive problem of deconz!

I have no idea what caused the problem, as the system had been running with this firmware since July and my last change was 10 days ago. It started on Monday night when some lights started blinking randomly. I just managed to turn all lights off, and then about 5 lights didn't show a connection. So I tried to reconnect them but failed, and on Tuesday night, after some more attempts to fix it, all my lights were suddenly disconnected and gone.

@robertlandes : How did you fix your problem?

Kind regards Beat

mountainsandcode commented 4 years ago

I have exactly the same problem, it's rather annoying to have to completely reset the network every time I happen to reboot my docker host device (in my case a Synology). @manup: Any ideas? Anything we can do to facilitate the troubleshooting?

easybeat commented 4 years ago

I just had the same problem again. Some lights didn't react, some started randomly flashing. Had to disconnect each light from power and restart Deconz.

Now everything is working again.

@manup: Any ideas what could cause this? It is getting critical for me now with 2 events like that within 2 weeks!

Kind regards Beat

nodefeet commented 4 years ago

@robertlandes I am having the same issue.

After reboot there is no connection possible. And it's easy to reproduce: Reset the gateway -> add lights in phoscon, create a group -> reboot --> no connection to lights or groups.

I have been developing with deconz for quite a while now and it always worked fine after reboot. Right now I am only adding 2 lights from my desk but nothing works after reboot.

The only thing I changed compared to the last months is that I am using a Conbee 2 now. But it does not work with my old Conbee 1 either now.

deconz Version: 2.05.64
Conbee 1 Firmware: 26330500

Edit: I upgraded to the latest version 2.05.72, same issue. I am running deconz headless on an RPi. The deconz service runs fine.

jcaron23 commented 4 years ago

@nodefeet Exactly the same issue here. On reboot, all connections are gone. The settings are correct (channel, network ID, security key...), but the devices are no longer connected. It looks like the coordinator is not correctly saving details of its neighbours or something similar.

The only exception among the devices I have at hand is a Philips Hue Motion Sensor, which manages to re-join each time.

nodefeet commented 4 years ago

I can even recreate it on my deCONZ GUI App on my windows machine.

Step 0: Reset the Gateway
Step 1: Add lights
Step 2: Reboot

Result: no connection even after several hours [Screenshot: no_connection]

To be clear, I have been using these lights for almost a year without any connectivity issues.

jcaron23 commented 4 years ago

@nodefeet In my case, even after many days the connections don't come back up (except that Philips Hue Motion Sensor). Definitely something missing in what is saved and restored after a reboot.

nodefeet commented 4 years ago

@jcaron23 same for me. The connection is lost permanently.

At least in my case it has something to do with the Conbee's non-volatile memory. I was able to get another Conbee 2 and it works now after reboot or shutdown, although I have no idea how I could have damaged the EEPROM of both my old Conbee 1 and the new Conbee 2.

But here is what I don't understand: if network settings are lost in the Conbee after reboot, shouldn't it be possible for me to at least reload the configuration from the hidden network page (ALT+Click on Advanced in Gateway page) for one of my "damaged" Conbees?

Screenshot 2020-02-01 at 09 46 12

After reboot the network settings (Channel, PANID, Network Key) are the same as before. So, as expected, loading these settings does not re-enable the connection. Backups do not help either.
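The "same as before" check on Channel, PANID and Network Key can be written down as a small comparison. This is only an illustrative Python sketch: the function name, the dictionary keys and the PANID/key values are hypothetical, and nothing here reads from deCONZ itself. The channel values 25 and 15 are the ones mentioned earlier in this thread.

```python
def diff_network_settings(expected, actual, keys=("channel", "panid", "network_key")):
    """Return {setting: (expected, actual)} for every setting that differs."""
    return {
        key: (expected.get(key), actual.get(key))
        for key in keys
        if expected.get(key) != actual.get(key)
    }


# Hypothetical snapshots noted down by hand before and after a reboot.
before = {"channel": 25, "panid": "0x1a2b", "network_key": "00112233"}
after = {"channel": 15, "panid": "0x1a2b", "network_key": "00112233"}

print(diff_network_settings(before, after))   # {'channel': (25, 15)}
print(diff_network_settings(before, before))  # {}
```

An empty result corresponds to the situation described above: the three settings match, yet the devices still do not respond, so the cause has to be somewhere else.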

SwoopX commented 4 years ago

Applying the configuration took a while in my case. I didn't touch anything for about 5 minutes. Maybe you were too impatient?

nodefeet commented 4 years ago

@SwoopX unfortunately there is no connection possible even after hours.

I was about to write how I discovered that this issue happens when you start the raspberry with deconz without an Ethernet network connection. And indeed I was able to "destroy" multiple Conbees this way. I could even reproduce it on a fresh raspbian install with deconz. But thanks to the support of dresden-elektronik the following seems to work for me!

Solution

All I did was follow the first three steps from the deconz technical support page: [Point 6 under Troubleshooting for ConBee II with Linux](https://phoscon.de/en/support#conbee2)

  1. Close deCONZ if it’s running -> I did: sudo systemctl stop deconz
  2. Unplug the ConBee II and wait 10 seconds
  3. Connect the ConBee II again and wait 10 seconds -> I did: sudo systemctl start deconz afterwards

I am currently not able to reproduce the connection error after reboot and my "destroyed" Conbees seem to work again. Everything works fine :grinning:.
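The three steps above can be sketched as a small Python helper. The two `systemctl` commands are the ones quoted in the steps; the function name `replug_sequence`, the prompts and the injectable `run`/`wait`/`prompt` parameters are assumptions added for illustration, and the unplugging itself stays a manual action.

```python
import subprocess
import time


def replug_sequence(run=subprocess.run, wait=time.sleep, prompt=input, pause_s=10):
    """Stop deCONZ, pause for a manual unplug/replug, then start deCONZ again.

    Hypothetical helper: it assumes deCONZ runs as the systemd unit "deconz",
    as in the commands quoted above.
    """
    # Step 1: close deCONZ if it is running.
    run(["sudo", "systemctl", "stop", "deconz"], check=True)
    # Step 2: unplug the ConBee II and wait 10 seconds.
    prompt("Unplug the ConBee II, then press Enter...")
    wait(pause_s)
    # Step 3: connect the ConBee II again and wait 10 seconds.
    prompt("Plug the ConBee II back in, then press Enter...")
    wait(pause_s)
    # Afterwards: start deCONZ again.
    run(["sudo", "systemctl", "start", "deconz"], check=True)
```

Calling `replug_sequence()` with the defaults runs the real commands and asks for confirmation at each manual step. Passing substitutes for `run`, `wait` and `prompt` allows a dry run without touching the service.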

SwoopX commented 4 years ago

Great. Hm, in my case, it is a RaspBee. It might be that unplugging my Pi Zero, which runs it, for a while was part of the (at least my) solution... Anyhow, those steps should be remembered!

cooperaj commented 4 years ago

I'm seeing this issue and the above fix by @nodefeet did not work.

I have rebooted everything multiple times in different orders and at best I can get a single connection to a switch. :(

Screenshot 2020-02-10 11 37 33

Next step is reset it all and start from scratch. I had such high hopes after the initial setup being so seamless. Hopefully this can be figured out soon.

nebbiadigiorno commented 4 years ago

Same problem here. At every restart my whole network goes down.

nodefeet commented 4 years ago

Just to make sure:

Did you try repowering the lights?

Sometimes my lights need to be restarted as well after deconz restarts before there is any communication possible again.

RosaEinhorn commented 4 years ago

I am having the exact same issue after a restart of the host system and had to reconfigure all devices today.

jcaron23 commented 4 years ago

Is there a way to download, install and use older versions of the firmware? From what I understand this is a recent issue, so finding in which firmware version the problem was introduced should help to track it down.

manup commented 4 years ago

Hi everybody, I hope that the next release, 2.05.73, addresses the reboot issue for some cases. But there seems to be some deeper issue.

Can you please try in deCONZ GUI:

* Leave

* Join

to restart the Zigbee network, and see if that brings the network back?

cooperaj commented 4 years ago

Can you please try in deCONZ GUI:

* Leave

* Join

To clarify, should I have a broken network to start with?

manup commented 4 years ago

The steps are meant for the case when deCONZ shows "In Network" state but for some reason "all" devices are not reachable.

mycanaletto commented 4 years ago

For me, I just lose the last 3 devices registered. The other devices (11) are present! Any idea?

In fact it seems that it happens with Xiaomi sensors that were previously paired with the Xiaomi gateway, then removed from it. I have just paired an Aqara sensor (opening) and I still find it after restarting Deconz.

The other possibility is that in the meantime I have updated to version 2.05.74

derhappy commented 4 years ago

We are currently trying out the conbee 2 stick for our home and already ran into this issue wasting several hours. Any chance this will be fixed soon? Otherwise we'll have to return the stick as it is simply broken.

marasbird commented 4 years ago

Hello, I have the same problem. Everything works fine until I reboot my Raspberry Pi. deCONZ is set up as a service and starts on boot. When I reboot my system, I lose everything and have to pair again. In https://phoscon.de/en/changelog/ there is, in Elvis: "Fix deCONZ not trying to reconnect to the ConBee, ConBee II or RaspBee in certain states after losing the connection." But I have 2.5.74 and still have this issue on reboot :(

kowost commented 4 years ago

@manup: I can confirm that your suggestion worked for me. Some of my Xiaomi devices needed to be triggered (e.g. open/close door) to reconnect, but afterwards all of them were back and providing data to iobroker.

I was brave enough and rebooted my hardware. Fortunately everything worked fine after the reboot too.

thx for the hint


Configuration:
Ubuntu 18.04.4 on Zotac ZBOX
Conbee II FW 26530700
Xiaomi: multi sensor & contact sensor
Ikea: Extender 1, Remote Control, Wireless Dimmer
ICZB-KPD18s
IO broker

TheWizz commented 4 years ago

We've had the same problem for several months. After restarting the Deconz server, none of the lights can be controlled. A power cycle of the lights fixes it. Note that even though Deconz can't control the lights, a Philips remote can, so there seems to be nothing wrong with the lights or the zigbee network.

StefkeJ commented 4 years ago

Same here, I purchased a Conbee II stick to start learning about home automation and I'm already unlucky. When I restart my PC, usually all sensors are lost (sometimes 1 gets detected). When I pair again, all is well.

Conbee II firmware: 0x264a0700
deCONZ: 2.05.75
OS: Windows 10
Sensors: Xiaomi Aqara WSDCGQ01LM temp/humidity/pressure sensors, version 20161129

TheWizz commented 4 years ago

Do they re-connect if you power-cycle the lights? In my case, they always seem to do that. But this kinda negates the purpose of having them zigbee controllable in the first place. And it renders the entire solution very unreliable for any application except possibly the die-hard hobbyist and enthusiast.

SwoopX commented 4 years ago

@TheWizz I understand it was just about sensors.

@StefkeJ Is it? Sensors usually do not "come back" instantly, but only when they have something to report. It can take up to 1h (e.g. for Xiaomi).
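The rule of thumb above (battery sensors may stay silent for up to about an hour after a restart) can be turned into a simple check. This Python sketch is purely illustrative: the function name, the sensor names and the timestamps are hypothetical, and the one-hour window is taken from the remark above rather than from any documented guarantee.

```python
from datetime import datetime, timedelta

# Assumption: one hour, taken from the "up to 1h" remark above.
REPORT_WINDOW = timedelta(hours=1)


def overdue_sensors(last_seen, restarted_at, now, window=REPORT_WINDOW):
    """Names of sensors that have not reported since the restart.

    Returns an empty list while the reporting window is still open,
    because silence during that time is expected.
    """
    if now - restarted_at < window:
        return []
    return sorted(
        name
        for name, seen in last_seen.items()
        if seen is None or seen < restarted_at
    )


# Hypothetical example: restart at 08:00, checked again at 11:00.
restart = datetime(2020, 3, 1, 8, 0)
seen = {
    "bedroom": datetime(2020, 3, 1, 8, 30),  # reported after the restart
    "kitchen": datetime(2020, 3, 1, 7, 0),   # last report was before it
    "hallway": None,                         # never seen
}
print(overdue_sensors(seen, restart, datetime(2020, 3, 1, 11, 0)))
```

Sensors that are still listed well after the window has passed are the ones worth re-pairing or triggering manually.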

StefkeJ commented 4 years ago

@TheWizz , @SwoopX I just have 4 of those sensors, no lights or anything else. 3 hours computer uptime: 1 sensor detected.

nodefeet commented 4 years ago

from @TheWizz

Do they re-connect if you power-cycle the lights? In my case, they always seem to do that. But this kinda negates the purpose of having them zigbee controllable in the first place. And it renders the entire solution very unreliable for any application except possibly the die-hard hobbyist and enthusiast.

Unfortunately, I have to completely agree with you on this

raelix commented 4 years ago

Hi, any news on this? Is there a workaround? This is a very critical issue

TheWizz commented 4 years ago

The lack of response to this issue (and others just like it) here is concerning. This is about basic functionality of the product, and not some peripheral aspect. Furthermore, numerous users seem to have this very problem, so it's not a one-off. Since the component that's losing the connection (i.e., the CONBEE USB stick) is a product we all bought and paid for (and not part of the open source code), I'd say there's an obligation on Dresden Electronics to pay some attention here.

So where are you!?

-JM

ebaauw commented 4 years ago

https://phoscon.de/en/support

raelix commented 4 years ago

I'm in agreement with you

The lack of response to this issue (and others just like it) here is concerning. This is about basic functionality of the product, and not some peripheral aspect. Furthermore, numerous users seem to have this very problem, so it's not a one-off. Since the component that's losing the connection (i.e., the CONBEE USB stick) is a product we all bought and paid for (and not part of the open source code), I'd say there's an obligation on Dresden Electronics to pay some attention here.

So where are you!?

-JM

I totally agree with you.

We have to contact support as @ebaauw suggested, at least to know if they are working on this serious issue.

From my point of view it requires an internal escalation to fix it asap.

easybeat commented 4 years ago

I have not faced this problem for 2 months now => fingers crossed! But I also did not touch deconz in any way... too scared!

So I'm going to write to Dresden Electronics Support with a link to this post. Who else is doing the same?

This needs fixing otherwise it will remain a major problem and deconz will not be a reliable product for productive home automation projects.

TheWizz commented 4 years ago

I wrote a support email but haven't heard back.

It would be interesting to hear what happens if you turn off your DECONZ (and unplug the USB stick while off, to make sure it loses power), then back on again, to see if you're seeing the same as I do. I.e., it then fails to control any lights until you power cycle the lights, and then they come right back. Scary, I know, but I guess it would help Dresden to know how to repro the problem (although one would figure they should have seen it too by now…).

-JM

SwoopX commented 4 years ago

@TheWizz it's been reported that what you proposed, stopping deconz and unplugging the stick, may work. It's further up in this thread.

TheWizz commented 4 years ago

Just to clarify; I'm not "proposing stopping and unplugging the stick" to fix the problem, but to reproduce it.

-JM

raelix commented 4 years ago

Unfortunately this workaround didn't work for me, after reboot I lost all connected nodes.

In my case it can simply be reproduced by turning off the host

TheWizz commented 4 years ago

after reboot I lost all connected nodes

Yes, that confirms my finding. The CONBEE network doesn't survive a DECONZ server restart which also powers off the CONBEE. Again notice what I said above that stopping the server and unplugging the stick reproduces the problem. It doesn't fix it. So this isn't a "workaround", it's a way to reproduce the problem, that I hope will help Dresden fix it, since it renders their solution pretty much useless for us.

-JM

raelix commented 4 years ago

Yes, right. Do you know if it is possible to downgrade the firmware to a previous working version?

ebaauw commented 4 years ago

Just install a previous version with GCFFlasher_internal.

raelix commented 4 years ago

This could be a temporary solution for all the users that are facing this issue. Do you know which version would be fine?

Pedder007 commented 4 years ago

Hi all, I have now read through the whole thread and I'm 'happy' 👎 to see that I'm not alone with this issue. I have seen such behaviour multiple times: the deCONZ/ConBee network seems to have forgotten multiple devices in the morning. Normally, switching the few that 'survived' the night triggered the rest to come back. That normally took 5-10 minutes at most. But today it was much worse. Almost no device 'survived', and the network only came back after more than 4 hours, which is definitely not acceptable. After noticing that in the morning I did two reboots of the whole system, which didn't help at all. Some time ago I also implemented a routine which triggers some of the routers (Hue Plugs) at 2:00 at night. But that doesn't seem to prevent such connection losses either.

In addition, it is frustrating that I had just done a firmware update (2.05.75, 264A0700) on the stick, which solved my issues with some motion/light sensors. ... no again downgrading, hmhm :-(