dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License

Devices no longer updating/reachable after migrating from conbee II to conbee III #7545

Closed dxa13 closed 8 months ago

dxa13 commented 9 months ago

Does the issue really belong here?

Is there already an existing issue for this?

Describe the bug

Hi, I have just migrated from Conbee II to Conbee III, running on the same Raspberry Pi, and I followed the instructions for restoring from backup to do so. All appeared to work fine, or so I thought, until I noticed some devices had not been refreshing their status. The Phoscon web app shows all devices as present, but many show as last refreshed a couple of days ago (i.e. from before I migrated). The deCONZ GUI shows all nodes as present, but many (1/4) show as not reachable. I tried:

Then I noticed that the devices that are unreachable are those that were originally connected directly to the deCONZ controller, whereas those connected to my IKEA repeater and a water valve do have connectivity. So I unplugged all my repeaters, rebooted the whole network, and then slowly some (but not all) devices started connecting to the controller directly (which I thought was good). Then I plugged the repeaters in again, and after everything stabilized the same 1/4 of devices were still missing. So no progress. For the sake of experimenting, I tried to delete and re-add/re-pair one of the sensors, which gets added in the deCONZ GUI as a node and shows as reachable, but never shows up in the Phoscon web app to finish the pairing process.

Finally, given the above, I went back to the last stable deCONZ version and the latest Conbee III firmware, and then I decided to try restoring from backup again. Sadly, every device that had managed to connect to the controller directly in the previous effort is now also unreachable, so 3/4 of my devices are now not refreshing… except for those that remained connected to the repeaters.

Any help would be greatly appreciated. Clearly I’m making things worse the more I try to fix it. I’m happy to provide log files (see debug view text file in separate comment post below) if you can instruct me how (please note that I am a novice as I have been using Conbee II for the past 3 years without running into any issues).

Steps to reproduce the behavior

  1. Migrate to Conbee III from Conbee II by restoring from backup,
  2. Devices that were directly reachable thru the controller are no longer reachable, only those connected to the repeaters are ok. The controller only connects to the repeaters.

Expected behavior

The original nodes connected to the controller should be reachable.

Screenshots

IMG_5149

Environment

deCONZ Logs

No response

Additional context

No response

dxa13 commented 9 months ago

Adding Debug View log file missing from above post.

debug_view_21-01-2024.txt

Smanar commented 9 months ago

Honestly, don't lose your time with various attempts; there is a known issue with the Conbee 3 firmware, and they are on it ATM. Just wait. If you still have your Conbee 2, you can use it for the moment.

dxa13 commented 9 months ago

Thank you for the response and honest opinion, I’m happy to go back to my old Conbee II stick at this point and wait. This morning I powered off the raspberry Pi, unplugged the new stick, plugged the old one, and booted up. The zigbee network shows all devices, but to my surprise only the repeaters are connecting to the controller with no child device connectivity at all (not even to the repeaters). I’ll let it sit there for the day in hopes it will heal itself. If not, I may need and would appreciate some help in getting the old stick to work again.

dxa13 commented 9 months ago

It's been >12hrs and only one child device has managed to regain connectivity. This is not looking good. I am attaching the debug view file. Any suggestions? Why would the network with the old Conbee II stick be in a similar (actually worse) situation than the conbee III stick? Does this mean my network/pairing has been permanently destroyed and I have to re-pair+recreate everything?

debug_view_22-01-2024.txt

Smanar commented 9 months ago

Where did you get those logs? Or which flags have you enabled? The backup needs to be compatible with Conbee 2 or 3. You haven't changed the USB port (2.0 > 3.0)?

If you have the GUI, you can compare your settings with https://github.com/dresden-elektronik/deconz-rest-plugin/wiki/Network-lost-and-configuration-restore-does-not-help#in-case-the-network-does-still-not-come-up

The network key is stored on the gateway (not on the host), but it is also included in the Phoscon backup (made with Phoscon).

dxa13 commented 9 months ago

Sorry for the delayed response... To answer your questions:

debug_view_24-01-2024.txt

Screenshot 2024-01-24 at 15 23 14 Screenshot 2024-01-24 at 15 34 13

Smanar commented 9 months ago

OK, so to me your settings page looks fine. The capture from deCONZ does too, except that there is no "green line", but this can be normal.

How can I check that the key is correct? The phoscon backup files (the ones I am trying to restore) are in binary.

No, it's an archive ^^. You can open it with WinRAR; inside you will find 2 files. One is a text file you can open with a text editor, which contains the network settings (from the gateway), and the other is an SQL file you can open with an SQL editor, which comes from the host.
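For anyone without WinRAR, the same inspection can be sketched in Python. This is only a minimal sketch: it assumes the backup is a tar-based archive (possibly gzip-compressed), which is not confirmed in this thread, and the helper names `list_backup_members` and `read_text_member` are hypothetical.

```python
import tarfile


def list_backup_members(archive_path):
    """Return the member names of a tar-based backup archive."""
    # Mode "r:*" lets tarfile detect the compression (gzip, bz2, xz or none).
    # ASSUMPTION: the backup is a tar archive; other formats raise ReadError.
    with tarfile.open(archive_path, mode="r:*") as archive:
        return archive.getnames()


def read_text_member(archive_path, member_name):
    """Return one archive member decoded as text, e.g. the network settings file."""
    with tarfile.open(archive_path, mode="r:*") as archive:
        extracted = archive.extractfile(member_name)
        if extracted is None:  # directories and special files carry no data
            raise ValueError(f"{member_name} is not a regular file")
        # Undecodable bytes are replaced rather than raising, so a partly
        # binary file can still be looked at.
        return extracted.read().decode("utf-8", errors="replace")
```

Calling `list_backup_members` on the backup file should show which member is the text settings file, and `read_text_member` can then print it for comparison with what the deCONZ GUI shows.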

And yes, in your logs there really are a lot of SQL errors. But from the Zigbee side all seems OK: all nodes are connected (not greyed out) and all have a real name.

There is a hidden tool that can compare previous network configurations: https://github.com/dresden-elektronik/deconz-rest-plugin/wiki/Network-lost-issues

Can you take a look at the logs after some hours, instead of right after loading?

And about your last error message https://forum.phoscon.de/t/lights-not-responding-sporadically-failed-status-app-busy-0x02/1882

dxa13 commented 9 months ago

Thanks for the continuing help, it's very much appreciated!

1- And thanks for letting me know it was a winrar archive... I expanded two backups, config files look the same (see below) and seems to match what deCONZ is showing in the GUI. I'll take a look later today at an even older backup from Sept. last year... deCONZ.conf.txt

2- Here is the latest log file, with new errors since the previous one was taken. You are correct in that even the Phoscon GUI shows all nodes in a non-greyed out state; they are just not being refreshed, and the fly-lines are missing in the deCONZ GUI. I have, of course, repeatedly tried to wake up most sensors by pushing their little button, but to no effect. debug_view_26-01-2024.txt

3- See screenshot below for hidden setting in the Phoscon GUI, all I see is changes to the deCONZ versions over the years, including some downgrades I have had to do when something breaks after an update, but all other network settings remain the same. Should I try loading one of the previous configs just in case?

Screenshot 2024-01-26 at 06 33 55

4- I'll look at your last link later today for that last error message.

5- I do see something curious which may or may not be related... When I load Phoscon via the web browser, where it lists the Gateway, it shows it as normal (name and icon) before I click on it to login, as well as the link icon on the upper left. That link icon disappears after a few seconds. Is that normal? Does that imply the gateway is not properly linked somehow? See two screenshots showing the link icon go away.

When loading the web page:

Screenshot 2024-01-26 at 06 20 33

After 2-3 seconds:

Screenshot 2024-01-26 at 06 20 29

dxa13 commented 9 months ago

Also, to add to the post above, here is the network status as of today. 2 more devices now show connectivity. I am not sure what that means: why did it take them so long to connect (i.e. 24hrs), and why only those 2 and not the rest? Is there something wrong with the latest version of deCONZ? Should I try downgrading to a previous version, and if so, which would you recommend? I will mention that I just noticed that my Conbee II stick has an early firmware, 26720700, vs. the latest 26780700 I see out there. I am afraid of what would happen if I update the firmware, but then again it can't get much worse than where I am today... Should I try updating to the latest firmware despite the stick having worked fine for the past 2 years, up until I went to Conbee III?

Screenshot 2024-01-26 at 06 58 43

Smanar commented 9 months ago

I have checked your configuration file. It matches the one in the deCONZ settings and all seems fine, so I don't see what more the hidden tool can do for you.

I really don't see anything critical in your logs. Can you do the same with the flags APS, info and info_l2?

But if the backup was made before your migration to Conbee 3, there is no reason for it not to work with your previous configuration.

When you say devices are not responding, are you looking in Phoscon too (at the last seen value)?
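The same "last seen" check could also be scripted against the REST API instead of clicking through Phoscon. The sketch below is hedged: it assumes the sensor list has already been fetched as JSON (for example from `/api/<apikey>/sensors`) and that each sensor carries a `state.lastupdated` timestamp in ISO format without a timezone suffix. The function name `stale_sensors` and the one-day default are illustrative choices, not part of deCONZ.

```python
from datetime import datetime, timedelta


def stale_sensors(sensors, now, max_age=timedelta(days=1)):
    """Return (id, name, raw timestamp) for sensors not updated within max_age.

    ASSUMPTION: `sensors` maps sensor id -> resource dict with a "name" and a
    "state" dict holding "lastupdated", as returned by the sensors endpoint.
    `now` must be a naive datetime in the same timezone as the timestamps.
    """
    stale = []
    for sensor_id, resource in sensors.items():
        raw = resource.get("state", {}).get("lastupdated")
        try:
            last = datetime.fromisoformat(raw)
        except (TypeError, ValueError):
            # Missing value or a placeholder such as "none": treat as stale.
            last = None
        if last is None or now - last > max_age:
            stale.append((sensor_id, resource.get("name", "?"), raw))
    return stale
```

Passing the parsed JSON and the current UTC time returns the sensors that have gone quiet, which is roughly what the last refreshed date in Phoscon shows one device at a time.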

Mimiix commented 9 months ago

Hi,

I recommend getting more routers. You seem to have at least 39 sensors/end nodes and just 2 routers.

Typically, you want 3 end nodes per router to have a healthy network.

This probably causes the behavior, as you can't have that many devices on the coordinator. Especially when the sensors are not switching to routers, which is common with Aqara devices.
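As a worked example of that rule of thumb, the arithmetic can be sketched as follows. The ratio of 3 end nodes per router is the guideline quoted above, not a figure from the Zigbee specification, and the function names are hypothetical.

```python
import math


def routers_needed(end_devices, per_router=3):
    """Routers suggested by the 'N end nodes per router' rule of thumb."""
    return math.ceil(end_devices / per_router)


def routers_missing(end_devices, routers_present, per_router=3):
    """Routers still to add to reach the rule-of-thumb count (never negative)."""
    return max(0, routers_needed(end_devices, per_router) - routers_present)


print(routers_needed(39))      # 39 end nodes / 3 per router -> 13
print(routers_missing(39, 2))  # with 2 routers already present -> 11
```

With the 39 end nodes and 2 routers mentioned above, the guideline suggests 13 routers, so 11 more than the network currently has.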

dxa13 commented 9 months ago

1- There are 2 routers in the network (1 Tradfri and the Main water valve), plus the coordinator, but if that's really not sufficient then I can add another spare Tradfri that I have, which should then result in 3 routers. I'll try that later today. But, how come the lack of enough routers hasn't been a problem for the last 2 years (I have only added one or two sensors during that time)?

2- Here is only ~1min of logging with the new debug switches enabled... I can log more time, but the log can get very big fast, let me know if you want more time, and if you want me to reboot the host and/or device before enabling the switches. debug_view_26-01-2024_2.txt

3- When I went back to my Conbee II stick, and re-plugged it, I was expecting everything would just start working again, without the need to restore from backup, but somehow it didn't (it was in fact worse than with the Conbee III stick, as nothing would connect other than the plugged devices/repeaters). Other than trying (1), please let me know if I should also try downgrading deCONZ to an earlier version, and which, and/or if I should risk updating the firmware on the Conbee II stick. I can also try removing all deCONZ software completely, including any remaining cache/data files, and re-installing it from scratch, if that helps, or try installing everything on a spare Raspberry Pi I have that I can set up as new.

4- When I look at Phoscon under devices -> sensors, after an initial reboot of the host+system, all sensors show greyed out (as expected), and after a few minutes they all turn black. But, when I click on either of them, most (except the 3 that show as reachable in the deCONZ GUI), show that they haven't refreshed in a long time (i.e. days depending on which backup I restore). Now that I am checking the status again this morning after 2 days up, I can see that most sensors (except the 3 that show as reachable in deCONZ) are now greyed out, so I guess Phoscon gave up waiting for a refresh.

Screenshot 2024-01-26 at 09 22 15 Screenshot 2024-01-26 at 09 20 03

Smanar commented 9 months ago

I really don't see anything about your sensors in the logs. Have you tried to re-include one (without deleting it, to keep the same ID)?

Aqara sensors have a problem: if they lose their parent for too long (some hours), they leave the network.

Mimiix commented 9 months ago

1- There are 2 routers in the network (1 Tradfri and the Main water valve), plus the coordinator, but if that's really not sufficient then I can add another spare Tradfri that I have, which should then result in 3 routers. I'll try that later today. But, how come the lack of enough routers hasn't been a problem for the last 2 years (I have only added one or two sensors during that time)?

Who knows; it could be interference, or it could be the migration. It could also be a difference in how the firmware handles the devices. Nevertheless, this isn't the way to go, and it isn't how Zigbee works. You really need more routers, at least 15. Additionally, you might need to re-pair some sensors, as Aqara devices don't tend to switch parents.

What log levels did you include?

What firmware of the Conbee 3 are you running?

3- When I went back to my Conbee II stick, and re-plugged it, I was expecting everything would just start working again, without the need to restore from backup, but somehow it didn't (it was in fact worse than with the Conbee III stick, as nothing would connect other than the plugged devices/repeaters). Other than trying (1), please let me know if I should also try downgrading deCONZ to an earlier version, and which, and/or if I should risk updating the firmware on the Conbee II stick. I can also try removing all deCONZ software completely, including any remaining cache/data files, and re-installing it from scratch, if that helps, or try installing everything on a spare Raspberry Pi I have that I can set up as new.

The reason it doesn't work is that your stick (Conbee 3) saves the network settings, including some things that are communicated to the network. If the coordinator changes again (Conbee 2), the network needs to know and the stick needs to know the environment. Therefore, you need to restore a backup.

4- When I look at Phoscon under devices -> sensors, after an initial reboot of the host+system, all sensors show greyed out (as expected), and after a few minutes they all turn black. But, when I click on either of them, most (except the 3 that show as reachable in the deCONZ GUI), show that they haven't refreshed in a long time (i.e. days depending on which backup I restore). Now that I am checking the status again this morning after 2 days up, I can see that most sensors (except the 3 that show as reachable in deCONZ) are now greyed out, so I guess Phoscon gave up waiting for a refresh.

IIRC, Phoscon greys out devices after 1 day without messages.

To be fair, this issue is not compliant with #5113 (and hasn't been since the start). I am willing to wait for a dev to check in and check whether the above is a bug or not. If it's not, I'm going to close it and ask you to continue on the forums.

dxa13 commented 9 months ago

Had some time to experiment yesterday.

Conbee II:

And then I decided to give Conbee III a try one more time, in the interest of science.

Conbee III:

Finally, I switched back to my Conbee II stick and restored from backup. The network came up in pretty much the same reachability state as the Conbee III network (i.e. 1/2 of the devices non-reachable), but then I was able to re-include the remaining sensors without issues. I'll wait a few months until the bugs are flushed out, and re-evaluate at that point. But this was very painful and time consuming, particularly given that somewhere in this ordeal homebridge-deconz lost track of all my sensors, and therefore I lost all my room setup and automations, which will take me a few hours to re-create. I appreciate your help though, Smanar and Mimiix.

Smanar commented 9 months ago

something seems very broken with Conbee III's firmware/software.

Yep, but the issue is identified, IDK if it's solved, but they are on it.

I was able to re-include all sensors, except one buried inside a wall..., not as painful as I had feared

Are all your sensors Aqara or Xiaomi? Were they without a network for a long time? Unfortunately, they are not the most reliable devices.

dxa13 commented 9 months ago

Yes, 43 out of my 47 nodes/devices are Aqara sensors, whether for good or bad in this case. I do realize they are very picky devices; they don't like switching routers either even if it would result in a stronger signal.

I'll give this a try again once my wounds heal, and once I figure out how to avoid whatever I did to have homebridge-deconz wipe out all sensors (I'm guessing it lost the api key).

github-actions[bot] commented 9 months ago

As there has not been any response in 21 days, this issue has been automatically marked as stale. @ OP: Please either close this issue or keep it active. It will be closed in 7 days if no further activity occurs.

github-actions[bot] commented 8 months ago

As there has not been any response in 28 days, this issue will be closed. @ OP: If this issue is solved post what fixed it for you. If it is not solved, request to get this opened again.