Open habitats-tech opened 2 months ago
Please post the debug log from starting z2m until this error.
See this on how to enable debug logging.
I need to do some work as they will total more than 100Mb in size. Hopefully in a few hrs from now will submit.
Unfortunately I cannot post debug info yet as I downgraded FW last night and only had the log at info level. Today I will upgrade again to 20240716 and will update once I have the first failure.
I have updated my Sonoff USB dongle to the 20240710 FW, along with Zigbee2MQTT and Home Assistant, and now need to reboot every day because I lose all devices.
Zigbee2MQTT version: 1.40.0 commit: unknown
Coordinator type: zStack3x0
Coordinator revision: 20240710
Coordinator IEEE address: 0xxxxxxxxxxxx
Frontend version: 0.7.4
Zigbee-herdsman-converters version: 20.8.4
Zigbee-herdsman version: 0.57.1
Stats: Total 51
By device type: Routers: 28, End devices: 23
By power source: Mains (single phase): 30, Battery: 19, DC source: 2
By vendor: SONOFF: 7, LUMI: 5, eWeLight: 4, GLEDOPTO: 4, _TZ3000_qeuvnohg: 4, Niko NV: 2, _TZ3000_xr3htd96: 2, frient A/S: 2, _TZE200_2aaelwxk: 2, _TZ3000_ko6v90pg: 2, zbeacon: 2, _TZ3000_cayepv1a: 2, _TZ3000_5e235jpa: 1, _TZ3000_typdpbpg: 1, ADUROLIGHT: 1, _TZE204_t1blo2bj: 1, _TZ3000_hhiodade: 1, _TZ3000_axpdxqgu: 1, _TZE200_hue3yfsn: 1, _TZ3210_0zabbfax: 1, _TZE200_yvx5lh6k: 1, ptvo.info: 1, _TZE200_81isopgh: 1, _TZ3210_95txyzbx: 1, _TZ3000_xwh1e22x: 1
I have changed the config to debug logging and will send the log before the next reboot.
i'm having issues as well, CC2652RB with 20240710 FW, with ZigBee2MQTT and HomeAssistant. woke up with half (8/19) devices offline. had to stop the ZigBee2MQTT container, unplug the zigbee adapter and then plug back in. started ZigBee2MQTT fine and devices joined back but in 8-9h it did it again.
my devices were working smooth for almost 2 years until i updated.
i flashed 20221226 back and will continue monitoring but so far so good.
EDIT: woke up with the zigbee network down again, at this point i regret updating so bad
I have done some deeper digging. For some reason the SLZB-06P7 coordinator decided not to join the network following a power off, so I replaced it with a UZG-01 flashed with 20240710. Trying different coordinator FW versions, I have observed that routing is wildly different. Versions earlier than 20240710 do not cluster well around routers. 20240710 seems to elect the key routers properly, and all routers seem to connect to the coordinator as expected. However, I get constant communication errors with all routers, and I lean towards the issue being device related rather than firmware related. Possibly someone could take a deep dive into the code behind https://www.zigbee2mqtt.io/devices/QS-zigbee-S08-16A-RF.html; I have 30+ of these, plus some other Tuya relay switches. For anyone reading this: DO NOT ever buy QS products, as they are not reliable short or long term. They stop functioning at some point.
Images with the map and the errors are provided below. All of the routers, even those really close to the coordinator (<5m, no obstacles), seem to have low LQI (<99). If I re-pair them they go to triple-digit LQI, but eventually they settle on low double-digit LQI, so I concluded re-pairing is not useful.
Although the battery powered devices are not connected they all seem to work flawlessly. Possibly some improvements required to get the map right. It seems to take a couple of hours for connection routing to settle.
It takes close to 20 mins for the above map to be generated.
One last thought. It would be a great addition if we have an option to clear all error messages displayed. Sometimes it takes more than 2 mins to manually clear error messages, which obscure action buttons.
Good news is the map does eventually generate an accurate connection layout. The following map is after 2 days of operation. Additionally it now takes less than 4 mins to produce the map using 20240710 UZG-01 combo.
After almost 2 days of operation Z2M lost connection to the UZG-01, but automatically recovered. I timed map generation at just 2.5 mins.
This zip file has all the debug logs. Logs dated 2024-09-05 were created using SLZB-06P7. Logs dated 2024-09-06 were created using UZG-01/20240710. The logs also include the instance where connection to UZG was reset by something.
I see a (0xc7: NWK_TABLE_FULL) error in the debug logs, if this is of any assistance.
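For anyone else digging through large debug logs, here is a minimal sketch for tallying status codes of the form quoted above, such as `(0xc7: NWK_TABLE_FULL)`. The function name and the regular expression are my own assumptions about how these codes appear in the log, not anything taken from Zigbee2MQTT itself.

```python
import re
from collections import Counter

# Matches status codes written like "(0xc7: NWK_TABLE_FULL)".
# The exact shape is an assumption based on the single example in this thread.
STATUS_RE = re.compile(r"\((0x[0-9a-fA-F]{2}): ([A-Z0-9_]+)\)")


def count_status_codes(lines):
    """Count each '(0xNN: NAME)' status code found in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        for code, name in STATUS_RE.findall(line):
            # Normalise the hex digits so 0xC7 and 0xc7 are counted together.
            counts[(code.lower(), name)] += 1
    return counts
```

Passing an open log file to `count_status_codes` and printing `most_common()` on the result should show which errors dominate, which may be easier than scrolling through 100 MB of text.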
20240315 allowed more devices to connect to the coordinator; 20240710 allows fewer and thus relies more on routers to improve stability of the coordinator. If those routers are crap, you will get very poor performance. I would suggest first powering off some spammy devices, e.g. '00901-33-SM', and seeing if that improves your network.
I have already done this twice, powered off the entire section of 00901, waited a few mins and power on the entire section again. I will try once more and provide feedback.
Do we have any tools that allow us to influence affinity to certain routers?
For SLZB, which router FW version works better with coordinator FW 20240710?
I was thinking of a tool which allows us to group routers and let the system automatically balance between them, therefore being able to avoid certain routers during pairing.
I have a question for Koenkk. The device below will re-interview successfully with no errors or warnings. How come it still shows offline with a last seen of 2 weeks ago?
I suggest an option to set the line colours in the map. The dark blue on the dark theme is difficult to visualise on a busy map.
I am wondering if it is possible to save the map in a nice format. It would be nice to track route changes or to process the map for better analysis.
I have had 20240710 working for some weeks now. At first I had trouble with some Tuya TS011F metering plugs, which were seen by the coordinator but were not responding or would not pair. I swapped some of these plugs and re-paired them. Now everything works perfectly, except one plug that does not pair anymore.
Thanks for the good job!
Oops, I forgot to explain better what I did. Some TS011F plugs did not pair anymore. I replaced one with a new spare (never paired on my network): it worked. I tried to pair the other faulty ones at a different location in my home: it partially worked. One came alive again, the other one refuses to pair. Maybe it is a routing problem?
I confirm there is an issue with FW20240710. The coordinator resets sometimes after 20 mins, sometimes after hours, but it resets nevertheless. I will keep FW20240710 to assist in troubleshooting and because, since v1.40.0, Z2M automatically reconnects. My experience is that I completely lose Ethernet connectivity and I know it is not a network issue as there are hundreds of devices on the network (Ethernet and WiFi) with no issues.
Please let me know how I can assist debug this issue.
Log file attached. The crash debug is at the top of the log.
This info might be useful to some.
I have lots of Tuya QS relay devices which in theory can act as routers. When the UZG-01 was directly connected to some of them I had constant communication errors, although everything seemed to work with no issues, but with delays, sometimes considerable.
Once I paired a SLZB-06P7 router to the UZG-01 coordinator most of the communication issues between QS routers and coordinator vanished. I get the occasional error now, but the errors are not show stoppers.
The SLZB-06P7 router (00902-Router) is configured as follows:
Since I introduced the SLZB-06P7 router everything seems to have smoothed out. However, because I have issues with the UZG-01 (the coordinator) restarting at random intervals, I will upgrade to the latest SLZB-06P7 router FW and give an update if the UZG-01 issues have been eliminated or some other gremlins were introduced.
For those interested to know the SLZB-06P7 router FW differences here they are:
Zigbee2mqtt has the latest update: 1.40.1-1 Coordinator: UZG-01 20240707
Again last night, a total adapter crash (the adapter web GUI is working, and a Zigbee reset does not help). After a PoE reset I left it for 1 hour to stabilize, but the network was a mess. Ikea bulbs were reporting no network route, and I had to reset all router devices with the apartment's electrical breaker. Yesterday, before the crash, I noticed that everything was very laggy.
And it has been like this every 1-2 weeks since the day this firmware version was released. I have already rejoined all devices. As far as I can see, SONOFF dongles are the winners here, but P7 chips are simply not working.
Are there any ongoing actions about this as reports like this were posted from first day it was released as beta?
So I updated the SLZB-06P7 router FW to 20240716 from 20240315. The device goes offline. To get it back online follow this:
Will update once I am confident of any positive/negative/neutral changes to the Zigbee network.
@Koenkk my logs are posted here: https://github.com/Koenkk/zigbee2mqtt/issues/23869#issuecomment-2336713323
Running Zigbee2MQTT Edge POE UZG-01 150 Devices
Where to start? I updated a week or so ago. The update went well. I noticed Aqara devices dropped off a day later. Repaired. No issue. I then started seeing a few devices drop off. I then started to get complete network crashes. Fast forward to now and my network is struggling to stay up. I have gone back 1 FW version and the outcome is still the same. The annoying part is everything was fine until FW 20240710. This is what I'm seeing
[2024-09-09 03:42:34] error: z2m: Error while starting zigbee-herdsman
[2024-09-09 03:42:34] error: z2m: Failed to start zigbee
[2024-09-09 03:42:34] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-09-09 03:42:34] error: z2m: Exiting...
[2024-09-09 03:42:34] error: z2m: Error: SRSP - ZDO - startupFromApp after 40000ms
at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:300:45
at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:26)
at Znp.request (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:291:27)
at ZnpAdapterManager.beginStartup (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:279:28)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at ZnpAdapterManager.beginRestore (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:330:9)
at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:74:21)
at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:138:29)
at Zigbee.start (/app/lib/zigbee.ts:65:27)
Update: I'm now back on the latest FW and I'm seeing this in the logs
[2024-09-09 04:02:11] debug: zh:controller: Data is from unknown device with address '63441', skipping...
I tried to find any device with this address and found nothing.
Update: Good news is that clicking the pair button no longer results in my coordinator crashing. This is the most stable it's been in 12 hrs. I feel a bit more relaxed, as it's 4:20am and I'll need to get up soon lol
I have gone back 1 FW version and the outcome is still the same. The annoying part is everything was fine until FW 20240710. This is what I'm seeing
That is the weirdest thing here: rolling back is not fixing the problems. It's like the updated coordinator pushed something to the router devices and the network is not stable with older firmwares any more...
Having the same thoughts as @cloudbr34k84 and @dankocrnkovic. @Koenkk how is it possible that downgrading the coordinator back to the previous working version does not fix the issues? I even tried downgrading the coordinator, the main router AND the zigbee2mqtt container, to no avail.
I seem to have stabilised for now. I flashed the last 3 firmware versions on the device, restarting the device each time, then I flashed the latest and it seems to be stable again. I have 2 devices offline, but that's because they are bulbs in lamps which my wife switches off manually.
Give it time. I thought that too, but then in 5-10 days router devices start to die again, bringing the network down. After a few iterations I decided to stay on this latest version, as downgrading does not help me stabilise my home. Now I have a chair near my electricity breaker box and have made instructions for my partner on how to power cycle the home so we can have lights....
I will never again update firmware once (and if) it's stable again with some fix.
@dankocrnkovic Now I have a chair near my electricity breaker box and have made instructions for my partner on how to power cycle the home so we can have lights....
for me a hourly crontab that restarts the zigbee2mqtt docker container is enough:
@hourly /usr/bin/docker restart zigbee2mqtt
can't wait for a proper fix tho
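A slightly gentler variant of the hourly restart is to restart only when the log has gone quiet. The sketch below is a rough idea, not a tested fix: it assumes a healthy network keeps writing timestamped lines in the `[YYYY-MM-DD HH:MM:SS]` format seen in the logs in this thread, and that 30 minutes of silence means something is stuck. The function names and the threshold are made up for illustration; the container name `zigbee2mqtt` is the one from the crontab line.

```python
import re
import subprocess
from datetime import datetime, timedelta

# Timestamp prefix as it appears in the log excerpts in this thread.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]")


def last_timestamp(lines):
    """Return the timestamp of the last timestamped line, or None if there is none."""
    latest = None
    for line in lines:
        match = TS_RE.match(line)
        if match:
            latest = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    return latest


def is_stale(lines, now, max_silence=timedelta(minutes=30)):
    """True when the log has no timestamps or the last one is older than max_silence."""
    latest = last_timestamp(lines)
    return latest is None or now - latest > max_silence


def restart_if_stale(lines, now, run=subprocess.run):
    """Restart the container only if the log looks stale; returns True if restarted."""
    if is_stale(lines, now):
        run(["docker", "restart", "zigbee2mqtt"], check=True)
        return True
    return False
```

Run from cron every few minutes with the tail of the log file and `datetime.now()`, this would avoid restarting a network that is still reporting. Whether log silence is a reliable sign of this particular failure is an open question.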
This is my first real issue since moving to PoE coordinators. I'm going to go around and re-pair all 150 devices throughout the week as well. I'm lucky that I can still manually control 90% of the lights, but it's still never good. Surely there has to be a better way to gather all these logs. Companies are using AI for these sorts of scenarios.
here is my map for shits and giggles
So I updated the SLZB-06P7 router FW to 20240716 from 20240315. The device goes offline. To get it back online follow this:
- Forcibly remove SLZB-06P7 from Z2M
- Enable Joining through the coordinator
- Reboot SLZB-06P7
- Hopefully you should see SLZB-06P7 pairing and it should go online instantly
Will update once I am confident of any positive/negative/neutral changes to the Zigbee network.
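The first two steps above can also be done over MQTT instead of the frontend. This is a sketch that only builds the topic/payload pairs; publishing them is left to whatever MQTT client you use. The topics and payload shapes are what I understand the Zigbee2MQTT 1.x bridge API to be, so please check the MQTT documentation for your version. The function name and the join time are made up for illustration, and this permit-join request is not restricted to the coordinator.

```python
import json

# Default base topic, as in the config posted in this thread; adjust if yours differs.
BASE_TOPIC = "zigbee2mqtt"


def repair_requests(device_id, join_seconds=120):
    """Build (topic, payload) pairs to force-remove a device and then open joining."""
    remove = (
        f"{BASE_TOPIC}/bridge/request/device/remove",
        # "force" removes the device even if it does not answer the leave request.
        json.dumps({"id": device_id, "force": True}),
    )
    permit = (
        f"{BASE_TOPIC}/bridge/request/permit_join",
        # Payload shape assumed from the 1.x API; newer releases may differ.
        json.dumps({"value": True, "time": join_seconds}),
    )
    return [remove, permit]
```

For example, `repair_requests("00902-Router")` returns the two messages to publish in order before rebooting the router.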
Following the SLZB-06P7 router firmware update to 20240716, the router almost constantly shows availability offline (only now and then does it show availability = online). I have no idea why that is, as I have not noticed any issues on the Zigbee network, and last seen is never more than a few minutes old.
Additionally the router FW update to 20240716 did not make any difference to the issue of UZG-01 restarting at frequent, but random, intervals.
Upgraded my new SLZB-06 to the latest firmware and it won't connect to Z2M anymore either. I tried downgrading back to the stock firmware 20221226 and it's still not working for some reason.
[2024-09-09 01:31:27] info: z2m: Logging to console, file (filename: log.log)
[2024-09-09 01:31:27] info: z2m: Starting Zigbee2MQTT version 1.40.1 (commit #403d3c0)
[2024-09-09 01:31:27] info: z2m: Starting zigbee-herdsman (0.57.3)
[2024-09-09 01:31:28] info: zh:adapter: Starting mdns discovery for coordinator: slzb-06
[2024-09-09 01:31:29] info: zh:adapter: Coordinator Ip: 10.0.0.12
[2024-09-09 01:31:29] info: zh:adapter: Coordinator Port: 6638
[2024-09-09 01:31:29] info: zh:adapter: Coordinator Radio: zstack
[2024-09-09 01:31:29] info: zh:adapter: Coordinator Baud: 115200
[2024-09-09 01:31:29] info: zh:zstack:znp: Opening TCP socket with 10.0.0.12:6638
[2024-09-09 01:31:29] info: zh:zstack:znp: Socket connected
[2024-09-09 01:31:29] info: zh:zstack:znp: Socket ready
[2024-09-09 01:31:29] info: zh:zstack:znp: Writing CC2530/CC2531 skip bootloader payload
[2024-09-09 01:31:30] info: zh:zstack:znp: Skip bootloader for CC2652/CC1352
[2024-09-09 01:32:37] error: z2m: Error while starting zigbee-herdsman
[2024-09-09 01:32:37] error: z2m: Failed to start zigbee
[2024-09-09 01:32:37] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-09-09 01:32:37] error: z2m: Exiting...
[2024-09-09 01:32:37] error: z2m: Error: network commissioning timed out - most likely network with the same panId or extendedPanId already exists nearby (Error: AREQ - ZDO - stateChangeInd after 60000ms
at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
at ZnpAdapterManager.beginCommissioning (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:365:31)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:86:21)
at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:138:29)
at Zigbee.start (/app/lib/zigbee.ts:64:27)
at Controller.start (/app/lib/controller.ts:140:27)
at start (/app/index.js:154:5))
at ZnpAdapterManager.beginCommissioning (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:367:23)
at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:86:21)
at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:138:29)
at Zigbee.start (/app/lib/zigbee.ts:64:27)
at Controller.start (/app/lib/controller.ts:140:27)
at start (/app/index.js:154:5)
Upgraded my new SLZB-06 to the latest firmware and it won't connect to Z2M anymore either. I tried downgrading back to the stock firmware 20221226 and it's still not working for some reason.
I have found this the hard way myself. For some reason the device does not connect to Ethernet. The workaround I have found is to reset the device (press the button and while being pressed power on the device; hold for a few secs and release the button).
From then on the device will only connect using DHCP. As soon as you attempt to define a static IP it will refuse to connect to Ethernet.
I have taken this coordinator to a new network and it still behaves the same: DHCP is OK, but with a static IP there is no Ethernet connection.
It is easy to check whether the device has this issue by observing the RJ45 port status lights.
[2024-09-09 01:32:37] error: z2m: Error: network commissioning timed out - most likely network with the same panId or extendedPanId already exists nearby (Error: AREQ - ZDO - stateChangeInd after 60000ms
Can you share your Z2M config file please, to see what your PAN ID is?
I can still access the SLZB-06 over ethernet, it was running fine on core v2.5.2, then I updated the zigbee coordinator firmware and it won't connect to Z2M anymore. Downgraded to the original coordinator firmware as you can see here but no dice.
I don't have a PAN ID in my config file
permit_join: true
mqtt:
  base_topic: zigbee2mqtt
  server: mqtt://10.0.0.11
  user: redacted
  password: redacted
serial:
  port: mdns://slzb-06
frontend:
  port: 8090
advanced:
  network_key:
    - # redacted
  channel: 25
devices:
  '0x7ce5240000057dcb':
    friendly_name: Master Bedroom
  '0x8c65a3fffe6e4895':
    friendly_name: Outlet
homeassistant: true
Try FW 20240315
Add these lines under advanced:, which will recreate your Zigbee network.
  # Let Zigbee2MQTT generate a network key on first start
  network_key: GENERATE
  # Let Zigbee2MQTT generate a pan_id on first start
  pan_id: GENERATE
  # Let Zigbee2MQTT generate a ext_pan_id on first start
  ext_pan_id: GENERATE
Restart Z2M.
You will have to re-pair all devices on your network.
yeah i experienced this issue early on in my zigbee journey many years ago.
I found that after multiple power cycles of the UZG-01, Z2M eventually connects. After a restart, give it 2 minutes before trying to start Z2M.
Thank you so much for this advice. My Z2M sprung back to life and I'm busy re-pairing all my devices! 😀
IMHO, the issue with the PAN ID collision is that there are devices (routers, end-devices) still keeping the Zigbee network alive when attempting to bring the coordinator up again.
Not sure if it would be possible to override this safety check in z2m in order to recover the network w/o pairing everything from scratch.
My network with the Sonoff USB dongle is dead after this firmware. Rebooting every day seems to work, but it is definitely not a solution. Should I downgrade the firmware to the 2022 version? That does not seem to work either. What do you advise? I am thinking of changing to the Sonoff USB dongle E with Ember firmware and re-pairing everything.
@Koenkk I have updated my coordinator to the newest version and it's working fine, however I am now facing a problem: my Sonoff dongle P, flashed as a router, doesn't complete pairing.
I get a loop of this error:
zh:controller: Interview failed for '0x00124b0029de0f87 with error 'Error: Interview failed because can not get active endpoints ('0x00124b0029de0f87')'
I have reflashed the router firmware with the same version, CC1352P2_CC2652P_launchpad_router_20221102.zip, which has worked for me before. Could you please take a look into this?
It appears online, but pairing never finishes. I tried flashing a couple of times, no change.
UPDATE:
Flashed coordinator back to 20221226 and router paired without issues.
@Koenkk
On 4 occasions when I attempted to access the webUI of the UZG-01, the coordinator rebooted (it lost and re-established connection within a few seconds). The last time it happened the UZG-01 was at 23 hrs uptime.
I suggest you take a look at whether, when the coordinator is busy, a management connection over the network could cause the connection to reset.
I have not yet tried setting up a continuous ping to test what happens when I access the webUI, but this is something I intend to do and will provide feedback.
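In case it helps anyone wanting to run the same kind of test, here is a minimal sketch of a reachability monitor. It uses repeated TCP connection attempts instead of ICMP ping, so it needs no special privileges. I would point it at the web UI port (port 80 is an assumption) and not at the Zigbee socket port, because opening a second connection to the Zigbee socket may disturb Z2M. All names here are made up for illustration.

```python
import socket
import time


def probe(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def monitor(host, port, interval=1.0, rounds=60, check=probe, sleep=time.sleep):
    """Probe repeatedly and return a list of (unix_time, reachable) samples."""
    samples = []
    for _ in range(rounds):
        samples.append((time.time(), check(host, port)))
        sleep(interval)
    return samples
```

Running `monitor` against the coordinator's IP while opening the webUI in a browser should show whether the device drops off the network at that moment, and for how long.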
I suggest you take a look at possible issues when coordinator busy a connection over the network for management could cause the connection to reset?
This is not an issue I can fix in the Zigbee firmware. Looks like an issue in the ESP32 firmware (which is not maintained by me)
I had a hard time too with the 20240710 firmware and UZG-01, which gave me a lot of connection issues with router devices, as explained in this report. I also had these issues when going back to the former stable 20240316. I could fix this by resetting all devices with the Tuya app in another Zigbee network. But the real fix for me is using the mod firmware 20240909. This mod has run stable for more than 7 days with my UZG-01 (CC2652P7) + zdm-14.1, and I never saw these weird connection issues again.
Flashed SLZB-06P7 with 20240710 FW a couple of weeks ago. Since the new FW was applied, the device randomly stops receiving updates from devices, while Zigbee2MQTT reports no issues except communication errors.
After restarting the ESP32 or Zigbee, the SLZB-06P7 starts processing packets again. The last time, restarting Zigbee was not enough and I had to restart the ESP32 for the device to start responding; it could be that I did not give the devices enough time to start reporting (I gave it a couple of minutes, which had been enough in my prior experience with this failure).
Time between failures: one took 2 days, another 6 days. In the last 2 weeks the device has failed with the same issue 3 times.
Initially I thought it was an isolated incident, but now I am more confident it is a FW issue.
Are these type of issues related to TI chipsets only?