Koenkk / Z-Stack-firmware

Compilation instructions and hex files for Z-Stack firmwares

I believe there is an issue with Z-stack 20240710 FW #518

Open habitats-tech opened 2 months ago

habitats-tech commented 2 months ago

Flashed SLZB-06P7 with 20240710 FW a couple of weeks ago. Since the new FW was applied the device randomly stops receiving updates from devices, while Zigbee2MQTT reports no issues except communication errors.

Restarting the ESP32 or Zigbee gets the SLZB-06P7 processing packets again. The last time, restarting Zigbee alone was not enough and I had to restart the ESP32 before the device started responding. It could be that I did not give the devices enough time to start reporting: I gave it a couple of minutes, which had been enough in my earlier experience with this failure.

The time between failures varies: one took 2 days and another 6 days. Over the last 2 weeks the device has failed with the same issue 3 times.

Initially I thought it was an isolated incident, but now I am more confident it is a FW issue.

Are these types of issues related to TI chipsets only?

rursache commented 1 month ago

@raveit65 mind sharing this “mod” firmware 20240909 ?

raveit65 commented 1 month ago

@rursache yes, this mod displays as 20240909 on the update page of the xzg software. The FW revision is 20240711. I found the mod firmware in this post: https://github.com/Koenkk/Z-Stack-firmware/discussions/505#discussioncomment-10611025

rursache commented 1 month ago

as a workaround, and fed up with the sloppy cronjob, i made this HomeAssistant automation to restart the zigbee2mqtt docker container when my philips hue light goes offline (unavailable in HASS):

alias: Fix Zigbee2MQTT
description: ""
trigger:
  - platform: state
    entity_id:
      - light.living_room_philips_hue_color
    from: null
    to: unavailable
    for:
      hours: 0
      minutes: 0
      seconds: 5
condition: []
action:
  - action: shell_command.restart_zigbee2mqtt
    metadata: {}
    data: {}
mode: single

please note that you need to create an entry for shell_command.restart_zigbee2mqtt in your HASS configuration.yaml file like this:

shell_command:
  restart_zigbee2mqtt: >
    nohup curl -X POST URL $1 > /dev/null 2>&1 &

make sure to replace URL with your portainer or whatever else webhook you have and light.living_room_philips_hue_color with your zigbee entity from HASS

now when the light goes offline (the simpler way of detecting when the entire zigbee network is down) it will restart the z2m container bringing everything up and running in under 15 seconds
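
The `for:` block in the automation above works as a debounce: the restart only fires if the entity stays `unavailable` for the whole 5 seconds. A minimal Python sketch of that idea follows. The function name `should_restart` and the `(timestamp, state)` tuple layout are hypothetical and only for illustration; this is not how Home Assistant implements its state trigger.

```python
from datetime import datetime, timedelta


def should_restart(states, hold=timedelta(seconds=5)):
    """Return True if the entity is 'unavailable' and has been for at least `hold`.

    `states` is a chronologically ordered list of (timestamp, state) tuples.
    The timestamp of the last entry is treated as "now" (an assumption made
    to keep the sketch self-contained).
    """
    if not states:
        return False
    now, current = states[-1]
    if current != "unavailable":
        return False
    # Walk backwards to find where the current unavailable streak began.
    started = now
    for ts, state in reversed(states):
        if state != "unavailable":
            break
        started = ts
    return now - started >= hold


if __name__ == "__main__":
    t0 = datetime(2024, 10, 1, 12, 0, 0)
    history = [
        (t0, "on"),
        (t0 + timedelta(seconds=2), "unavailable"),
        (t0 + timedelta(seconds=8), "unavailable"),
    ]
    print(should_restart(history))
```

A short hold time like this filters out brief flaps, at the cost of reacting a few seconds later when the network really is down.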

i really can't wait for a proper fix tho!

nicolasvila commented 1 month ago

Just in case others are facing the same issues I had, maybe this could help someone. I don't have a clear answer as to which of the points below fixed my Zigbee network and brought it back to normal.

1) I reinstalled the following coordinators versions of:

This has been driving me crazy for 2 weeks (!!!!) Maybe the issue was caused by USB3 on the RPi4, but then why did it work for the past 2 years? Maybe a subtle mix of different causes?

My advice:

Hope it helps.

emaayan commented 1 month ago

good thing i didn't update i was having problems updating it with add on repository

rursache commented 1 month ago

being super frustrated with the lack of support or work being done to fix this, i bought a new ZigStar UZG-01 (CC2652P7) which arrived with FW 20230507. i switched the old slaesh CC2652RB with the UZG-01 and my zigbee network has been stable ever since. 72h so far, had crashes every 20min-8h. so far 0 drops or crashes. will flash the slaesh as a router and use it like that. god knows i won't ever update the coordinator fw ever again after this.

i think the new firmware ruins the coordinator somehow but it's just my guess. i tried flashing the slaesh CC2652RB with each firmware starting with 2022 until the latest, none fixed it. a new device did. well 🤷🏻‍♂️

@nicolasvila what you described was the most basic troubleshooting one can do, the same instructions are found everywhere on the web, including in the docs. your issue is not related to the one here and does not help but dilute the actual problem

@emaayan i strongly recommend you to not update!

EDIT: one month in, still no issues with the new zigstar

emaayan commented 1 month ago

Thanks, now i have to figure out how to switch the sonoff from zha to z2m cause i think the zigbee led usb light bars aren't compatible with zha

habitats-tech commented 1 month ago

Following testing over the last couple of weeks I now have feedback, which I hope might be of some value to some.

The physical area where the ZB network has been deployed is as follows:

Initial deployment, experiencing lots of device communication errors and coordinator disconnections every few mins:

Re-deployment, no disconnections of any sort with coordinator or devices:

I have therefore concluded that the coordinator (UZG-01 and/or SLZB06x), when flashed with FW 20240710, runs stably when it communicates with dedicated routers (using better router-devices might have solved the issue as well, but I have not tested this scenario). To stress test the system, all end-devices have been configured to report instantly on every change, and they have been running faultlessly for a few days. No disconnections or errors of any sort.

One last observation is that Tuya devices are definitely not the right way to go, especially on larger ZB networks, for a number of reasons, the primary being stability, configurability and reliability. I am slowly replacing Tuya with Sonoff devices and so far all is good.

If I find any subsequent issues related to FW20240710 over the long run I will post on this thread again.

The only issue which remains unsolved, but not related to Z2M, is the coordinator (UZG-01) disconnects whenever I try to access its webUI; tried latest ESP32 FW 20240915 but issue persists.

devchristof commented 1 month ago

After 3 weeks of trying to run 20240710 on a Sonoff Dongle-P, re-flashing the firmware and re-pairing all devices, I always hit the same problems after a couple of hours: losing devices, lag in commands, and not being able to pair new devices. I needed to unplug the dongle twice a day. I also tried downgrading to a 2023 FW, but finally I downgraded to the 20221226 FW and everything is working fine now! No problems for 7 days!

Zigbee2MQTT version: 1.40.1 (commit: unknown)
Coordinator type: zStack3x0
Coordinator revision: 20221226
Coordinator IEEE Address
Frontend version: 0.7.4
zigbee-herdsman-converters version: 20.12.1
zigbee-herdsman version: 0.57.3

Stats
Total: 52
By device type: Router: 28, End devices: 24
By power source: Mains (single phase): 29, Battery: 21, DC Source: 2
By vendor: SONOFF: 7, LUMI: 5, eWeLight: 4, GLEDOPTO: 4, _TZ3000_qeuvnohg: 4, frient A/S: 3, Niko NV: 2, _TZ3000_xr3htd96: 2, _TZE200_2aaelwxk: 2, _TZ3000_ko6v90pg: 2, zbeacon: 2, _TZ3000_cayepv1a: 2, _TZ3000_5e235jpa: 1, _TZ3000_typdpbpg: 1, ADUROLIGHT: 1, _TZE204_t1blo2bj: 1, _TZ3000_hhiodade: 1, _TZ3000_axpdxqgu: 1, _TZE200_hue3yfsn: 1, _TZE200_yvx5lh6k: 1, ptvo.info: 1, _TZE200_81isopgh: 1, _TZ3000_xwh1e22x: 1, _TZ3210_95txyzbx: 1, _TZ3210_0zabbfax: 1

SVH-Powel commented 1 month ago

After updating to 20240710, all my sensors nearby and directly connected to the coordinator started to fail. I installed the old firmware from May 2023 and have not experienced any problems over the last week. So rolling back to the previous version worked fine for me.

kafisc1 commented 3 weeks ago

I'm now using launchpad_coordinator_20221226 and will report back if this solves the issue.

Great-Chart commented 1 week ago

@rursache - With reference to your "success" comment (https://github.com/Koenkk/Z-Stack-firmware/issues/518#issuecomment-2366546002), I fear I'm at that same point you reached (but likely less technically adept myself). I suspect there has been an element of user error on my side, either in how I've migrated between different Sonoff Dongle-P coordinators (leap-frogging one FW date onto a "spare" and swapping out the "original") or in flashing firmware (on Windows via the python method, as I cannot get into bootloader mode with the button pressed on plugging in USB etc), that has resulted in some element of corruption lingering.

Of late my network has devices going offline as quickly as circa 2 hrs and as each eureka moment of a perceived fix does little more than increase randomness I'm in need of a fresh start.

Can I ask if your network has remained stable since your replacement coordinator and what FW is that running? I've got a SLZB-06 (not M) that I plan to run via USB initially and need to further research the best approach with that and the preferred firmware

@habitats-tech - Noting you've had success with this coordinator, could I ask what FW you flashed it to (noting that the SM Light Webflasher naming convention differs, with v2.5.6 seemingly the most recent)?

I've become somewhat uncertain where to turn and what nuggets of information to take from assorted posts, as the feedback on coordinator FW flashing results is hugely varied. As such, I'm likely to start a topic to gauge what is considered best practice to ensure a clean end result.

Did either of you try and retain the IEEE address of a previous coordinator or rebuild from scratch?

I'm concerned I might need to remove all devices; reset them and re-build progressively to avoid falling back into previous traps. With you both having had success (applying different methods) I thought I'd ask for additional clarity on how you went about restoring your network post plugging in "new coordinator" and thus whether I should be removing devices, powering them down, deleting coordinator_backup.json file or any other such step in any specific sequence.

Everyone appears either to have had minimal issues and fathomed a permanent fix, or continues to struggle (and suffer). I started with issues that seemed related to one device losing connection too regularly and ended up with a largely unusable network that I'm not likely to fix by attempting similar methods.

Log attached is the pair of logs merged from a restart this morning circa 6am that had the network down again by 8am. It doesn't seem to imply the coordinator itself has crashed, nor are there evident signs of interference (that I can tell), and the failure mode is nominally consistent in that devices start going offline, ping errors arise and it quietly dies (unless anyone can indicate otherwise for me). I've removed as many as I could of the devices that didn't seem to self-recover from a restart of Z2M or seemed otherwise excessively chatty, and now a simple restart of Z2M restores things, but it's hardly usable in the short term let alone the long term!

_FULL-log.log

https://github.com/Koenkk/zigbee2mqtt/issues/24387 https://github.com/Koenkk/zigbee2mqtt/issues/24401 https://github.com/Koenkk/zigbee2mqtt/issues/23329 https://github.com/Koenkk/Z-Stack-firmware/discussions/505

rursache commented 1 week ago

@Great-Chart Can I ask if your network has remained stable since your replacement coordinator and what FW is that running?

it did, perfectly stable. fw and details are in my initial comment

Did either of you try and retain the IEEE address of a previous coordinator or rebuild from scratch?

i did not but it didn't seem to matter to any accessory, everything works fine with a new IEEE address. it was just a simple swap for me, nothing else

Great-Chart commented 1 week ago

Many thanks - great to hear it was somewhat painless for you, and it gives me confidence to attempt the same (perhaps playing safe and seeking out the appropriate 20230507 FW as a starting point for my SLZB-06 trial).

gcs8 commented 1 week ago

I tried the 20230507 FW and now Z2M just does not start.

[09:04:28] INFO: Preparing to start...
[09:04:28] INFO: Socat not enabled
[09:04:29] INFO: Starting Zigbee2MQTT...
Starting Zigbee2MQTT without watchdog.
[2024-10-24 09:04:31] info: z2m: Logging to console, file (filename: log.log)
[2024-10-24 09:04:31] info: z2m: Starting Zigbee2MQTT version 1.40.2 (commit #unknown)
[2024-10-24 09:04:31] info: z2m: Starting zigbee-herdsman (2.1.3)
[2024-10-24 09:04:31] info: zh:zstack:znp: Opening TCP socket with 192.168.100.137:6638
[2024-10-24 09:04:31] info: zh:zstack:znp: Socket connected
[2024-10-24 09:04:31] info: zh:zstack:znp: Socket ready
[2024-10-24 09:04:31] info: zh:zstack:znp: Writing CC2530/CC2531 skip bootloader payload
[2024-10-24 09:04:32] info: zh:zstack:znp: Skip bootloader for CC2652/CC1352
[2024-10-24 09:05:38] error: z2m: Error while starting zigbee-herdsman
[2024-10-24 09:05:38] error: z2m: Failed to start zigbee
[2024-10-24 09:05:38] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-10-24 09:05:38] error: z2m: Exiting...
[2024-10-24 09:05:38] error: z2m: Error: network commissioning timed out - most likely network with the same panId or extendedPanId already exists nearby (Error: AREQ - ZDO - stateChangeInd after 60000ms
    at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
    at ZnpAdapterManager.beginCommissioning (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:370:31)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:91:21)
    at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:158:16)
    at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:137:29)
    at Zigbee.start (/app/lib/zigbee.ts:69:27)
    at Controller.start (/app/lib/controller.ts:161:27)
    at start (/app/index.js:154:5))
    at ZnpAdapterManager.beginCommissioning (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:372:23)
    at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:91:21)
    at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:158:16)
    at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:137:29)
    at Zigbee.start (/app/lib/zigbee.ts:69:27)
    at Controller.start (/app/lib/controller.ts:161:27)
    at start (/app/index.js:154:5)

gcs8 commented 1 week ago

After doing the below, it's still having issues re-adding devices.


```
network_key: GENERATE
# Let Zigbee2MQTT generate a pan_id on first start
pan_id: GENERATE
# Let Zigbee2MQTT generate a ext_pan_id on first start
ext_pan_id: GENERATE
```

```
info 2024-10-24 09:18:40z2m: Zigbee: allowing new devices to join.
info 2024-10-24 09:18:40z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/response/permit_join', payload '{"data":{"time":254,"value":true},"status":"ok","transaction":"pz7yr-1"}'
info 2024-10-24 09:18:55zh:controller: Interview for '0x00158d008afe16cf' started
info 2024-10-24 09:18:55z2m: Device 'Gcs8 office temp' joined
info 2024-10-24 09:18:55z2m: Starting interview of 'Gcs8 office temp'
info 2024-10-24 09:18:55z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"Gcs8 office temp","ieee_address":"0x00158d008afe16cf"},"type":"device_joined"}'
info 2024-10-24 09:18:55z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"Gcs8 office temp","ieee_address":"0x00158d008afe16cf","status":"started"},"type":"device_interview"}'
error 2024-10-24 09:19:55zh:controller: Interview failed for '0x00158d008afe16cf with error 'Error: Interview failed because can not get node descriptor ('0x00158d008afe16cf')'
error 2024-10-24 09:19:55z2m: Failed to interview 'Gcs8 office temp', device has not successfully been paired
info 2024-10-24 09:19:55z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"Gcs8 office temp","ieee_address":"0x00158d008afe16cf","status":"failed"},"type":"device_interview"}'
```

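
For anyone comparing longer logs, failed interviews like the one above can be pulled out with a few lines of Python. This is only a rough sketch: the regular expression is an assumption based on the single `Interview failed for` line shown in this log, the function name `failed_interviews` is made up for illustration, and the log format may differ between Zigbee2MQTT versions.

```python
import re

# Pattern derived from the one error line in the log above (an assumption,
# not a documented format). The quote after the IEEE address is optional
# because the line in that log does not close it.
FAILED = re.compile(
    r"Interview failed for '(0x[0-9a-fA-F]{16})'? with error '(.*)'\s*$"
)


def failed_interviews(log_text):
    """Return {ieee_address: error_message} for each failed interview found."""
    failures = {}
    for line in log_text.splitlines():
        match = FAILED.search(line)
        if match:
            failures[match.group(1)] = match.group(2)
    return failures


if __name__ == "__main__":
    sample = (
        "info 2024-10-24 09:18:55z2m: Device 'Gcs8 office temp' joined\n"
        "error 2024-10-24 09:19:55zh:controller: Interview failed for "
        "'0x00158d008afe16cf with error 'Error: Interview failed because "
        "can not get node descriptor ('0x00158d008afe16cf')'\n"
    )
    for address, error in failed_interviews(sample).items():
        print(address, "->", error)
```

Running it over a whole log file gives a quick list of which devices never completed pairing, which can be easier to read than the raw output.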
dpgh947 commented 1 week ago

I flashed a sonoff dongle P (CC2652P) with 20240710. It worked, and everything came up, but switching lights was randomly very laggy - particularly if I switched any one light (instant) and then another a few seconds later (took 10-15 secs), and viewing the map in Z2M took minutes compared to normally just a few seconds - I only have 16 devices. Re-flashed back to 20230507 and all is well again.