Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0

Firmware 20240710 repair Devices every day #23869

Open Barneybaer84 opened 2 months ago

Barneybaer84 commented 2 months ago

What happened?

Since the firmware update on my Sonoff Zigbee 3.0-P, I have devices that go offline every day and need to be re-paired. This is frustrating.

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.40.0

Adapter firmware version

20240710

Adapter

SONOFF ZigBee 3.0 USB Dongle Plus

Setup

X86 Home Assistant

Debug log

No response

raaaf commented 2 months ago

Same here.

rursache commented 2 months ago

same issue here, i downgraded the adapter to 20221226 but it still crashed. so i also downgraded zigbee2mqtt to 1.39.1 and i plan to go as low as 1.38.0

my zigbee network was flawless for 2 years until these 2-3 days when both zigbee2mqtt and adapter fw got updated

we are not alone https://github.com/Koenkk/Z-Stack-firmware/issues/518

Koenkk commented 2 months ago

Could you provide the debug log from starting z2m until the device drops?

jymorel commented 2 months ago

Same issue here after firmware and z2m updates. Especially sonoff temp and humidity sensor (snzb-02), but not only. Re-pairing is not enough. Need to delete the device and re-pair again

Barneybaer84 commented 2 months ago

I updated the firmware on 05.09., but my next log is from 06.09. In the log you can ignore Alarm Sirene, Alarm Sirene defekt and Nachttisch Schatz. All other devices are always online. 06.09.24_log.zip The devices that go offline most often are Aqara sensors, like "Kinderzimmer Temperatur" or "Felix Zimmer Fenster Rechts" or "SZ Fenster Links"

rursache commented 2 months ago

@Koenkk: Could you provide the debug log from starting z2m until the device drops?

here are my logs: zigbee2mqtt_logs.txt

the issue happened at exactly 2024-09-08T13:50:08.712364154Z and i think the relevant part is this:

2024-09-08T13:50:08.712364154Z [2024-09-08 16:50:08] debug:     zh:controller:endpoint: Error: ZCL command 0x00124b00259aa8d0/8 genBasic.read(["zclVersion"], {"timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"reservedBits":0,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms)
2024-09-08T13:50:08.712384320Z     at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
2024-09-08T13:50:08.712387080Z     at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:300:45
2024-09-08T13:50:08.712389245Z     at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:26)
2024-09-08T13:50:08.712391411Z     at Znp.request (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:291:27)
2024-09-08T13:50:08.712393557Z     at ZStackAdapter.dataRequest (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:1201:24)
2024-09-08T13:50:08.712395752Z     at ZStackAdapter.sendZclFrameToEndpointInternal (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:446:46)
2024-09-08T13:50:08.712398067Z     at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:380:25
2024-09-08T13:50:08.712400231Z     at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:26)
2024-09-08T13:50:08.712402356Z     at ZStackAdapter.sendZclFrameToEndpoint (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:378:27)
2024-09-08T13:50:08.712404589Z     at Request.func (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:296:36)

after these, my light bulbs and smart plugs + the router all dropped, bringing down the entire zigbee network

with a container restart, the zigbee network is back online.. until the next time (between 5mins and 8h)

i'm using a Slaesh Zigbee 3.0 USB Stick (i have two of them, one coordinator and one router. i switched them around to rule out a hardware issue). the problem happens on both
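
For anyone digging through a long debug log for the same failure, the timeout quoted above can be picked out mechanically. The following is only a rough sketch: the regular expression is modelled on the single log line in this comment, and other Zigbee2MQTT versions or adapters may word the message differently. The names `TIMEOUT_RE`, `Timeout` and `find_timeouts` are made up for this example.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Pattern modelled on the one log line quoted in this thread (assumption:
# other versions may format the message differently).
TIMEOUT_RE = re.compile(
    r"\[(?P<local_time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*?"
    r"ZCL command (?P<ieee>0x[0-9a-f]{16})/(?P<endpoint>\d+) "
    r"(?P<command>[\w.]+)\(.*"
    r"failed \(SRSP - AF - dataRequest after (?P<ms>\d+)ms\)"
)


class Timeout(NamedTuple):
    local_time: str   # timestamp inside the square brackets
    ieee: str         # IEEE address of the device that did not answer
    endpoint: int
    command: str      # e.g. "genBasic.read"
    waited_ms: int


def find_timeouts(lines: Iterable[str]) -> Iterator[Timeout]:
    """Yield one Timeout per log line that matches the dataRequest failure."""
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            yield Timeout(
                m["local_time"],
                m["ieee"],
                int(m["endpoint"]),
                m["command"],
                int(m["ms"]),
            )
```

Passing an open file handle for a log such as zigbee2mqtt_logs.txt to `find_timeouts` gives one record per matching line. That makes it easier to see which device addresses time out right before the network drops.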

Koenkk commented 2 months ago

@Barneybaer84 does the issue also happen when you keep the devices really close to the coordinator? Z2M tries to reach it but fails, could indicate an issue with one of the routers

@rursache does the issue still happen with the availability feature disabled? Did the 20221226 firmware work?

rursache commented 2 months ago

@Koenkk I always had "Availability" enabled as I want to see that my devices are available. i can try with it off next week but I don't want to keep it disabled. the latest working fw was 'CC2652RB_coordinator_20221226' (almost 2 years, butter smooth and stable) but now downgrading to it does not fix it!

Barneybaer84 commented 2 months ago

After 5 days of device drops and an update of z2m to 1.40.1, my Zigbee network works perfectly again.

Skeletorjus commented 2 months ago

I'm facing the same, but I'm not so sure that 20240710 is causing this - I had the same with 20230507 (https://github.com/Koenkk/zigbee2mqtt/issues/23329#issuecomment-2227268818). Updated to 20240410 five days ago and am still having the same thing happening with the network going offline. The way I usually notice it is that some of my bulbs (Namrons, to be exact) start panicking and go into some flashing disco mode 😆

I can't remember when it started, but I'm tempted to say that it was around 1.38.0.

Log from a couple of minutes before the crash and the crash itself, pretty much the same as https://github.com/Koenkk/zigbee2mqtt/issues/23869#issuecomment-2336713323. z2m_error.txt

Zigbee2MQTT 1.40.1 commit: 403d3c0 Sonoff Plus P zStack3x0 20240710.

rursache commented 2 months ago

@Koenkk any update on this?

jymorel commented 2 months ago

everything ok for a few days after re-pairing several devices

rursache commented 2 months ago

i still have to restart the zigbee2mqtt docker container every hour otherwise the coordinator and all the routers go offline, bringing everything down. its instantly fixed after the container restarts

rursache commented 2 months ago

as a workaround, and fed up with the sloppy cronjob, i made this HomeAssistant automation to restart the zigbee2mqtt docker container when my philips hue light goes offline (unavailable in HASS):

alias: Fix Zigbee2MQTT
description: ""
trigger:
  - platform: state
    entity_id:
      - light.living_room_philips_hue_color
    from: null
    to: unavailable
    for:
      hours: 0
      minutes: 0
      seconds: 5
condition: []
action:
  - action: shell_command.restart_zigbee2mqtt
    metadata: {}
    data: {}
mode: single

please note that you need to create an entry for shell_command.restart_zigbee2mqtt in your HASS configuration.yaml file like this:

shell_command:
  restart_zigbee2mqtt: >
    'nohup curl -X POST URL $1 > /dev/null 2>&1 &' 

make sure to replace URL with your portainer or whatever else webhook you have and light.living_room_philips_hue_color with your zigbee entity from HASS

now when the light goes offline (the simpler way of detecting when the entire zigbee network is down) it will restart the z2m container bringing everything up and running in under 15 seconds

i really can't wait for a proper fix tho!
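
The trigger logic in the automation above (fire once after the entity has stayed `unavailable` for 5 seconds, then stay quiet until it recovers) can be written out as a small stand-alone sketch. This only illustrates the idea; it is not how Home Assistant implements `for:` internally, and the class name `UnavailableTrigger` and its `update` method are invented for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UnavailableTrigger:
    """Fires once when a state has been 'unavailable' for hold_seconds."""

    hold_seconds: float = 5.0
    _since: Optional[float] = field(default=None, init=False, repr=False)
    _fired: bool = field(default=False, init=False, repr=False)

    def update(self, state: str, now: float) -> bool:
        """Feed the current state and a timestamp in seconds.

        Returns True exactly once per outage, when the hold time is reached.
        """
        if state != "unavailable":
            # Entity recovered: re-arm the trigger for the next outage.
            self._since = None
            self._fired = False
            return False
        if self._since is None:
            self._since = now  # outage starts now
        if not self._fired and now - self._since >= self.hold_seconds:
            self._fired = True
            return True
        return False
```

The hold time is what keeps a brief blip from restarting the container. Firing only once per outage avoids repeated restarts while the network is still coming back up.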

Skeletorjus commented 2 months ago

Very likely a red herring, but do any of you utilize bindings? My network has been crashing on and off for a while (as described in https://github.com/Koenkk/zigbee2mqtt/issues/23869#issuecomment-2341846601), but it hasn't been too bad lately.

I have two IKEA Parasoll-devices that both have been bound to their own bulbs (Namron 3802952). The contact sensors have been out of use for a while due to empty batteries. Today I replaced the batteries, and as soon as I did a couple of tests to ensure that the bindings to the bulbs worked, my whole network went down. Had to replug the Sonoff and restart Zigbee2MQTT. Did this multiple times.

I have removed the bindings, and for the time being the network seems stable.

rursache commented 2 months ago

being super frustrated with the lack of support or work being done to fix this, i bought a new ZigStar UZG-01 (CC2652P7) which arrived with FW 20230507. i switched the old slaesh CC2652RB with the UZG-01 and my zigbee network has been stable ever since. 72h so far, had crashes every 20min-8h. so far 0 drops or crashes. will flash the slaesh as a router and use it like that.

i think the new firmware ruins the coordinator somehow but it's just my guess. i tried flashing the slaesh CC2652RB with each firmware starting with 2022 until the latest, none fixed it. a new device did. well 🤷🏻‍♂️

@Skeletorjus i don't use bindings, don't think it's related

Koenkk commented 2 months ago

Could you see if the 99240914 firmware fixes it? If yes, then try the next fws (e.g. 99240915) until the problem reappears. fws.zip (SONOFF Dongle P only)

rursache commented 2 months ago

i do not own a sonoff device, just two slaesh sticks.

i'm also not interested in fiddling again after i finally have a functioning network

thogens commented 2 months ago

I seem to have the same issue, but need to investigate further. So far the only way for me to fix the crash was to restart HA, but that takes 5mins at least. Next time I'll check if a simple restart of the Zigbee2Mqtt Addon will do the trick. If yes -> @rursache : I find your script promising, but I don't know how to restart the Addon in my case. I'm not running it on a dedicated portainer, but as std. HA Addon... Any idea?

Barneybaer84 commented 2 months ago

Could you see if the 99240914 firmware fixes it? If yes, then try the next fws (e.g. 99240915) until the problem reappears. fws.zip (SONOFF Dongle P only)

I will test the Firmware.

rursache commented 2 months ago

@thogens i'm sorry, i run homeassistant, zigbee2mqtt, mosquitto and all my services in docker. not sure if you could reload an HASS addon if you have HASS OS installed

Barneybaer84 commented 2 months ago

I haven't flashed the test fw yet, but I have switched from USB 3.0 to USB 2.0 with a USB 2.0 cable, and I have had no device disconnects for two days. I hope it stays that way.

Barneybaer84 commented 2 months ago

Day 4 after switch to USB 2.0, 2 devices are offline :( I will test the fws.zip

Luca1996O commented 1 month ago

Same issue here, with ZBDongle-P latest firmware version and ZigBee2MQTT latest version (1.40.1).

Barneybaer84 commented 1 month ago

Same problem with test fw 99240914. I have downgraded to 20230507 and re-paired the same devices, and it has now been working for 4 days. No device goes offline anymore.

fsedarkalex commented 2 weeks ago

Is the fws.zip still a thing @Koenkk ? Currently on 20240710 and having random drops of routers (I think multiple at a time, probably increasing drop-speed the more are dropping)

stewepylon commented 2 weeks ago

In my environment, latency was caused by the automatic update checks. Many BTicino devices are reporting hundreds of messages due to firmware version mismatches, which seems to be slowing everything down.

fsedarkalex commented 2 weeks ago

I have just disabled the OTA check to test if this fixes the device "drops"

fsedarkalex commented 2 weeks ago

Would like to add... For me currently those devices are definitely affected:

Also I am constantly losing battery powered sensors, which could be based on the same issue or probably a follow-up issue:

My network consists of total 83 devices right now...

by manufacturer

IKEA of Sweden: 41
eWeLink: 9
frient A/S: 9
OSRAM: 5
Paulmann Licht GmbH: 3
Signify Netherlands B.V.: 3
_TZ3000_okaz9tjs: 3
LUMI: 3
Paulmann LichtGmbH: 2
GLEDOPTO: 2
undefined: 1
_TZ3000_zmy1waw6: 1
_TZE204_qasjif9e: 1

by model

DS01: 7
SMSZB-120: 6
501.34: 5
Plug 01: 4
TS011F: 4
TRADFRIbulbE27WSglobeclear806lm: 4
TRADFRI bulb GU10 CWS 345lm: 4
TRADFRI Driver 30W: 4
TRADFRI bulb E27 CWS globe 806lm: 4
TRADFRI bulb E27 WW globe 806lm: 4
TRADFRIbulbE27WWclear250lm: 3
SML004: 3
HESZB-120: 3
TRADFRI control outlet: 3
RODRET Dimmer: 2
GL-SD-001: 2
lumi.sensor_wleak.aq1: 2
TRADFRIbulbE14WWclear250lm: 2
TH01: 2
TRADFRI bulb GU10 WW 345lm8: 2
TRADFRI SHORTCUT Button: 2
SYMFONISK sound remote gen2: 2
undefined: 1
lumi.vibration.aq1: 1
Remote Control N2: 1
VALLHORN Wireless Motion Sensor: 1
TRADFRIbulbE27WSglobeopal1055lm: 1
Lightify Switch Mini: 1
STARKVIND Air purifier: 1
STARKVIND Air purifier table: 1
TS0601: 1
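
Tallies like the two lists above can be reproduced from a device list with a few lines of Python. This is a minimal sketch, assuming each device is a dict with `manufacturer` and `model_id` keys (an assumption here, not taken from this thread). The helper names `tally` and `merge_spacing_variants` are invented. The second helper merges labels that differ only in whitespace or case, such as "Paulmann Licht GmbH" and "Paulmann LichtGmbH", which may or may not be wanted.

```python
from collections import Counter


def tally(devices, key):
    """Count devices per value of `key`, most common first."""
    return Counter(str(d.get(key)) for d in devices).most_common()


def merge_spacing_variants(counts):
    """Merge labels that differ only in whitespace or case.

    The first spelling seen is kept as the display name.
    """
    canonical = {}  # normalized label -> display name
    merged = {}     # display name -> summed count
    for label, n in counts:
        norm = "".join(label.split()).lower()
        name = canonical.setdefault(norm, label)
        merged[name] = merged.get(name, 0) + n
    return sorted(merged.items(), key=lambda kv: -kv[1])
```

Calling `tally(devices, "manufacturer")` and `tally(devices, "model_id")` gives the two views. Running the result through `merge_spacing_variants` collapses the spelling variants into one line.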

Probably worth a mention: I am running a second Z2M instance, configured identical, also on the same coordinator FW and HW. This second network is dedicated for AwoX devices AND their related wall remotes.

I have no drops in this network at all so far. But of course it is much smaller and more homogeneous:

by manufacturer

AwoX: 10
Paulmann LichtGmbH: 3
Sunricher: 1
Paulmann Licht GmbH: 1

by model

EGLO_ZM_TW: 9
501.34: 4
TLSR82xx: 1
ZGRC-KEY-004: 1

thogens commented 1 week ago

Yesterday the whole network crashed again, and I know that I had disabled the automatic OTA check just a few hours earlier. Currently I have some sort of auto-detection mechanism in place that automatically reboots HA in such a case, where nothing works anymore in the Zigbee network. The downside is that this takes 15 mins. But after rebooting HA, everything works fine again...