Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
12.08k stars 1.68k forks

1.35.0 Zigbee2MQTT stops responding after update #20519

Closed larry404 closed 3 months ago

larry404 commented 10 months ago

What happened?

On New Year's Eve, with 30 people in my home, all my Zigbee devices stopped responding. Not the best time to put out an upgrade. When I restart the add-on, it works for about 10 minutes and then stops. I have never had any major issues with this add-on in the past.

What did you expect to happen?

Upgrades should not be pushed out during major holidays. Could there be some sort of beta program to make sure an upgrade works?

How to reproduce it (minimal and precise)

Restart the add-on; about 10 minutes later it stops responding.

Zigbee2MQTT version

1.35.0

Adapter firmware version

"meta":{"maintrel":1,"majorrel":2,"minorrel":7,"product":1,"revision":20210708,"transportrev":2

Adapter

Sonoff Zigbee USB 3.0 dongle

Debug log

log.txt

julianfs commented 10 months ago

Are you using Home Assistant? If so you can turn off auto-update.

larry404 commented 10 months ago

Yes, I am using Home Assistant. In retrospect, that would have been a good idea. However, since I never had issues before, I never gave it a second thought. Thanks.

BandBxx commented 10 months ago

+1, same problem. After updating to v1.35, my z2m freezes after a couple of minutes of operation. A very bad, unstable update!

Hadatko commented 10 months ago

+1, and in the HA community I see more users with the same issue.

Koenkk commented 10 months ago

Pushed a fix, please try with the latest-dev.

Changes will be available in the dev branch in a few hours from now.

If not, provide the debug log again.

See this on how to enable debug logging.

sdotter commented 10 months ago

I don't know if my issue is exactly the same... but I think so...

Begin of log:

debug 2024-01-03 11:32:21: Loaded state from file /config/zigbee2mqtt/state.json
info 2024-01-03 11:32:21: Logging to console and directory: '/config/zigbee2mqtt/log/2024-01-03.11-32-21' filename: log.txt
debug 2024-01-03 11:32:21: Removing old log directory '/config/zigbee2mqtt/log/2024-01-02.22-28-36'
info 2024-01-03 11:32:21: Starting Zigbee2MQTT version 1.35.0-dev (commit #e9aee4c)
info 2024-01-03 11:32:21: Starting zigbee-herdsman (0.30.0)
debug 2024-01-03 11:32:21: Using zigbee-herdsman with settings: '{"adapter":{"concurrent":null,"delay":null,"disableLED":false},"backupPath":"/config/zigbee2mqtt/coordinator_backup.json","databaseBackupPath":"/config/zigbee2mqtt/database.db.backup","databasePath":"/config/zigbee2mqtt/database.db","network":{"channelList":[15],"extendedPanID":[200,245,24,196,175,214,4,79],"networkKey":"HIDDEN","panID":65534},"serialPort":{"adapter":"ezsp","path":"/dev/ttyACM0"}}'
info 2024-01-03 11:32:25: zigbee-herdsman started (resumed)
info 2024-01-03 11:32:25: Coordinator firmware version: '{"meta":{"maintrel":"2 ","majorrel":"7","minorrel":"2","product":11,"revision":"7.2.2.0 build 190"},"type":"EZSP v11"}'
debug 2024-01-03 11:32:25: Zigbee network parameters: {"channel":15,"extendedPanID":200,"panID":65534}
info 2024-01-03 11:32:25: Currently 13 devices are joined:

...

Zigbee2MQTT:debug 2024-01-03 11:35:28: Received Zigbee message from 'Color Spot (1)', type 'read', cluster 'genTime', data '["time","timeStatus","timeZone","lastSetTime","validUntilTime"]' from endpoint 1 with groupID 0
Zigbee2MQTT:debug 2024-01-03 11:35:38: Received Zigbee message from 'Color Spot (1)', type 'read', cluster 'genTime', data '["time","timeStatus","timeZone","lastSetTime","validUntilTime"]' from endpoint 1 with groupID 0
Zigbee2MQTT:debug 2024-01-03 11:35:43: Received MQTT message on 'zigbee2mqtt/bridge/request/networkmap' with data '{"routes":false,"transaction":"mia2n-1","type":"raw"}'
Zigbee2MQTT:info 2024-01-03 11:35:43: Starting network scan (includeRoutes 'false')
Zigbee2MQTT:debug 2024-01-03 11:35:45: Received Zigbee message from 'Climate Sensor (2)', type 'commandQueryNextImageRequest', cluster 'genOta', data '{"fieldControl":0,"fileVersion":4105,"imageType":2053,"manufacturerCode":4742}' from endpoint 1 with groupID 0
Zigbee2MQTT:debug 2024-01-03 11:35:45: Device 'Climate Sensor (2)' requested OTA
Zigbee2MQTT:debug 2024-01-03 11:35:45: Responded to OTA request of 'Climate Sensor (2)' with 'NO_IMAGE_AVAILABLE'
Zigbee2MQTT:debug 2024-01-03 11:35:48: Received Zigbee message from 'Color Spot (1)', type 'read', cluster 'genTime', data '["time","timeStatus","timeZone","lastSetTime","validUntilTime"]' from endpoint 1 with groupID 0
Zigbee2MQTT:debug 2024-01-03 11:35:58: Received Zigbee message from 'Color Spot (1)', type 'read', cluster 'genTime', data '["time","timeStatus","timeZone","lastSetTime","validUntilTime"]' from endpoint 1 with groupID 0
Error: {"address":0,"clusterId":32817,"sequence":6} after 10000ms
    at Timeout._onTimeout (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
    at listOnTimeout (node:internal/timers:569:17)
    at processTimers (node:internal/timers:512:7)

Just the error:

Error: {"address":0,"clusterId":32817,"sequence":6} after 10000ms
    at Timeout._onTimeout (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
    at listOnTimeout (node:internal/timers:569:17)
    at processTimers (node:internal/timers:512:7)

I need to start z2m manually again, and then it crashes again and again... I also just tried the latest from the dev branch (#e9aee4c).

sdotter commented 10 months ago

@Koenkk let me know if I can test something 👍👌

Koenkk commented 10 months ago

Could you provide the herdsman debug log of the crash?

See this on how to enable the herdsman debug logging. Note that this is only logged to STDOUT and not to log files.

sdotter commented 10 months ago

I've added zigbee_herdsman_debug: true to my config but don't see anything new in docker logs... I think I'm doing something wrong to view herdsman logs?

sdotter commented 10 months ago

Docker logs are also stdout, right?

sdotter commented 10 months ago

Happened after generating map...

Error: {"address":40606,"clusterId":32817,"sequence":24} after 10000ms
    at Timeout._onTimeout (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
    at listOnTimeout (node:internal/timers:569:17)
    at processTimers (node:internal/timers:512:7)
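
For anyone comparing their own logs against these timeouts, here is a minimal Python sketch that pulls the fields out of such a line. The regular expression is only inferred from the lines quoted in this thread, and `parse_timeout` is a hypothetical helper, not part of Zigbee2MQTT or zigbee-herdsman. Cluster ID 32817 is 0x8031, which in the Zigbee Device Profile is the Mgmt_Lqi_rsp (LQI table response) cluster. That would fit the crash following a network map scan, although the thread does not confirm a root cause.

```python
import json
import re

# Pattern inferred from the timeout lines quoted in this thread (an assumption).
TIMEOUT_RE = re.compile(r"Error: (\{.*?\}) after (\d+)ms")

# Only the one ZDO response cluster seen in this thread is listed here.
ZDO_RESPONSES = {0x8031: "Mgmt_Lqi_rsp"}


def parse_timeout(line):
    """Return the fields of a timeout line, or None if the line does not match."""
    match = TIMEOUT_RE.search(line)
    if match is None:
        return None
    matcher = json.loads(match.group(1))
    cluster_id = matcher["clusterId"]
    return {
        "address": matcher["address"],
        "cluster_hex": f"0x{cluster_id:04X}",
        "cluster_name": ZDO_RESPONSES.get(cluster_id, "unknown"),
        "sequence": matcher["sequence"],
        "timeout_ms": int(match.group(2)),
    }


line = 'Error: {"address":40606,"clusterId":32817,"sequence":24} after 10000ms'
print(parse_timeout(line))
```

Running it over a saved log file line by line shows which network addresses fail to answer, which may help narrow down a misbehaving device.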

merlinpimpim commented 10 months ago

Version 1.35.0 is completely unreliable. I have to restart Z2M every 30 minutes to get my 25 devices to work more or less correctly. An urgent production patch needs to arrive soon.

sdotter commented 10 months ago

I think we are both experiencing similar issues... But I think a fix will arrive soon, haha, look at the development that is going on for z2m... It's an amazing piece of freeware, man 😂👍🎉

MiAutomations commented 10 months ago

I think my problem is also related...

https://github.com/Koenkk/zigbee2mqtt/issues/20506

After the update I'm unable to start the add-on, but if I go back with a restore, everything works fine.

sdotter commented 10 months ago

20506

Koenkk commented 10 months ago

@sdotter did you add it in the correct place (HA addon config)?

sdotter commented 10 months ago

Yeah, I think so:

Why does GitHub screw up the indentation in my comment? 😒

But I've got this in configuration.yaml, without leading spaces:

...
permit_join: false
zigbee_herdsman_debug: true
devices:
...

sdotter commented 10 months ago

For now I tested a downgrade to 1.33.2, and everything seems to work a lot faster and more stably (z2m also crashed, but a lot less), and generating the map was much faster.

I'll update when i know more...

Zigbee2MQTT version: 1.33.2 commit: unknown
Coordinator type: EZSP v11
Coordinator revision: 7.2.2.0 build 190
Frontend version: 0.6.142
Zigbee-herdsman-converters version: 15.106.0
Zigbee-herdsman version: 0.21.0

sdotter commented 10 months ago

The only difference is that 1.33.2 won't crash... Devices still stop responding and map generation still freezes... But it doesn't crash z2m.

OK, I found out... Don't use AwoX devices... I also noticed it had been mentioned before, but I only just found that article.

That was what screwed up my Zigbee network: an AwoX color spot... stupid thing.

Still, it doesn't crash z2m 1.33... but it did crash 1.35.

How can I check the herdsman logs? Or should I just clone the z2m repo and build it from scratch? Thanks 🙏

svarogmaliby commented 10 months ago

Same problem, rolling back to version 1.33.2-1 completely solves it. The problem with the add-on crash starts with version 1.34.0-1 and higher.

chris-1243 commented 10 months ago

@sdotter

If your instance of zigbee2mqtt is on docker, add this line in your docker-compose.yaml in the environment section

DEBUG: zigbee-herdsman*

Restart your container and try to reproduce the error. Once you have reproduced it, run: (sudo) docker compose logs zigbee2mqtt > log.txt 2>&1

For the addon, I can't help at all...

sdotter commented 10 months ago

@chris-1243 yeah, thanks... I'm running the add-on. But I think I'm going to clone the repo, build it myself, and run it standalone so I can debug more. Thanks for thinking along!! 🙏🤛👍

sdotter commented 10 months ago

Same problem, rolling back to version 1.33.2-1 completely solves it. The problem with the add-on crash starts with version 1.34.0-1 and higher.

Yeah, that's also my experience.

Koenkk commented 10 months ago

@sdotter this looks like the z2m configuration.yaml file, it should be the addon config editor (via HA)

pidzi92 commented 10 months ago

Is there a way to roll back to an older version in Home Assistant without needing to reconfigure everything?

sdotter commented 10 months ago

@sdotter this looks like the z2m configuration.yaml file, it should be the addon config editor (via HA)

Thanks man! I've found it 😂 haha... or just edit it as YAML and then add the herdsman debug property...

But today I tried the RCP multi-PAN stuff, and there had just been an update for Silicon Labs Multiprotocol that broke things... so I flashed my stick again with "ncp-uart-sw_7.2.2.0_115200.gbl" and re-paired all devices (not that much work), and everything was working fast as f$#k again... I don't really understand it, but yeah... I learned a lot over the last two days... NCP or RCP (I still don't know which I should prefer?)

@Koenkk can you give some hints on how to contribute to (and first get familiar with) z2m and the herdsman stuff?

pidzi92 commented 10 months ago

Is there a way to roll back to an older version in Home Assistant without needing to reconfigure everything?

Well, I've found a workaround that works for me. I just had to reset all problematic devices.

DO NOT REMOVE IT FROM HA OR Z2M. Reset device. It should rejoin to its network automatically and it should work, without any reconfiguration. All your automation and entities should stay the same. If it does not rejoin, go into Z2M add-on and permit join.

sdotter commented 10 months ago

Reset device. It should rejoin to its network automatically

Nice man! Thanks, I will keep that in mind... Yeah, for me it started with a really problematic device, an AwoX device, which started to screw things up... don't use it! haha 😂👍

Koenkk commented 10 months ago

@sdotter are your issues solved now? What part do you want to contribute to?

sdotter commented 10 months ago

@Koenkk I don't know for sure...

Long story short, I had a lot of weird issues with my Zigbee network on z2m 1.35, and a lot fewer on 1.33.x. I ended up trying the multiprotocol Matter and Zigbee firmware, but that is still too early to be stable... so I flashed exactly the same firmware (ncp-uart-sw_7.3.1.0_115200.gbl) to my ZBDongle-E, and now everything on 1.35 is a lot faster, and z2m hasn't crashed in the 11 hours since.

Maybe starting with some frontend work? Getting a dev environment up and running to test and debug zigbee2mqtt, zigbee-herdsman and zigbee-herdsman-converters.

Koenkk commented 10 months ago

For developing the frontend the documentation can be found here

larry404 commented 10 months ago

Hi, for what it’s worth, my Zigbee network is working again, one week after it started having issues. I have made no changes to the Zigbee config except updating to the latest HA builds. All devices are working as normal. I’ll keep monitoring and report back if any issues come up.

flowcool commented 10 months ago

Dear @Koenkk , I changed to dev branch yesterday and until now, no issue. So I assume it's good like this ;)

svarogmaliby commented 10 months ago

Installed update 1.35.1-1, the addon continues to crash on this version.

Koenkk commented 10 months ago

@svarogmaliby looks like you are using an invalid debounce value for one of the devices in your z2m configuration.yaml.

svarogmaliby commented 10 months ago

@svarogmaliby looks like you are using an invalid debounce value for one of the devices in your z2m configuration.yaml.

No values have been changed; on version 1.33.2-1 the addon works stably

Koenkk commented 10 months ago

The checks seem to be stricter now (previously an invalid config was ignored).
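
To spot such an entry before upgrading, a rough Python sketch along these lines could be run against the devices section of configuration.yaml once it has been loaded into a dictionary. It assumes that a valid debounce is a non-negative number; the exact rule Zigbee2MQTT enforces is not stated in this thread. `find_invalid_debounce` and the device entries are hypothetical.

```python
def find_invalid_debounce(devices):
    """Return (device key, value) pairs whose debounce is not a non-negative number."""
    problems = []
    for key, options in devices.items():
        if not isinstance(options, dict) or "debounce" not in options:
            continue
        value = options["debounce"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or value < 0:
            problems.append((key, value))
    return problems


# Hypothetical device entries, shaped like the devices section of configuration.yaml.
devices = {
    "0x0000000000000001": {"friendly_name": "sensor_a", "debounce": 1},
    "0x0000000000000002": {"friendly_name": "sensor_b", "debounce": "1s"},
    "0x0000000000000003": {"friendly_name": "sensor_c"},
}
print(find_invalid_debounce(devices))
```

Any entry it reports is worth checking by hand, since a value that an older version silently ignored may now stop the add-on from starting.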

github-actions[bot] commented 4 months ago

This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 30 days