Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
11.79k stars 1.64k forks source link

Crashes once a day since 1.21.0 #8452

Closed illivanillie closed 2 years ago

illivanillie commented 3 years ago

What happened

I have massive problems with zigbee2mqtt. zigbee2mqtt once a day fails and is unable to start again. „Failed to start zigbee-herdsman“ I can only overcome this by stopping HA and pull the plug for a few seconds. Then zigbee is able to start again. Seems like it is an adapter problem, but just reconnecting the adapter won’t work.

What did you expect to happen

zigbee2mqtt to keep working or at least be able to start again without physical interaction, because it is in a remote spot.

How to reproduce it (minimal and precise)

was not able to determine any specific cause. Randomly, about once a day. The problem last for about 2 weeks. So it could correlate with 1.21.0 or/and with the new adapter firmware

Debug info

Zigbee2MQTT version: latest Adapter hardware: CC26X2R1 Adapter firmware version: latest, reflashed several times.

Koenkk commented 3 years ago

Please provide the herdsman debug logging from starting z2m till the crash.

See https://www.zigbee2mqtt.io/information/debug.html#zigbee-herdsman-debug-logging on how to enable the herdsman debug logging. Note that this is only logged to STDOUT and not to log files.

illivanillie commented 3 years ago

Should I update to 1.21.0-4 before the log?

Koenkk commented 3 years ago

That doesn't matter.

illivanillie commented 3 years ago

By chance I was able to copy this. I got a notification that a few lights were unavailable for a few times (=zigbee2mqtt was not available). Maybe you can read something. There was no crash so far again. Tbh I am still trying to figure out, how and where exactly the herdsman logs are to find. Could you help me out?

2021-08-23T15:45:00.354Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- ZDO - simpleDescReq - {"status":0} 2021-08-23T15:45:00.355Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14] 2021-08-23T15:45:00.356Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [69,132,0,0,0,0,0,8,6,9,1,5,0,0,0,0,204] 2021-08-23T15:45:00.356Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14,69,132,0,0,0,0,0,8,6,9,1,5,0,0,0,0,204] 2021-08-23T15:45:00.356Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 14 - 2 - 5 - 132 - [0,0,0,0,0,8,6,9,1,5,0,0,0,0] - 204 2021-08-23T15:45:00.356Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- ZDO - simpleDescRsp - {"srcaddr":0,"status":0,"nwkaddr":0,"len":8,"endpoint":6,"profileid":265,"deviceid":5,"deviceversion":0,"numinclusters":0,"inclusterlist":[],"numoutclusters":0,"outclusterlist":[]} 2021-08-23T15:45:00.357Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.357Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - simpleDescReq - {"dstaddr":0,"nwkaddrofinterest":0,"endpoint":5} 2021-08-23T15:45:00.358Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,5,37,4,0,0,0,0,5,33] 2021-08-23T15:45:00.363Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.363Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.363Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 5 - 4 - [0] - 96 2021-08-23T15:45:00.363Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- ZDO - simpleDescReq - {"status":0} 2021-08-23T15:45:00.363Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14] 2021-08-23T15:45:00.364Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [69,132,0,0,0,0,0,8,5,8,1,5,0,0,0,0,206] 2021-08-23T15:45:00.365Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14,69,132,0,0,0,0,0,8,5,8,1,5,0,0,0,0,206] 2021-08-23T15:45:00.365Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 14 - 2 - 5 - 132 - [0,0,0,0,0,8,5,8,1,5,0,0,0,0] - 206 2021-08-23T15:45:00.365Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- ZDO - simpleDescRsp - {"srcaddr":0,"status":0,"nwkaddr":0,"len":8,"endpoint":5,"profileid":264,"deviceid":5,"deviceversion":0,"numinclusters":0,"inclusterlist":[],"numoutclusters":0,"outclusterlist":[]} 2021-08-23T15:45:00.366Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.366Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - simpleDescReq - {"dstaddr":0,"nwkaddrofinterest":0,"endpoint":4} 2021-08-23T15:45:00.367Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,5,37,4,0,0,0,0,4,32] 2021-08-23T15:45:00.371Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.372Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.372Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 5 - 4 - [0] - 96 2021-08-23T15:45:00.372Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- ZDO - simpleDescReq - {"status":0} 2021-08-23T15:45:00.372Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14] 2021-08-23T15:45:00.373Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [69,132,0,0,0,0,0,8,4,7,1,5,0,0,0,0,192] 2021-08-23T15:45:00.374Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14,69,132,0,0,0,0,0,8,4,7,1,5,0,0,0,0,192] 2021-08-23T15:45:00.374Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 14 - 2 - 5 - 132 - [0,0,0,0,0,8,4,7,1,5,0,0,0,0] - 192 2021-08-23T15:45:00.374Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- ZDO - simpleDescRsp - {"srcaddr":0,"status":0,"nwkaddr":0,"len":8,"endpoint":4,"profileid":263,"deviceid":5,"deviceversion":0,"numinclusters":0,"inclusterlist":[],"numoutclusters":0,"outclusterlist":[]} 2021-08-23T15:45:00.374Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.375Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - simpleDescReq - {"dstaddr":0,"nwkaddrofinterest":0,"endpoint":3} 2021-08-23T15:45:00.375Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,5,37,4,0,0,0,0,3,39] 2021-08-23T15:45:00.381Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.381Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.381Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 5 - 4 - [0] - 96 2021-08-23T15:45:00.381Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- ZDO - simpleDescReq - {"status":0} 2021-08-23T15:45:00.381Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14] 2021-08-23T15:45:00.382Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [69,132,0,0,0,0,0,8,3,4,1,5,0,0,0,0,196] 2021-08-23T15:45:00.383Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14,69,132,0,0,0,0,0,8,3,4,1,5,0,0,0,0,196] 2021-08-23T15:45:00.383Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 14 - 2 - 5 - 132 - [0,0,0,0,0,8,3,4,1,5,0,0,0,0] - 196 2021-08-23T15:45:00.383Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- ZDO - simpleDescRsp - {"srcaddr":0,"status":0,"nwkaddr":0,"len":8,"endpoint":3,"profileid":260,"deviceid":5,"deviceversion":0,"numinclusters":0,"inclusterlist":[],"numoutclusters":0,"outclusterlist":[]} 2021-08-23T15:45:00.383Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.384Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - simpleDescReq - {"dstaddr":0,"nwkaddrofinterest":0,"endpoint":2} 2021-08-23T15:45:00.384Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,5,37,4,0,0,0,0,2,38] 2021-08-23T15:45:00.401Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.401Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.402Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 5 - 4 - [0] - 96 2021-08-23T15:45:00.402Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- ZDO - simpleDescReq - {"status":0} 2021-08-23T15:45:00.403Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14] 2021-08-23T15:45:00.404Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [69,132,0,0,0,0,0,8,2,1,1,5,0,0,0,0,192] 2021-08-23T15:45:00.405Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14,69,132,0,0,0,0,0,8,2,1,1,5,0,0,0,0,192] 2021-08-23T15:45:00.405Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 14 - 2 - 5 - 132 - [0,0,0,0,0,8,2,1,1,5,0,0,0,0] - 192 2021-08-23T15:45:00.406Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- ZDO - simpleDescRsp - {"srcaddr":0,"status":0,"nwkaddr":0,"len":8,"endpoint":2,"profileid":257,"deviceid":5,"deviceversion":0,"numinclusters":0,"inclusterlist":[],"numoutclusters":0,"outclusterlist":[]} 2021-08-23T15:45:00.407Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.407Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - simpleDescReq - {"dstaddr":0,"nwkaddrofinterest":0,"endpoint":1} 2021-08-23T15:45:00.408Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,5,37,4,0,0,0,0,1,37] 2021-08-23T15:45:00.414Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.414Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,101,4,0,96,254,14] 2021-08-23T15:45:00.414Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 5 - 4 - [0] - 96 2021-08-23T15:45:00.415Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- ZDO - simpleDescReq - {"status":0} 2021-08-23T15:45:00.415Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14] 2021-08-23T15:45:00.416Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [69,132,0,0,0,0,0,8,1,4,1,5,0,0,0,0,198] 2021-08-23T15:45:00.417Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,14,69,132,0,0,0,0,0,8,1,4,1,5,0,0,0,0,198] 2021-08-23T15:45:00.417Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 14 - 2 - 5 - 132 - [0,0,0,0,0,8,1,4,1,5,0,0,0,0] - 198 2021-08-23T15:45:00.418Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- ZDO - simpleDescRsp - {"srcaddr":0,"status":0,"nwkaddr":0,"len":8,"endpoint":1,"profileid":260,"deviceid":5,"deviceversion":0,"numinclusters":0,"inclusterlist":[],"numoutclusters":0,"outclusterlist":[]} 2021-08-23T15:45:00.418Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.431Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - extNwkInfo - {} 2021-08-23T15:45:00.432Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,0,37,80,117] 2021-08-23T15:45:00.435Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,24,101,80,0,0,9,98] 2021-08-23T15:45:00.435Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,24,101,80,0,0,9,98] 2021-08-23T15:45:00.436Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [26,0,0,221,221,221,221,221] 2021-08-23T15:45:00.436Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,24,101,80,0,0,9,98,26,0,0,221,221,221,221,221] 2021-08-23T15:45:00.437Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [221,221,221,0,0,0,0,0,0,0,0,11,87] 2021-08-23T15:45:00.438Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,24,101,80,0,0,9,98,26,0,0,221,221,221,221,221,221,221,221,0,0,0,0,0,0,0,0,11,87] 2021-08-23T15:45:00.438Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 24 - 3 - 5 - 80 - [0,0,9,98,26,0,0,221,221,221,221,221,221,221,221,0,0,0,0,0,0,0,0,11] - 87 2021-08-23T15:45:00.438Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- ZDO - extNwkInfo - {"shortaddr":0,"devstate":9,"panid":6754,"parentaddr":0,"extendedpanid":"0xdddddddddddddddd","parentextaddr":"0x0000000000000000","channel":11} 2021-08-23T15:45:00.439Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.478Z zigbee-herdsman:controller:log Disable joining 2021-08-23T15:45:00.493Z zigbee-herdsman:adapter:zStack:znp:SREQ --> UTIL - ledControl - {"ledid":3,"mode":0} 2021-08-23T15:45:00.494Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,2,39,10,3,0,44] 2021-08-23T15:45:00.556Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,103,10,0,108] 2021-08-23T15:45:00.556Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,103,10,0,108] 2021-08-23T15:45:00.556Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 7 - 10 - [0] - 108 2021-08-23T15:45:00.556Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- UTIL - ledControl - {"status":0} 2021-08-23T15:45:00.556Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.557Z zigbee-herdsman:adapter:zStack:znp:SREQ --> AF - dataRequestExt - {"dstaddrmode":2,"dstaddr":"0x000000000000fffd","destendpoint":242,"dstpanid":0,"srcendpoint":242,"clusterid":33,"transid":1,"options":0,"radius":30,"len":6,"data":{"type":"Buffer","data":[25,2,2,11,0,0]}} 2021-08-23T15:45:00.559Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,26,36,2,2,253,255,0,0,0,0,0,0,242,0,0,242,33,0,1,0,30,6,0,25,2,2,11,0,0,22] 2021-08-23T15:45:00.584Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,100,2,0,103] 2021-08-23T15:45:00.585Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,100,2,0,103] 2021-08-23T15:45:00.585Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 4 - 2 - [0] - 103 2021-08-23T15:45:00.585Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- AF - dataRequestExt - {"status":0} 2021-08-23T15:45:00.586Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.789Z zigbee-herdsman:adapter:zStack:znp:SREQ --> ZDO - mgmtPermitJoinReq - {"addrmode":15,"dstaddr":65532,"duration":0,"tcsignificance":0} 2021-08-23T15:45:00.790Z zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,5,37,54,15,252,255,0,0,26] 2021-08-23T15:45:00.828Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,3,68,128,0,242,1,52] 2021-08-23T15:45:00.829Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,3,68,128,0,242,1,52] 2021-08-23T15:45:00.829Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 3 - 2 - 4 - 128 - [0,242,1] - 52 2021-08-23T15:45:00.830Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- AF - dataConfirm - {"status":0,"endpoint":242,"transid":1} 2021-08-23T15:45:00.830Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [] 2021-08-23T15:45:00.842Z zigbee-herdsman:adapter:zStack:unpi:parser <-- [254,1,101,54,0,82,254,3] 2021-08-23T15:45:00.843Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,1,101,54,0,82,254,3] 2021-08-23T15:45:00.843Z zigbee-herdsman:adapter:zStack:unpi:parser --> parsed 1 - 3 - 5 - 54 - [0] - 82 2021-08-23T15:45:00.843Z zigbee-herdsman:adapter:zStack:znp:SRSP <-- ZDO - mgmtPermitJoinReq - {"status":0} 2021-08-23T15:45:00.843Z zigbee-herdsman:adapter:zStack:unpi:parser --- parseNext [254,3]

Koenkk commented 3 years ago

I don't see any error in that log. Use docker logs zigbee2mqtt > log.txt 2>&1 to grab all logs, next time please post the logs on pastebin.com

jmyoung commented 3 years ago

I'm having the same issue. CC2530+CC2591, using the firmware from 20190608 (haven't been bothered to update it yet), and using the latest zigbee2mqtt Docker image, the coordinator works for exactly 24 hours, and I mean exactly - to the minute. After that it just doesn't do anything else until I power cycle the adapter (which involves shutting down the rpi it's on, of course).

No useful logs unfortunately :/

goliath888 commented 3 years ago

I was gonna open an issue but I think I have the same problem. I am using docker image 1.21.0 with the CC2530.

As @jmyoung mentioned, exactly 24 hours after start the container, logs has no more update of any device. If I try to turn on any light, I get this message:

error 2021-08-28 21:09:43: Publish 'set' 'state' to 'hallway_front' failed: 'Error: Command 0x588e81fffec9ca45/1 genOnOff.on({}, {"sendWhenActive":false,"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms)'

If I try to reboot the container, It won't be able to do it with this error message:

Using '/app/data' as data directory Zigbee2MQTT:info 2021-08-28 21:25:50: Logging to console and directory: '/app/data/log/2021-08-28.21-25-50' filename: log.txt Zigbee2MQTT:info 2021-08-28 21:25:50: Starting Zigbee2MQTT version 1.21.0 (commit #70891ee) Zigbee2MQTT:info 2021-08-28 21:25:50: Starting zigbee-herdsman (0.13.124) Zigbee2MQTT:error 2021-08-28 21:26:10: Error while starting zigbee-herdsman Zigbee2MQTT:error 2021-08-28 21:26:10: Failed to start zigbee Zigbee2MQTT:error 2021-08-28 21:26:10: Check https://www.zigbee2mqtt.io/information/FAQ.html#help-zigbee2mqtt-fails-to-start for possible solutions Zigbee2MQTT:error 2021-08-28 21:26:10: Exiting... Zigbee2MQTT:error 2021-08-28 21:26:10: Error: Failed to connect to the adapter (Error: SRSP - SYS - ping after 6000ms) at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:102:27) at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:123:29) at Zigbee.start (/app/lib/zigbee.js:67:27) at Controller.start (/app/lib/controller.js:106:27) at start (/app/index.js:94:5) Using '/app/data' as data directory

I have to unplug the CC2530 and plug it again. Then everything works fine...for another 24 hours.

I go back to 1.20 and it goes perfect for days with no issues.

Is there anything i can help with?

gionni12 commented 3 years ago

hi.idem

jmyoung commented 3 years ago

I have to unplug the CC2530 and plug it again. Then everything works fine...for another 24 hours.

I go back to 1.20 and it goes perfect for days with no issues.

In this vein, I rolled back to my old verison of zigbee2mqtt, which was pulled 15 months ago (yeah), and everything works fine again. This problem seems to be specific to most recent versions. Since @goliath888 has had success with 1.20, I'll try pulling 1.20.0 and see how that goes.

gionni12 commented 3 years ago

good morning..I don't know the method and I don't have a backup of my home assistant. can someone tell me how to go back? Thank you

YassineBenayad commented 3 years ago

Hello everybody, I have the same issue since I migrated to z2m 1.21. I'm using a CC2530+CC2591 connected to a vmware host using a Silicon Labs CP2102 USB to UART Bridge Controller, the same setup works perfectly well if I down grade to version 1.18-1 or 1.17.0-1 (Home Assistant Add-on) from some pervious spanshots of my VM.

here is how the error look like:

[10:53:19] INFO: Handing over control to Zigbee2mqtt Core ... zigbee2mqtt@1.21.0 start node index.js Zigbee2MQTT:error 2021-08-30 10:53:48: Error while starting zigbee-herdsman Zigbee2MQTT:error 2021-08-30 10:53:48: Failed to start zigbee Zigbee2MQTT:error 2021-08-30 10:53:48: Check https://www.zigbee2mqtt.io/information/FAQ.html#help-zigbee2mqtt-fails-to-start for possible solutions Zigbee2MQTT:error 2021-08-30 10:53:48: Exiting... Zigbee2MQTT:error 2021-08-30 10:53:48: Error: Failed to connect to the adapter (Error: SRSP - SYS - ping after 6000ms) at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:102:27) at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:123:29) at Zigbee.start (/app/lib/zigbee.js:67:27) at Controller.start (/app/lib/controller.js:106:27) at start (/app/index.js:94:5)

Westcott1 commented 3 years ago

I'm also using 1.21-dev and CC2530+CC2591. After exactly 24 hours all devices steadily go offline with "failed to ping xxx" It all works OK for another 24 hours if I stop z2m, un- and re-plug the stick, then restart z2m.

illivanillie commented 3 years ago

For me it was ok for a few days. Just happened again and had a quick look. See logs here

Edit: It was now the second time that a restart of the HA host solved the issue. I thought before that it was a hardware problem, but I guess it is the host that crashes and thus makes hardware unavailable? In addition nabucasa did not work for me, which is also host related. I will wait for a third time to test the host method and will report here.

illivanillie commented 3 years ago

For me the issue is definitely solved (for the moment), if I restart the HA host. Any ideas on how to dig deeper into this?

ascanio65 commented 3 years ago

hi.for me desolution was a downgrade at version 1.20

Koenkk commented 3 years ago

Can you try again with the latest-dev? (https://www.zigbee2mqtt.io/how_tos/how-to-switch-to-dev-branch.html)

jmyoung commented 3 years ago

Swapped over now, we shall see in 24 hours. I'll report back with findings :)

Westcott1 commented 3 years ago

I'm sorry, it didn't fix it for me.

jmyoung commented 3 years ago

Didn't fix it for me. Stopped responding after 24 hours again.

Koenkk commented 3 years ago

@Westcott1 @jmyoung to debug this further I will need the herdsman debug logging

See https://www.zigbee2mqtt.io/information/debug.html#zigbee-herdsman-debug-logging on how to enable the herdsman debug logging. Note that this is only logged to STDOUT and not to log files.

Westcott1 commented 3 years ago

Started with - DEBUG=zigbee-herdsman* npm start Now wait 24 hours...

I seem to remember from the days I did UNIX as a day job that stdout can be redirected into a file, e.g. DEBUG=zigbee-herdsman* npm start 1> herdsman.log

Update - this does not capture all data. Perhaps another stream is used as well?

jmyoung commented 3 years ago

Mine too, just set it up, I see herdsman logging output in docker logs. So now we wait.

Westcott1 commented 3 years ago

All devices dutifully went offline after 24 hours exactly. Hopefully I captured the right bits in the attached file. herdsman.log

The stick always has to be un- and re-plugged after this happens.

Koenkk commented 3 years ago

@Westcott1 the log you provided is very short, please provide the log starting from the point where you start z2m until it fails.

Westcott1 commented 3 years ago

Sorry I don't know how to do that! Since it can't be redirected into a file, the PuTTY window doesn't have a large enough buffer to hold the entire 24 hours of data. I saw nothing untoward up to the point where everything went offline. If you like, I can capture the start also.

colin19889 commented 3 years ago

I'm having the exact same 24-hour problem with Hassio + RPi4 + CC2530+2591 + CP2102. Going to update to 1.21.1-1 and I'll capture the logs and post them in about 24-hours.

jmyoung commented 3 years ago

Got a full captured log for you, from start to finish. There's... a lot of stuff in it, I have quite a few temperature sensors...

herdsman.log.gz

What immediately strikes me is the last message in the log is;

zigbee2mqtt_1  | 2021-09-08T00:43:34.004Z zigbee-herdsman:controller:log Wrote coordinator backup to '/app/data/coordinator_backup.json'

Presumably that task runs after exactly 24 hours of uptime?

Westcott1 commented 3 years ago

Hi Jmyoung, How did you manage to capture the full-size log? I saw that coordinator message, but guessed that it was part of the shutdown process.

jmyoung commented 3 years ago

I'm using Docker Compose, so I just did docker-compose logs > foo.txt.

That coordinator message appeared before I shut down the container, and notably it also happened exactly (well, to within a minute) 24 hours after z2m was started.

Koenkk commented 3 years ago

@jmyoung do I understand correctly that z2m just freezes completely after the line below?

zigbee2mqtt_1  | 2021-09-08T00:43:34.004Z zigbee-herdsman:controller:log Wrote coordinator backup to '/app/data/coordinator_backup.json'

What is logged when you try to control a bulb after this? Would be good if you could also include the z2m logging itself, you can do this with: (replace the CONTAINER_ID ofc)

docker logs CONTAINER_ID > log.txt 2>&1
Westcott1 commented 3 years ago

Mine ended with this - It doesn't get as far as 'Wrote coordinator backup' Perhaps I stopped z2m too soon?

Zigbee2MQTT:error 2021-09-07 10:29:25: Failed to ping 'Dimmer_1'
  zigbee-herdsman:adapter:zStack:znp:SREQ --> AF - dataRequestExt - {"dstaddrmode":2,"dstaddr":"0x000000000000fffd","destendpoint":242,"dstpanid":0,"srcendpoint":242,"clusterid":33,"transid":53,"options":0,"radius":30,"len":6,"data":{"type":"Buffer","data":[25,245,2,11,0,0]}} +6s
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,26,36,2,2,253,255,0,0,0,0,0,0,242,0,0,242,33,0,53,0,30,6,0,25,245,2,11,0,0,213] +6s
  zigbee-herdsman:adapter:zStack:znp:SREQ --> SYS - osalNvLength - {"id":513} +6s
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,2,33,19,1,2,51] +6s
  zigbee-herdsman:controller:log Creating coordinator backup +57s
  zigbee-herdsman:adapter:zStack:startup:backup creating backup +24s
  zigbee-herdsman:adapter:zStack:znp:SREQ --> SYS - version - {} +6s
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,0,33,2,35] +6s
  zigbee-herdsman:controller:endpoint Read 0x00124b001c467b6c/11 genBasic(["zclVersion"], {"sendWhenActive":false,"timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) +1m
  zigbee-herdsman:adapter:zStack:adapter sendZclFrameToEndpointInternal 0x00124b001c467b6c:61298/11 (0,0,1) +38s
^C^C  zigbee-herdsman:adapter:zStack:znp:SREQ --> SYS - osalNvLength - {"id":513} +6s
  zigbee-herdsman:adapter:zStack:unpi:writer --> frame [254,2,33,19,1,2,51] +6s
Zigbee2MQTT:error 2021-09-07 10:29:49: Failed to stop Zigbee2MQTT
jmyoung commented 3 years ago

@jmyoung do I understand correctly that z2m just freezes completely after the line below?

In my case, yes - but I also didn't just leave it sitting there for a long time. I configured an automation in HomeAssistant to yell at me via Pushover when it hasn't heard from anything in 10 minutes, so I got to it pretty quick when it failed.

What is logged when you try to control a bulb after this?

I don't have any controllable devices set up at the moment. However, after the failure, I get no more updates from anything - no temperature sensors or anything else.

Would be good if you could also include the z2m logging itself, you can do this with: (replace the CONTAINER_ID ofc)

I already had the logging set to 'warn' for z2m itself as well as herdsman. I'll pull it down in the morning and restart it all with everything on debug - I expect there will be a lot of output, and I'll have to wait the 24 hours.

jmyoung commented 3 years ago

Full log attached, z2m was in debug logging mode, herdsman logging enabled. As before, the last line written by Herdsman to the log was about the coordinator backup, followed by a few state updates from z2m, and that's all she wrote.

zigbee2mqtt_1  | 2021-09-11T06:23:59.410Z zigbee-herdsman:adapter:zStack:startup:backup fetched adapter network security material table
zigbee2mqtt_1  | 2021-09-11T06:23:59.436Z zigbee-herdsman:controller:log Wrote coordinator backup to '/app/data/coordinator_backup.json'
zigbee2mqtt_1  | Zigbee2MQTT:debug 2021-09-11 15:57:26: Saving state to file /app/data/state.json
zigbee2mqtt_1  | Zigbee2MQTT:debug 2021-09-11 16:02:26: Saving state to file /app/data/state.json
zigbee2mqtt_1  | Zigbee2MQTT:debug 2021-09-11 16:07:26: Saving state to file /app/data/state.json

full-log.txt.gz

Koenkk commented 3 years ago

@jmyoung could you try and see if the issue still happens when using the source routing firmware? https://github.com/Koenkk/Z-Stack-firmware/tree/master/coordinator/Z-Stack_Home_1.2/bin/source_routing

Westcott1 commented 3 years ago

I switched to a zzh CC2652R coordinator (from a CC2630) 2 days ago, and it has run OK since then.

jmyoung commented 3 years ago

@jmyoung could you try and see if the issue still happens when using the source routing firmware? https://github.com/Koenkk/Z-Stack-firmware/tree/master/coordinator/Z-Stack_Home_1.2/bin/source_routing

Sorry, but I can't change the firmware on my coordinator to that. I have 15 devices, however I'm not able to get most of my devices within range of at least one separate router (I do have two other routers, but they typically only wind up with 1-2 devices on them due to their location).

I'll just stick with 1.20.0 instead. That works.

igogold commented 3 years ago

I have the same issue with 24 hours lifetime of working z2m and after that I should reset adapter to make it working again. But I have a way to send reset command to the adapter without rebooting my device (I'm unable to replug adapter). I'm using CC2530+CC2592 module for Russian Wiren Board smart home controller based on Ebyte module. The firmware is a latest one (non source route), 21 devices total (2 routers).

Is it possible to disable (via config) creation of coordinator backup at all? It also need more time to stop z2m service with a backup (about 90 seconds now), not only hang coordinator after 24 hours.

colin19889 commented 3 years ago

But I have a way to send reset command to the adapter without rebooting my device (I'm unable to replug adapter).

@igogold - How are you resetting the adapter without rebooting? I send a reboot command at 0300 everyday to avoid the problem but resetting the adapter would be better.

igogold commented 3 years ago

@colin19889 At developer support board I found the adapter module has reset line connected to gpio of wiren board controller. And it's possible to change state of gpio using linux interface via sysfs. There is a post (in Russian). I think a presence of such reset line depends on a way the module was created and connected to main device.

I'm resetting an adapter every time before start zigbee2mqtt.service as it hang just after stopping the service (after creating a coordinator backup?). So, to restart z2m I need reset adapter anyway.

colin19889 commented 2 years ago

Has there been any progress made on this issue? I'm still scheduling a 3am restart but the reboot occasionally causes issues with my battery operated devices meaning I have to manually reset them.

Koenkk commented 2 years ago

Not yet, @castorw is checking this

castorw commented 2 years ago

Okay so let's give a quick update. I have tried reproducing this issue on CC2531 running Z-Stack 1.2. So far without success. From what I read in this post this seems to affect CC2530. CC2531 should be almost the same as CC2530 except it has a built-in USB controller. However there might be other differences in the IC design.

Unfortunately I have no CC2530 device on hand so I am still trying to reproduce this on CC2531. Upgraded one of my prod nwks to affected z2m version 1.21.2.

Please try updating firmware! This issue resembled signs of memory leak, so the only sensible advice I am able to give right now: please try updating your radios to the latest firmware available. Please let me know if this helps.

colin19889 commented 2 years ago

Thanks for looking into it. I just received a couple new CC2530s that I was going to use for routers but I will upgrade one of them to the latest coordinator FW and report back.

Koenkk commented 2 years ago

~20211018_backup_fix.zip~

UPDATE use fws from https://github.com/Koenkk/zigbee2mqtt/issues/8452#issuecomment-946473353 instead

Could you guys check if the following fw fixes it? Zip contains one for the CC2530 and CC2530+CC2591

colin19889 commented 2 years ago

I've got about 3-hours left on my test from yesterday with the latest FW. Once I confirm that crashes/succeeds in a few hours I will try your new FW. Thanks!

igogold commented 2 years ago

Zip contains one for the CC2530 and CC2530+CC2591

Thank you! But can you also build CC2530+CC2592 default firmware so I can try it too as my CC2530 adapter comes with CC2592 amp?

Koenkk commented 2 years ago

Instead of https://github.com/Koenkk/zigbee2mqtt/issues/8452#issuecomment-946077251, please test with these fws: (also contains the CC2592 fw).

CC2530_firmwares.zip

igogold commented 2 years ago

I've flashed new firmware (cc2530+cc2592) and now my controller works as before! It doesn't hang anymore after stopping z2m service (so also after doing a backup). And backup doesn't wait ~90 sec as before, it just finishes immediately. I will monitor logs after 24 and 48 hours but it seems an issue has been fixed, thank you!

snakuzzo commented 2 years ago

Here same result... my CC2530 with new firmware (CC2530 w/o CC2591/92) is working fine using latest stable docker image

colin19889 commented 2 years ago

So it seems that new FW has stopped the daily crash/lockup, so that's great. It seems that ever since the FW change though I've been having some intermittent issues with devices timing out when trying to control them. Sometimes everything works fine but sometimes I get timeouts or delays of 8 or 9 seconds controlling. Overall everything just seems sluggish.

In terms of routers I have 1x Ikea Bulb and 1x CC2530.

I've powered down the RPi for 15 mins to put all of the Zigbee devices into "panic mode" so they connect to their closest routers but that hasn't seemed to help.

Thoughts?

EDIT -

Example of the the timeouts

Zigbee2MQTT:error 2021-10-20 16:23:42: Publish 'set' 'state' to 'Desk Light' failed: 'Error: Command 0xb0ce181403174115/1 genOnOff.on({}, {"sendWhenActive":false,"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 30582 - 1 - 28 - 6 - 11 after 10000ms)'