Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
11.74k stars 1.64k forks

Z2M stops working after a few days #4307

Closed theFork closed 3 years ago

theFork commented 4 years ago

Abstract:

After some days of operation, my entire Zigbee system fails. No devices report and none can be controlled. Reconnecting the zzh coordinator (CC2652R) fixes the problem for a while.

System:

- Version of Zigbee2MQTT: 1.14.4 (Home Assistant add-on)
- Coordinator: zig-a-zig-ah! (CC2652R)
- Coordinator version: 20200417
- Host: NUC with Home Assistant

What happens:

In the Z2M log, I only see lines like the following. No usual Zigbee activity can be seen:

Zigbee2MQTT:info  2020-09-08 18:24:00: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":[{"dateCode":"20200417","friendly_name":"Coordinator","ieeeAddr":"0x00124b001e17f0bd","lastSeen":1599582240103,"networkAddress":0,"softwareBuildID":"zStack3x0","type":"Coordinator"},{"dateCode":"20180517-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_kg_werk_flur","hardwareVersion":5,"ieeeAddr":"0x001fee00000030b7","lastSeen":1599475569798,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":58457,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20170718-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_eg_az_bad","hardwareVersion":5,"ieeeAddr":"0x001fee0000001815","lastSeen":1599475607620,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":58695,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20200609-DE-FB0","description":"Shutter control J1","friendly_name":"j1_rollo_eg_th","hardwareVersion":6,"ieeeAddr":"0x001fee0000003e72","lastSeen":1599475593189,"manufacturerID":4338,"manufacturerName":"ubisys","model":"J1","modelID":"J1 (5502)","networkAddress":45071,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20180920-DE-FB0","description":"Shutter control J1","friendly_name":"j1_rollo_eg_az","hardwareVersion":6,"ieeeAddr":"0x001fee0000002cad","lastSeen":1599475559565,"manufacturerID":4338,"manufacturerName":"ubisys","model":"J1","modelID":"J1 (5502)","networkAddress":26830,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20160516","description":"MiJia temperature & humidity 
sensor","friendly_name":"xiaq_klima_eg_kueche","hardwareVersion":30,"ieeeAddr":"0x00158d00023e2da5","lastSeen":1593372520673,"manufacturerID":4151,"manufacturerName":"LUMI","model":"WSDCGQ01LM","modelID":"lumi.sens","networkAddress":61795,"powerSource":"Battery","softwareBuildID":"3000-0001","type":"EndDevice","vendor":"Xiaomi"},{"dateCode":"20160516","description":"MiJia temperature & humidity sensor","friendly_name":"xiaq_klima_eg_az","hardwareVersion":30,"ieeeAddr":"0x00158d00023e2cb8","lastSeen":1599474402417,"manufacturerID":4151,"manufacturerName":"LUMI","model":"WSDCGQ01LM","modelID":"lumi.sensor_ht","networkAddress":11114,"powerSource":"Battery","softwareBuildID":"3000-0001","type":"EndDevice","vendor":"Xiaomi"},{"dateCode":"20160516","description":"MiJia temperature & humidity sensor","friendly_name":"xiaq_klima_eg_bad","hardwareVersion":30,"ieeeAddr":"0x00158d00023a3380","lastSeen":1593372772744,"manufacturerID":4151,"manufacturerName":"LUMI","model":"WSDCGQ01LM","modelID":"lumi.sens","networkAddress":44706,"powerSource":"Battery","softwareBuildID":"3000-0001","type":"EndDevice","vendor":"Xiaomi"},{"description":"MiJia temperature & humidity sensor","friendly_name":"xiaq_klima_kg_heiz","ieeeAddr":"0x00158d000208d6f8","lastSeen":1593374764809,"manufacturerID":4151,"manufacturerName":"LUMI","model":"WSDCGQ01LM","modelID":"lumi.sensor_ht","networkAddress":64345,"powerSource":"Battery","type":"EndDevice","vendor":"Xiaomi"},{"dateCode":"20170717-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_eg_flur_signal","hardwareVersion":5,"ieeeAddr":"0x001fee000000181e","lastSeen":1599475598564,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":10867,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20170522-DE-FB0","description":"Power switch 
S1","friendly_name":"s1_steckdose_garten","hardwareVersion":6,"ieeeAddr":"0x001fee0000001a7f","lastSeen":1599475557234,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S1","modelID":"S1 (5501)","networkAddress":61115,"powerSource":"Mains (single phase)","softwareBuildID":"","type":"Router","vendor":"Ubisys"},{"dateCode":"20190211-DE-FB0","description":"Control unit C4","friendly_name":"c4_eg_flur","hardwareVersion":3,"ieeeAddr":"0x001fee00000039c6","lastSeen":1599475554788,"manufacturerID":4338,"manufacturerName":"ubisys","model":"C4","modelID":"C4 (5504)","networkAddress":22880,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20180509-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_eg_th_garten","hardwareVersion":5,"ieeeAddr":"0x001fee0000001850","lastSeen":1599475570280,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":40104,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20191127-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_og_th","hardwareVersion":5,"ieeeAddr":"0x001fee00000058bf","lastSeen":1599475582971,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":50413,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20191127-DE-FB0","description":"Power switch S2","friendly_name":"s2_iicht_kg_flur_beet","hardwareVersion":5,"ieeeAddr":"0x001fee0000005747","lastSeen":1599475578048,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":62319,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20180522-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_og_flur","hardwareVersion":5,"ieeeAddr":"0x001fee00000030a7","lastSeen":1599475584236,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 
(5502)","networkAddress":53062,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20170302","description":"TRADFRI LED bulb E12/E14/E17 400 lumen, dimmable warm white, chandelier opal","friendly_name":"cube_tradfri_eg_flur","hardwareVersion":1,"ieeeAddr":"0x000b57fffe99aa9f","lastSeen":1599475595702,"manufacturerID":4476,"manufacturerName":"IKEA of Sweden","model":"LED1649C5","modelID":"TRADFRI bulb E14 W op/ch 400lm","networkAddress":42690,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.214","type":"Router","vendor":"IKEA"},{"dateCode":"20180810-1","description":"E14 candle with white spectrum","friendly_name":"cube_innr_eg_wz_hinten","hardwareVersion":1,"ieeeAddr":"0x00158d00038f2a93","lastSeen":1599475589744,"manufacturerID":4454,"manufacturerName":"innr","model":"RB 248 T","modelID":"RB 248 T","networkAddress":44514,"powerSource":"Mains (single phase)","softwareBuildID":"2.0","type":"Router","vendor":"Innr"},{"dateCode":"20170302","description":"TRADFRI LED bulb E12/E14/E17 400 lumen, dimmable warm white, chandelier opal","friendly_name":"cube_tradfri_eg_kueche","hardwareVersion":1,"ieeeAddr":"0xd0cf5efffe72e953","lastSeen":1599475573845,"manufacturerID":4476,"manufacturerName":"IKEA of Sweden","model":"LED1649C5","modelID":"TRADFRI bulb E14 W op/ch 400lm","networkAddress":5027,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.214","type":"Router","vendor":"IKEA"},{"dateCode":"20180810-1","description":"E14 candle with white spectrum","friendly_name":"cube_innr_eg_wz_vorne","hardwareVersion":1,"ieeeAddr":"0x00158d00038f2a98","lastSeen":1599475562957,"manufacturerID":4454,"manufacturerName":"innr","model":"RB 248 T","modelID":"RB 248 T","networkAddress":43197,"powerSource":"Mains (single phase)","softwareBuildID":"2.0","type":"Router","vendor":"Innr"},{"dateCode":"20170908","description":"Hue white and color ambiance 
E26/E27/E14","friendly_name":"cube_hue_eg_bad","hardwareVersion":1,"ieeeAddr":"0x00178801041b455f","lastSeen":1599475586682,"manufacturerID":4107,"manufacturerName":"Philips","model":"9290012573A","modelID":"LCT012","networkAddress":24413,"powerSource":"Mains (single phase)","softwareBuildID":"1.29.0_r21169","type":"Router","vendor":"Philips"},{"dateCode":"20170718-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_eg_bad_spiegelschrank","hardwareVersion":5,"ieeeAddr":"0x001fee00000017fd","lastSeen":1599475609558,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":27907,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20181022-DE-FB0","description":"Shutter control J1","friendly_name":"j1_rollo_eg_bad","hardwareVersion":6,"ieeeAddr":"0x001fee00000038c4","lastSeen":1599475591876,"manufacturerID":4338,"manufacturerName":"ubisys","model":"J1","modelID":"J1 (5502)","networkAddress":63077,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20180920-DE-FB0","description":"Shutter control J1","friendly_name":"j1_rollo_eg_ez","hardwareVersion":6,"ieeeAddr":"0x001fee0000002c65","lastSeen":1599475558914,"manufacturerID":4338,"manufacturerName":"ubisys","model":"J1","modelID":"J1 (5502)","networkAddress":3073,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20170111-DE-FB0","description":"Shutter control J1","friendly_name":"j1_rollo_eg_wz_west","hardwareVersion":5,"ieeeAddr":"0x001fee00000024e5","lastSeen":1599475612398,"manufacturerID":4338,"manufacturerName":"ubisys","model":"J1","modelID":"J1 (5502)","networkAddress":25941,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20170111-DE-FB0","description":"Shutter control 
J1","friendly_name":"j1_rollo_eg_wz_sued","hardwareVersion":5,"ieeeAddr":"0x001fee0000002517","lastSeen":1599475607114,"manufacturerID":4338,"manufacturerName":"ubisys","model":"J1","modelID":"J1 (5502)","networkAddress":38731,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20170315","description":"TRADFRI LED bulb E14/E26/E27 600 lumen, dimmable, color, opal white","friendly_name":"e27_tradfri_rgb","hardwareVersion":1,"ieeeAddr":"0xd0cf5efffe2926f9","lastSeen":1594040066719,"manufacturerID":4476,"manufacturerName":"IKEA of Sweden","model":"LED1624G9","modelID":"TRADFRI bulb E27 CWS opal 600lm","networkAddress":57311,"powerSource":"Mains (single phase)","softwareBuildID":"1.3.002","type":"Router","vendor":"IKEA"},{"dateCode":"20180921-DE-FB0","description":"Control unit C4","friendly_name":"c4_eg_wz","hardwareVersion":3,"ieeeAddr":"0x001fee0000002a44","lastSeen":1599475564967,"manufacturerID":4338,"manufacturerName":"ubisys","model":"C4","modelID":"C4 (5504)","networkAddress":30018,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20170718-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_eg_wz","hardwareVersion":5,"ieeeAddr":"0x001fee000000176d","lastSeen":1599475609392,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":46374,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20180813-1","description":"E27 bulb RGBW","friendly_name":"regal_innr_eg_ez","hardwareVersion":1,"ieeeAddr":"0x00158d0002d4a91c","lastSeen":1599475602655,"manufacturerID":4454,"manufacturerName":"innr","model":"RB 285 C","modelID":"RB 285 C","networkAddress":50066,"powerSource":"Mains (single phase)","softwareBuildID":"2.0","type":"Router","vendor":"Innr"},{"dateCode":"20170331","description":"TRADFRI LED bulb E26/E27 980 lumen, dimmable, white spectrum, opal 
white","friendly_name":"deckenlampe_tradfri_eg_az","hardwareVersion":1,"ieeeAddr":"0x90fd9ffffe8b2149","lastSeen":1599420900408,"manufacturerID":4476,"manufacturerName":"IKEA of Sweden","model":"LED1545G12","modelID":"TRADFRI bulb E27 WS opal 980lm","networkAddress":21288,"powerSource":"Mains (single phase)","softwareBuildID":"1.2.217","type":"Router","vendor":"IKEA"},{"dateCode":"20170214-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_eg_ez_kueche","hardwareVersion":5,"ieeeAddr":"0x001fee00000017c6","lastSeen":1599475579526,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":62029,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20161129-DE-FB0","description":"Power switch S1","friendly_name":"s1_steckdose_terrasse","hardwareVersion":5,"ieeeAddr":"0x001fee000000210f","lastSeen":1599475553711,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S1","modelID":"S1 (5501)","networkAddress":1030,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20170718-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_eg_terrasse","hardwareVersion":5,"ieeeAddr":"0x001fee000000182d","lastSeen":1599475561575,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":36181,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20180920-DE-FB0","description":"Shutter control J1","friendly_name":"j1_rollo_og_kind1","hardwareVersion":6,"ieeeAddr":"0x001fee0000002c7d","lastSeen":1599475593863,"manufacturerID":4338,"manufacturerName":"ubisys","model":"J1","modelID":"J1 (5502)","networkAddress":56347,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20181022-DE-FB0","description":"Shutter control 
J1","friendly_name":"j1_rollo_og_sz","hardwareVersion":6,"ieeeAddr":"0x001fee00000038cb","lastSeen":1599475562210,"manufacturerID":4338,"manufacturerName":"ubisys","model":"J1","modelID":"J1 (5502)","networkAddress":37057,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20180509-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_og_sz","hardwareVersion":5,"ieeeAddr":"0x001fee0000001791","lastSeen":1596267903613,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":39716,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20190211-DE-FB0","description":"Control unit C4","friendly_name":"c4_og_sz","hardwareVersion":3,"ieeeAddr":"0x001fee00000039c2","lastSeen":1599475587382,"manufacturerID":4338,"manufacturerName":"ubisys","model":"C4","modelID":"C4 (5504)","networkAddress":11041,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20170718-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_og_kind1","hardwareVersion":5,"ieeeAddr":"0x001fee00000017cd","lastSeen":1599475613511,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":8134,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20190211-DE-FB0","description":"Control unit C4","friendly_name":"c4_og_kind1","hardwareVersion":3,"ieeeAddr":"0x001fee0000003bc1","lastSeen":1599475559596,"manufacturerID":4338,"manufacturerName":"ubisys","model":"C4","modelID":"C4 (5504)","networkAddress":65078,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20190529-DE-FB0","description":"Power switch S1","friendly_name":"s1_steckdose_balkon","hardwareVersion":6,"ieeeAddr":"0x001fee0000004743","lastSeen":1599475569299,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S1","modelID":"S1 
(5501)","networkAddress":19463,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"20191127-DE-FB0","description":"Power switch S2","friendly_name":"s2_licht_og_sz_boden","hardwareVersion":5,"ieeeAddr":"0x001fee000000578a","lastSeen":1599475579775,"manufacturerID":4338,"manufacturerName":"ubisys","model":"S2","modelID":"S2 (5502)","networkAddress":11849,"powerSource":"Mains (single phase)","type":"Router","vendor":"Ubisys"},{"dateCode":"","description":"Temperature & humidity sensor with display","friendly_name":"ts0201_klima_og_sz","hardwareVersion":1,"ieeeAddr":"0xec1bbdfffe863c4a","lastSeen":1596899573388,"manufacturerID":4098,"manufacturerName":"_TZ2000_a476raq2","model":"TS0201","modelID":"TS0201","networkAddress":19202,"powerSource":"Battery","type":"EndDevice","vendor":"TuYa"},{"dateCode":"20170302","description":"TRADFRI motion sensor","friendly_name":"motion_tradfri_kg_garage_klein","hardwareVersion":1,"ieeeAddr":"0x000b57fffe99f082","lastSeen":1599435423165,"manufacturerID":4476,"manufacturerName":"IKEA of Sweden","model":"E1525/E1745","modelID":"TRADFRI motion sensor","networkAddress":30546,"powerSource":"Battery","softwareBuildID":"1.2.214","type":"EndDevice","vendor":"IKEA"}],"type":"devices"}'
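
A quick way to read a payload like the one above is to compare each device's lastSeen value (milliseconds since the Unix epoch) with the coordinator's. The Python sketch below is not part of Zigbee2MQTT: the function name `stale_devices` and the one-hour threshold are assumptions made for illustration, and the sample contains three entries copied from the payload, trimmed to the fields used.

```python
import json

# Three entries copied from the payload above, trimmed to the fields used here.
PAYLOAD = """{"message":[
 {"friendly_name":"Coordinator","lastSeen":1599582240103,"type":"Coordinator"},
 {"friendly_name":"s2_licht_kg_werk_flur","lastSeen":1599475569798,"type":"Router"},
 {"friendly_name":"xiaq_klima_eg_kueche","lastSeen":1593372520673,"type":"EndDevice"}
],"type":"devices"}"""


def stale_devices(payload, max_age_hours=1.0):
    """Return (friendly_name, age_in_hours) for devices not seen recently.

    Age is measured against the coordinator's lastSeen. The threshold is a
    hypothetical value, not a Zigbee2MQTT setting.
    """
    devices = json.loads(payload)["message"]
    now_ms = next(d["lastSeen"] for d in devices if d["type"] == "Coordinator")
    result = []
    for device in devices:
        if device["type"] == "Coordinator":
            continue
        age_hours = (now_ms - device["lastSeen"]) / 3_600_000
        if age_hours > max_age_hours:
            result.append((device["friendly_name"], round(age_hours, 1)))
    return result


for name, age in stale_devices(PAYLOAD):
    print(f"{name}: last seen {age} h before the coordinator timestamp")
```

For these three entries, the router was last seen about 29.6 hours before the coordinator timestamp, which suggests (but does not prove) that the network went silent roughly a day before this log line was written.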

When I restart Z2M using the Homeassistant supervisor, I see this:

[s6-init] making user provided files available at /var/run/s6/etc...exited 0.
[s6-init] ensuring user provided files have correct perms...exited 0.
[fix-attrs.d] applying ownership & permissions fixes...
[fix-attrs.d] done.
[cont-init.d] executing container initialization scripts...
[cont-init.d] done.
[services.d] starting services
[services.d] done.
./run.sh: line 17: [Info] Configuration backup found in /share/zigbee2mqtt/.configuration.yaml.bk. Skipping config backup.: No such file or directory
[Info] Socat is DISABLED and not started
2020-09-08T18:34:09: PM2 log: Launching in no daemon mode
2020-09-08T18:34:09: PM2 log: App [npm:0] starting in -fork mode-
2020-09-08T18:34:09: PM2 log: App [npm:0] online
> zigbee2mqtt@1.14.4 start /zigbee2mqtt-1.14.4
> node index.js
Zigbee2MQTT:info  2020-09-08 18:34:09: Logging to console and directory: '/share/zigbee2mqtt/log/2020-09-08.18-34-09' filename: log.txt
Zigbee2MQTT:info  2020-09-08 18:34:10: Starting Zigbee2MQTT version 1.14.4 (commit #unknown)
Zigbee2MQTT:info  2020-09-08 18:34:10: Starting zigbee-herdsman...
Zigbee2MQTT:error 2020-09-08 18:34:29: Error while starting zigbee-herdsman
Zigbee2MQTT:error 2020-09-08 18:34:29: Failed to start zigbee
Zigbee2MQTT:error 2020-09-08 18:34:29: Exiting...
Zigbee2MQTT:error 2020-09-08 18:34:29: Error: Failed to connect to the adapter (Error: SRSP - SYS - ping after 6000ms)
    at ZStackAdapter.<anonymous> (/zigbee2mqtt-1.14.4/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:90:31)
    at Generator.throw (<anonymous>)
    at rejected (/zigbee2mqtt-1.14.4/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:25:65)
npm
 ERR! code ELIFECYCLE
npm ERR! errno 1
npm ERR!
 zigbee2mqtt@1.14.4 start: `node index.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the zigbee2mqtt@1.14.4 start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2020-09-08T16_34_29_597Z-debug.log
2020-09-08T18:34:30: PM2 log: App [npm:0] exited with code [1] via signal [SIGINT]
2020-09-08T18:34:30: PM2 log: App [npm:0] starting in -fork mode-
2020-09-08T18:34:30: PM2 log: App [npm:0] online
> zigbee2mqtt@1.14.4 start /zigbee2mqtt-1.14.4
> node index.js
Zigbee2MQTT:info  2020-09-08 18:34:30: Logging to console and directory: '/share/zigbee2mqtt/log/2020-09-08.18-34-30' filename: log.txt
Zigbee2MQTT:info  2020-09-08 18:34:31: Starting Zigbee2MQTT version 1.14.4 (commit #unknown)
Zigbee2MQTT:info  2020-09-08 18:34:31: Starting zigbee-herdsman...
Zigbee2MQTT:error 2020-09-08 18:34:50: Error while starting zigbee-herdsman
Zigbee2MQTT:error 2020-09-08 18:34:50: Failed to start zigbee
Zigbee2MQTT:error 2020-09-08 18:34:50: Exiting...
Zigbee2MQTT:error 2020-09-08 18:34:50: Error: Failed to connect to the adapter (Error: SRSP - SYS - ping after 6000ms)
    at ZStackAdapter.<anonymous> (/zigbee2mqtt-1.14.4/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:90:31)
    at Generator.throw (<anonymous>)
    at rejected (/zigbee2mqtt-1.14.4/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:25:65)
npm
 ERR! code ELIFECYCLE
npm ERR! errno 1
npm
 ERR! zigbee2mqtt@1.14.4 start: `node index.js`
npm ERR! Exit status 1
npm ERR! 
npm ERR! Failed at the zigbee2mqtt@1.14.4 start script.
npm ERR! This is probably not a problem with npm. There is likely additional logging output above.
npm ERR! A complete log of this run can be found in:
npm ERR!     /root/.npm/_logs/2020-09-08T16_34_50_695Z-debug.log
2020-09-08T18:34:51: PM2 log: App [npm:0] exited with code [1] via signal [SIGINT]
2020-09-08T18:34:51: PM2 log: App [npm:0] starting in -fork mode-
2020-09-08T18:34:51: PM2 log: App [npm:0] online
> zigbee2mqtt@1.14.4 start /zigbee2mqtt-1.14.4
> node index.js
Zigbee2MQTT:info  2020-09-08 18:34:52: Logging to console and directory: '/share/zigbee2mqtt/log/2020-09-08.18-34-52' filename: log.txt
Zigbee2MQTT:info  2020-09-08 18:34:52: Starting Zigbee2MQTT version 1.14.4 (commit #unknown)
Zigbee2MQTT:info  2020-09-08 18:34:52: Starting zigbee-herdsman...

And so on. This repeats indefinitely until I disconnect and reconnect my zzh coordinator. After that, the network comes up again.
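
To make such a restart loop easier to spot in a long log file, the lines can be counted programmatically. This is only a sketch: the function name `summarize` and the regular expression are assumptions based on the line format shown above, not a Zigbee2MQTT tool. The sample text is copied from the log in this comment.

```python
import re

# Excerpt of the log lines shown above, copied verbatim.
LOG = """\
Zigbee2MQTT:info  2020-09-08 18:34:10: Starting zigbee-herdsman...
Zigbee2MQTT:error 2020-09-08 18:34:29: Error: Failed to connect to the adapter (Error: SRSP - SYS - ping after 6000ms)
Zigbee2MQTT:info  2020-09-08 18:34:31: Starting zigbee-herdsman...
Zigbee2MQTT:error 2020-09-08 18:34:50: Error: Failed to connect to the adapter (Error: SRSP - SYS - ping after 6000ms)
Zigbee2MQTT:info  2020-09-08 18:34:52: Starting zigbee-herdsman...
"""

# Assumed line layout: "Zigbee2MQTT:<level>  <date> <time>: <message>"
LINE = re.compile(
    r"^Zigbee2MQTT:(\w+)\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): (.*)$"
)


def summarize(log_text):
    """Count herdsman start attempts and collect adapter-failure timestamps."""
    starts = 0
    failures = []
    for raw in log_text.splitlines():
        match = LINE.match(raw)
        if match is None:
            continue  # stack traces, npm and PM2 lines are ignored
        _level, timestamp, message = match.groups()
        if message.startswith("Starting zigbee-herdsman"):
            starts += 1
        elif "Failed to connect to the adapter" in message:
            failures.append(timestamp)
    return starts, failures


starts, failures = summarize(LOG)
print(f"{starts} start attempts, adapter failures at: {failures}")
```

Several start attempts that each end in "Failed to connect to the adapter" indicate that the coordinator itself has stopped answering, rather than a single device dropping off the network.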

Config

I should perhaps mention that both the reporting and availability features are enabled:

data_path: /share/zigbee2mqtt
devices: devices.yaml
groups: groups.yaml
homeassistant: false
permit_join: false
mqtt:
  base_topic: zigbee2mqtt
...
serial:
  port: /dev/ttyUSB0
advanced:
  rtscts: false
  report: true
  channel: 11
  network_key:
...
  availability_timeout: 60
  availability_blacklist: []
...
ban: []
whitelist: []
queue: {}
socat:
  enabled: false
...

I'm afraid this is not enough information to analyze the issue. Is there anything I could collect the next time the problem shows up?

middelink commented 3 years ago

@theFork Hmm. Indeed. Does docker not have the stdout logs from the last incarnation running? CLI: docker logs addon_7ad98f9c_zigbee2mqtt might help. I'm just not sure how far back that goes.

Also, with the new hassio supervisor insisting that Docker log to the journal, you might be able to extract logs from there with the journalctl CLI.

ALaDoffe commented 3 years ago

@Koenkk: Hello, any news?

Claude2666 commented 3 years ago

Hi,

I have been running one of the latest dev versions, 1.14.4-dev (commit #d2f8e22), for 2.5 days now with no more lockups. I do notice that some devices get a 'Failed to ping' and go offline/online more frequently than with the previous version I ran. They also seem to go offline and back online at nearly the same time, e.g.:

    info 2020-09-18 13:30:20: MQTT publish: topic 'zigbee2mqtt/Zplug1/availability', payload 'offline'
    error 2020-09-18 13:30:20: Failed to ping 'Zplug1'
    info 2020-09-18 13:30:20: MQTT publish: topic 'zigbee2mqtt/Zplug1/availability', payload 'online'

@Koenkk If you need more logs, just let me know and I'll run again with the debug options. Thank you for the great work.

Erickclee commented 3 years ago

Hi,

I have been running one of the latest dev versions, 1.14.4-dev (commit #d2f8e22), for 2.5 days now with no more lockups. I do notice that some devices get a 'Failed to ping' and go offline/online more frequently than with the previous version I ran. They also seem to go offline and back online at nearly the same time, e.g.

Hi @Claude2666, my situation is exactly the same on the dev branch. It has been running for 2 days with no more crashes. Some devices also "fail to ping"; it is as if they go into sleep mode after some time. When triggered, HA has to ping them a few times before they respond. They do work, but with some delay. What I noticed is that I have some switches with live + neutral wires (which work as routers) and some switches with a live wire only (end devices, not routers), and this delay only affects the single-live-wire switches (end devices).

Anyway, the new Z2M-bridge version (1.14.4.1) has just been released. I have tried it for 1 hour so far: it looks stable, I have not noticed any ping failures, and all switches are quite responsive. You may want to check it out.

I can't provide more log files, because I can only take them from the Z2M GUI, which shows a very short log that I believe is not enough for debugging.

Update: don't try 1.14.4.1. It crashed after 2 hours. My bad; I think it is better to keep using the dev branch.

Koenkk commented 3 years ago

@ALaDoffe do you still have stability issues with the latest dev branch? If so please provide the complete herdsman debug logging from the moment you start Zigbee2MQTT until it crashes.

About the devices marked offline/failed to ping: the reason this happens now is that we no longer retry the ping when it fails (previously we would try up to 5 times, which is what caused the lockups). Although we could ping e.g. 2 times instead of 1, I don't want to implement that right away. If such a ping request fails, it means something is wrong with your network; just retrying would "hide" this issue from the user. In most cases it is caused by interference or network range issues, which most of the time are easily fixed by:

After applying such fixes you will see an improvement in the responsiveness of Zigbee2MQTT.
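
The trade-off between a single ping and several retries can be illustrated with a small simulation. This is a sketch under stated assumptions, not Zigbee2MQTT's availability code: `check_availability` and `flaky_device` are hypothetical names, and the fake device simply fails a fixed number of pings before answering.

```python
def check_availability(ping, attempts):
    """Ping up to `attempts` times; return (online, failed_attempts)."""
    failed = 0
    for _ in range(attempts):
        if ping():
            return True, failed
        failed += 1
    return False, failed


def flaky_device(failures_before_success):
    """Build a fake ping() that fails a fixed number of times, then succeeds."""
    remaining = [failures_before_success]

    def ping():
        if remaining[0] > 0:
            remaining[0] -= 1
            return False
        return True

    return ping


# With up to 5 attempts, two lost pings go unnoticed: the device is "online".
print(check_availability(flaky_device(2), attempts=5))

# With a single attempt, the same device is reported offline right away,
# which exposes the underlying network problem instead of hiding it.
print(check_availability(flaky_device(2), attempts=1))
```

In this toy model the retrying variant reports the device as online even though two requests were lost, which is the "hiding" effect described above; the single-attempt variant surfaces the failure at the cost of more offline/online flapping.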

Erickclee commented 3 years ago

Hi @Koenkk, I would like your advice on something besides the crash problem above. I have also lost the network map in Z2M-assistant since installing Z2M v1.14.4. I didn't update Z2M-assistant. I don't know where Z2M-assistant fetches the network map information from, but it seems the information is lost, and nothing shows on the network map except floating devices. Could you advise whether this is a Z2M issue, a Z2M-assistant issue, or a HASSIO issue?

And dear brothers and sisters, am I the only one seeing this loss of the network map?

theFork commented 3 years ago

My system crashed again last night at about 4AM. Here is the (slightly longish) logfile:

log.zip

The dev-branch was cloned right before starting yesterday at about 09:15AM. Hence my commit id is c0faae73bbd1185c5b06739718360776ae88ab23.

Koenkk commented 3 years ago

@theFork

ALaDoffe commented 3 years ago

@Koenkk

My network crashes after 3 days. I connect to my RPi only via SSH (no KVM and no graphical environment), so it will be very difficult to capture the debug log from start to crash.

Regarding the other points:

Connecting the CC2531 via a USB extension cable
--> Already the case.

Choosing a good channel which does not interfere with WiFi: https://www.metageek.com/training/resources/zigbee-wifi-coexistence.html
--> WiFi channel: 3, Zigbee channel: 25. I receive no other WiFi from my neighborhood or elsewhere (detached house).

The only strange thing I have noticed from the beginning is that I use the source routing firmware and see 11 direct connections to my coordinator, despite reflashing the firmware and rebuilding the network.

However, I removed and re-paired some strategic router devices in my network 2 days ago, and availability is better: far fewer failed pings.
I am now waiting 3 days to see whether it crashes again.

Again, thank you very much for your help and your great work.

theFork commented 3 years ago

@Koenkk

... are these devices actually offline?

No, or at least not all of them. These devices are currently offline:

I will now see what happens over a few days with availability disabled.

HarrisonPace commented 3 years ago

My Network crashed as well on latest dev branch commit. Availability disabled.

Koenkk commented 3 years ago

@thehaxxa please provide the herdsman debug logging from starting z2m till the crash, otherwise I cannot help.

HarrisonPace commented 3 years ago

@thehaxxa please provide the herdsman debug logging from starting z2m till the crash, otherwise I cannot help.

I have started logging; I will post the logs after it crashes 👍

UPDATE: I removed availability_timeout: 0 on the latest dev branch, and so far it has run for 19 hours without crashing. I will monitor it for the next week with debugging enabled, but so far looks good.

theFork commented 3 years ago

Still on c0faae7 with availability_timeout: 0, my system crashed again.

I also noticed some delays when turning lights on/off via MQTT (Home Assistant). One of the delays can be seen in the log starting at 2020-09-22 22:22:56, when I try to turn off the first channel of s2_licht_eg_ez_kueche. After hitting the button in Home Assistant, it took several seconds until the light turned off. This may be a separate issue, but it may also be of interest.

Here is the entire log: log.zip

Are there commits that might be relevant for this issue after c0faae7? If so, I will update before continuing the observation.

UPDATE: It appears as if the system DID NOT CRASH. Or it repaired itself. I just re-checked before restarting and now everything works as expected.

Erickclee commented 3 years ago

@theFork, my problem is similar. You will notice lag on non-router devices (battery-operated devices or single-wire switches) but no lag on router devices. It only happens with the recent 1.14.4 release. Sorry that I can't get a log; mine is very short, as I can only get it from the add-on UI.

theFork commented 3 years ago

@Erickclee I can't confirm that lagging only happens with non-router devices. Most of my devices are mains-powered router devices such as the s2_licht_eg_ez_kueche.

With the kind help of @middelink I figured out a way to get the full logs when running in HA:

  1. Install the SSH and web terminal add-on.
  2. Deactivate protection mode (not sure if this is necessary; maybe try without it first).
  3. Define an alias that gets the Docker container ID of Z2M (enter this every time you start a terminal session, or add it to your .zshrc):
    alias z2m_container_id="docker ps | grep dwelch2101/zigbee2mqtt | cut -d ' ' -f 1"
  4. Write the logs to a file, using the alias defined in step 3:
    docker logs `z2m_container_id` &> log.log
  5. Copy the file to your computer (for instance via SCP) and compress it before attaching it to an issue.

@Koenkk What do you think, should we add something like that to the howto section?

Claude2666 commented 3 years ago

I experience similar behavior. With the latest 1.14.4-dev it does not crash, but after a couple of days of running, switches and some sensors stop responding or respond with a delay. It happens both to battery-powered devices and to mains-powered ones such as Osram power plugs or GLEDOPTO LED drivers. I went back to my 14.4.2-dev snapshot, and everything responds without delay or lag, even after several days.

Erickclee commented 3 years ago

Is there a way to downgrade from v1.14.4 without a snapshot? (Sadly, I didn't keep a snapshot.) I am absolutely sure that I didn't have these problems before 1.14.4. It also affects Z2M-assistant, which lost the network map and LQI information. 1.14.4-dev is quite stable, but many devices sometimes go into sleep mode, which causes a lot of lag. Most of the time they wake up after Z2M pings them a few more times. It still crashes, but it lasts much longer, 3 days or more.

I also recently figured out that playing with Z2M-assistant (start, restart, uninstall, install) crashes Z2M too. (I did that while trying to get the network map back in Z2M-assistant.)

Thanks @theFork for your instructions on getting the log. Unfortunately, I am still a beginner, and it is very difficult for me to understand how to do that and where to enter the commands you mentioned. I appreciate your help, though.

Koenkk commented 3 years ago

@theFork I've checked your log and there is indeed no crash (as far as the log shows). The lag you observed at 2020-09-22T20:22:55.968Z is caused by the device not responding to the ON command sent at 2020-09-22T20:22:55.968Z. Therefore the OFF command sent at 2020-09-22 22:22:56 is queued, because Zigbee2MQTT retries sending the ON command. I noticed this because of:

2020-09-22T20:23:05.983Z zigbee-herdsman:adapter:zStack:adapter Response timeout (0x001fee00000017c6:62029,0)
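
As a rough check of that timing, the two timestamps can simply be subtracted. This is a minimal Python sketch, not part of Zigbee2MQTT; it assumes the herdsman timestamps are UTC in ISO 8601 form, and the function and variable names are made up for illustration.

```python
from datetime import datetime


def parse_utc(timestamp: str) -> datetime:
    # Assumption: log timestamps look like 2020-09-22T20:22:55.968Z (UTC).
    return datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")


# Timestamps copied from the comment and the log line above.
on_command_sent = parse_utc("2020-09-22T20:22:55.968Z")
response_timeout = parse_utc("2020-09-22T20:23:05.983Z")

elapsed = (response_timeout - on_command_sent).total_seconds()
print(f"Response timeout logged {elapsed:.3f} s after the ON command")
```

The gap is roughly 10 seconds, which would be consistent with the `"timeout":10000` option visible in the failed-command log lines posted elsewhere in this thread.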

domoticafacilconjota commented 3 years ago

Hello! @Erickclee I created a temporary repository with version 1.13.1.1 if anyone is interested, here is the link: https://github.com/domoticafacilconjota/hassio-zigbee2mqtt-1.13.1.1

It is an exact copy, so you need to uninstall the current version of Z2M, delete danielwelch's repository and add mine. Don't forget to take an HA snapshot and copy your add-on configuration (you will need it later).

theFork commented 3 years ago

@Erickclee @domoticafacilconjota @rafhaanshah : I would prefer that we concentrate on helping @Koenkk and the other developers in fixing the issue in the latest release rather than spending time on fixing our own home automation.

@Koenkk : With availability disabled, my system (still running c0faae7) has now been functional for one week. Would it be worthwhile to update to the latest commit and enable availability again? Or would you expect that availability is still broken?

messismore commented 3 years ago

For what it's worth, my system has been stable since updating to 64a7840, even with availability turned back on. Before, devices would be unresponsive after a few hours.

theFork commented 3 years ago

Great news @messismore! I will then update and enable availability.

Edit: 2020-09-26 13:35:05 Now running fced7ed with availability enabled (with one device in passlist)

Koenkk commented 3 years ago

@theFork yes availability should be fine now

bruvv commented 3 years ago

Is this merged into the edge version for the hassio addon by daniel? Since a few days after adding a lot more zigbee devices I'm also experiencing this issue.

Koenkk commented 3 years ago

@bruvv yes it's in the latest edge addon, what adapter are you using and how many devices do you have in total?

bruvv commented 3 years ago

CC2531, with 24 devices, including Hue lights (10), Osram smart plugs (3), Xiaomi smart plugs (2) and Xiaomi sensors (9). CC2531 firmware: Z-Stack_Home_1.2 (CC2531_SOURCE_ROUTING_20190619)

Anything I can do to help, perhaps reflash the firmware? I have a CC debugger handy and ready to use.

Koenkk commented 3 years ago

@bruvv could you provide the herdsman debug logging of the crash?

To enable herdsman debug logging, see https://www.zigbee2mqtt.io/information/debug.html#zigbee-herdsman-debug-logging

bruvv commented 3 years ago

Got a debug:

Zigbee2MQTT:debug 2020-09-29 21:36:57: Publishing 'set' 'state' to '0x0017880108d1ba67'
Zigbee2MQTT:error 2020-09-29 21:37:02: Publish 'set' 'state' to '0x0017880108cac886' failed: 'Error: Command 0x0017880108cac886/11 genLevelCtrl.moveToLevelWithOnOff({"level":254,"transtime":300}, {"timeout":10000,"disableResponse":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null}) failed (Data request failed with error: 'MAC no resources' (26))'
Zigbee2MQTT:debug 2020-09-29 21:37:02: Error: Command 0x0017880108cac886/11 genLevelCtrl.moveToLevelWithOnOff({"level":254,"transtime":300}, {"timeout":10000,"disableResponse":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null}) failed (Data request failed with error: 'MAC no resources' (26))
    at ZStackAdapter.<anonymous> (/zigbee2mqtt-1.14.4/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:311:27)
    at Generator.next (<anonymous>)
    at fulfilled (/zigbee2mqtt-1.14.4/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:24:58)
Zigbee2MQTT:info  2020-09-29 21:37:02: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"Publish 'set' 'state' to '0x0017880108cac886' failed: 'Error: Command 0x0017880108cac886/11 genLevelCtrl.moveToLevelWithOnOff({\"level\":254,\"transtime\":300}, {\"timeout\":10000,\"disableResponse\":false,\"disableDefaultResponse\":false,\"direction\":0,\"srcEndpoint\":null,\"reservedBits\":0,\"manufacturerCode\":null,\"transactionSequenceNumber\":null}) failed (Data request failed with error: 'MAC no resources' (26))'","meta":{"friendly_name":"0x0017880108cac886"},"type":"zigbee_publish_error"}'
Zigbee2MQTT:debug 2020-09-29 21:37:02: Publishing 'set' 'transition' to '0x0017880108cac886'
Zigbee2MQTT:error 2020-09-29 21:37:03: Publish 'set' 'state' to '0x0017880108cad490' failed: 'Error: Command 0x0017880108cad490/11 genLevelCtrl.moveToLevelWithOnOff({"level":254,"transtime":300}, {"timeout":10000,"disableResponse":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null}) failed (Data request failed with error: 'MAC no resources' (26))'
Zigbee2MQTT:debug 2020-09-29 21:37:03: Error: Command 0x0017880108cad490/11 genLevelCtrl.moveToLevelWithOnOff({"level":254,"transtime":300}, {"timeout":10000,"disableResponse":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null}) failed (Data request failed with error: 'MAC no resources' (26))
    at ZStackAdapter.<anonymous> (/zigbee2mqtt-1.14.4/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:311:27)
    at Generator.next (<anonymous>)
    at fulfilled (/zigbee2mqtt-1.14.4/node_modules/zigbee-herdsman/dist/adapter/z-stack/adapter/zStackAdapter.js:24:58)

Koenkk commented 3 years ago

@bruvv thanks, could you provide me with more of this logging? Preferably from Zigbee2MQTT start until it fails.
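
When going through a longer log, the failures in the excerpt above can be tallied per device. The following is a hedged Python sketch, not a Zigbee2MQTT tool: the regular expression is an assumption based only on the `Zigbee2MQTT:error` lines shown in this thread, and the sample lines are abbreviated copies of those lines with the command details replaced by `...`.

```python
import re
from collections import Counter

# Assumed line shape, taken from the 'Zigbee2MQTT:error' lines in this thread.
FAILED_PUBLISH = re.compile(
    r"Zigbee2MQTT:error .*?: Publish '\w+' '\w+' to '(?P<device>[^']+)' "
    r"failed: .*'MAC no resources'"
)


def count_mac_no_resources(lines):
    """Count failed publishes caused by 'MAC no resources', per device address."""
    counts = Counter()
    for line in lines:
        match = FAILED_PUBLISH.search(line)
        if match:
            counts[match.group("device")] += 1
    return counts


# Abbreviated copies of log lines from the excerpt ("..." marks omitted details).
sample = [
    "Zigbee2MQTT:debug 2020-09-29 21:36:57: Publishing 'set' 'state' to '0x0017880108d1ba67'",
    "Zigbee2MQTT:error 2020-09-29 21:37:02: Publish 'set' 'state' to '0x0017880108cac886' "
    "failed: 'Error: Command ... failed (Data request failed with error: 'MAC no resources' (26))'",
    "Zigbee2MQTT:error 2020-09-29 21:37:03: Publish 'set' 'state' to '0x0017880108cad490' "
    "failed: 'Error: Command ... failed (Data request failed with error: 'MAC no resources' (26))'",
]

for device, failures in count_mac_no_resources(sample).items():
    print(device, failures)
```

For a real log file, passing the open file object (which iterates line by line) to `count_mac_no_resources` should work the same way.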

bruvv commented 3 years ago

You mean the whole log file? log.txt

Koenkk commented 3 years ago

@bruvv you mentioned using the 20190619 source routing firmware but your log shows you are using the default 20190608 firmware:

info  2020-09-29 17:14:16: Coordinator firmware version: '{"meta":{"maintrel":3,"majorrel":2,"minorrel":6,"product":0,"revision":20190608,"transportrev":2},"type":"zStack12"}'

Try switching to the source routing firmware, it has more memory available. https://github.com/Koenkk/Z-Stack-firmware/tree/master/coordinator/Z-Stack_Home_1.2/bin/source_routing
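
To check which firmware a stick is really running, the revision can be read from that startup line. This is a minimal Python sketch under the assumption that the log line always has the shape quoted above (JSON payload between single quotes); `firmware_revision` is a hypothetical helper name, not something Zigbee2MQTT provides.

```python
import json
import re
from typing import Optional

# Assumed format: the JSON payload follows the label, wrapped in single quotes.
FIRMWARE_LINE = re.compile(r"Coordinator firmware version: '(\{.*\})'")


def firmware_revision(log_line: str) -> Optional[int]:
    """Return the coordinator firmware revision from a log line, or None."""
    match = FIRMWARE_LINE.search(log_line)
    if match is None:
        return None
    info = json.loads(match.group(1))
    return info["meta"]["revision"]


# The log line quoted above, split across several string literals for readability.
line = (
    "info  2020-09-29 17:14:16: Coordinator firmware version: "
    '\'{"meta":{"maintrel":3,"majorrel":2,"minorrel":6,"product":0,'
    '"revision":20190608,"transportrev":2},"type":"zStack12"}\''
)

print(firmware_revision(line))
```

Comparing the returned revision with the date in the firmware file name (20190608 for the default build, 20190619 for source routing, as mentioned in this thread) shows whether the intended firmware was flashed.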

bruvv commented 3 years ago

That is weird. I have a few of them lying around, so I probably mixed up the USB sticks. Changed to source routing and it's stable now. Thanks @Koenkk

theFork commented 3 years ago

My system has now been stable for 5 days. Since the master branch has incremented to 1.15.0, I would suggest that we all update to that version, turn on heavy pinging (availability_timeout = 10s) and logging, wait one more week and then close this one. Who's in?

Koenkk commented 3 years ago

@theFork I still wouldn't recommend an availability timeout of 10; in my opinion 60 is a better value (but it depends on the number of devices in your network and your use case).

Erickclee commented 3 years ago

@theFork Bad idea. I installed 1.15.0 based on your input and it crashed within 3 hours. I didn't add any availability setting. I believe the code in the dev branch is not yet implemented in the bridge version. I am switching back to the dev branch; at least it was stable for more than 5 days for me too.

Koenkk commented 3 years ago

@Erickclee that is very strange, the dev branch and 1.15.0 are currently equal to each other. Please provide the herdsman debug logging from start until crash.

To enable herdsman debug logging, see https://www.zigbee2mqtt.io/information/debug.html#zigbee-herdsman-debug-logging

Erickclee commented 3 years ago

Hi @Koenkk, thanks for the reply. I was using the dev branch, installed about 5 days ago, so it might not be the same as the current dev branch. The dev branch doesn't seem to have a version number, so I cannot quote one. The previous version of the dev branch did not have the web UI sidebar (similar to Z2MA), so I am sure the version I tested is a different one, without the sidebar.

Secondly, when I tried 1.15.0, I was playing with the web UI sidebar. When I clicked the network map, it showed an error message that "the adapter is disconnected", meaning it crashed the CC2531 adapter and all devices went offline. Z2M managed to reset itself. Then it ran smoothly for 3 hours and crashed completely; it couldn't resume until I did a cold reboot of my Pi 4.

So now I am back on the dev branch, this time the newer version with the web UI sidebar. Again I played with the network map in the sidebar, and again it crashed the adapter, which was able to recover by itself. Since you mentioned that it is identical to 1.15.0, I am afraid it might crash later too. It has been running for less than 2 hours so far.

Maybe I should go back to the 1.15.0 bridge and stop playing with the network map in the web UI sidebar.

Sorry that I cannot get a herdsman log that is long enough: I can only copy and paste from the add-on UI, which is too short for analysis.

Update: I am switching back to the 1.15.0 bridge again. This time I am refraining from playing with the web UI sidebar, especially the network map.

Update: I tried 2 different firmwares, the default 20190608 and source routing 20190619, and I switched back to the 1.15.0 bridge. I have around 20 devices, many of them routers. With 20190608, switch response is similar to the dev branch (although not as responsive as 1.13), but I cannot touch the web UI network map; it crashes immediately when I play with the map. With 20190619, response is very sluggish and switches don't respond immediately, but I can load the network map in the web UI sidebar without crashing Z2M.

Now I am back on the default 20190608. It is pretty OK now and I am still monitoring, except that I am not allowed to touch the network map.

Update: 6 hours have passed, and the 1.15.0 bridge is still very stable with the default firmware 20190608, provided that I don't touch the UI sidebar to generate the network map.

Erickclee commented 3 years ago

The 1.15.0 bridge has been stable for 2 days. Verified 2 times: the sidebar map function crashes the Z2M coordinator. I use the 20190608 default firmware with a CC2531 on a Pi 4, with the standard HASSIO add-on.

Koenkk commented 3 years ago

@Erickclee it looks like the networkmap function crashes the stick. This can occur because the network map is a very heavy operation. Can you try with the source routing firmware? It provides more stability: https://github.com/Koenkk/Z-Stack-firmware/tree/master/coordinator/Z-Stack_Home_1.2/bin/source_routing

Erickclee commented 3 years ago

@Koenkk, your suggestion works. I tried source routing 20190619 and the map works now. It seems to need some time to work through the network (or maybe to link up all devices), because right after booting the switches were all very sluggish, with a lot of lag. But after 10 minutes or so everything worked fine, and it is more responsive than the default firmware 20190608. Now everything is good. I will continue to monitor with the source routing firmware.

Day 2 update: 20190619 source routing became unresponsive on day 2, as if it had gone into deep sleep. Since I used a different dongle, it could be a dongle issue, so this is not a good comparison.

I changed back to my original dongle with the default firmware 20190608 (I just don't use the map function). I would rather not flash source routing onto this dongle, because I want to keep one dongle on the default firmware that works for me (at least it is stable without the map).

RossBille commented 3 years ago

~30 devices (~10 routers) for me on a CC2531. I noticed crashes about twice a day on 1.14.4. Updating to 1.15.0 and flashing source routing 20190619 fixed the issue for me. 2 days without a crash so far.

theFork commented 3 years ago

On Oct 1st I updated to 1.15.0 and enabled debug and herdsman debug logging. I've got about 45 devices (~40 routers). Even though @Koenkk recommended a 60s availability timeout, I used a 30s interval to ping all of them. So far, no system crash. To be really sure, I will now wait one more week and then close the issue. Thanks for your great work, everyone!

Armageddit commented 3 years ago

@aBaDDoNNL

Hi, please update the manual: https://www.zigbee2mqtt.io/information/flashing_the_cc2531.html

I also just ran into massive problems and stumbled over this thread by chance. I followed the instructions and therefore have the wrong version on my stick. I'm not sure yet whether it's because of Docker, the wrong firmware, or whether the import of the data is not possible.

Koenkk commented 3 years ago

@Armageddit done

theFork commented 3 years ago

It's still running, now for 14 days. I think we can be pretty sure that this is fixed now. Thanks for your great work everyone, it's been a pleasure supporting this investigation :bow:

radix50 commented 2 years ago

I see something very like this with a zzh when a device that has been connected to the network is physically removed but is still configured, even if I am not controlling it anymore. z2m runs OK for many weeks, but when a device is turned off, it just stops after 2 - 3 days. Nothing in the log files suggests any issues. This is on Windows Server 2019 Standard on an Intel NUC PC.