Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
11.73k stars 1.64k forks source link

1.39.0 random crashes #23282

Open Ricc68 opened 2 months ago

Ricc68 commented 2 months ago

What happened?

Everything was working fine on 1.38.0-dev but since I switched to 1.39.0, z2m started crashing randomly every couple of hours. The log shows a random crash started with a transaction failed error followed by a crash related to herdsman followed by a tentative restart that crashed again on herdsman.

What did you expect to happen?

No crashes.

How to reproduce it (minimal and precise)

It started happening with ZBDongle-E, fw 7.4.3 and z2m 1.39.0 on ember.

Zigbee2MQTT version

1.39.0

Adapter firmware version

7.4.3

Adapter

zbdongle-e

Setup

HAOS in vm

Debug log

2024-07-06 16:58:15] info: zh:ember: [NCP COUNTERS] 418,478,137,24,0,0,14,14,96,6,0,0,60,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 [2024-07-06 16:58:15] info: zh:ember: [ASH COUNTERS] 219,153,7,146,0,0,0,0,4545,146,146,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0 [2024-07-06 17:58:30] error: zh:ember:ezsp: ERROR Transaction failure; status=undefined. Last: [FRAME: ID=101:"READ_AND_CLEAR_COUNTERS" Seq=228 Len=5]. [2024-07-06 17:58:30] error: zh:ember: !!! NCP FATAL ERROR reason=undefined. ATTEMPTING RESET... !!! [2024-07-06 17:58:30] info: zh:ember:queue: Request dispatching stopped; queue=1 priorityQueue=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: ASH COUNTERS since last clear: [2024-07-06 17:58:30] info: zh:ember:uart:ash: Total frames: RX=23, TX=25 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: DATA frames : RX=23, TX=2 [2024-07-06 17:58:30] info: zh:ember:uart:ash: DATA bytes : RX=734, TX=32 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: ACK frames : RX=0, TX=23 [2024-07-06 17:58:30] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: CRC errors : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Comm errors : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Length < minimum: RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Length > maximum: RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Bad controls : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Bad lengths : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Bad ACK numbers : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Out of buffers : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Retry dupes : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: Out of sequence : RX=0 [2024-07-06 17:58:30] info: zh:ember:uart:ash: ACK timeouts : RX=0 Error: at Ezsp.ezspReadAndClearCounters (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:2012:19) at Object.func (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:844:42) at EmberRequestQueue.dispatch (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/requestQueue.ts:117:46) [21:18:30] INFO: Preparing to start... [21:18:30] INFO: Socat not enabled [21:18:32] INFO: Starting Zigbee2MQTT... Starting Zigbee2MQTT without watchdog. [2024-07-07 21:18:40] info: z2m: Logging to console, file (filename: log.log) [2024-07-07 21:18:40] info: z2m: Starting Zigbee2MQTT version 1.39.0 (commit #unknown) [2024-07-07 21:18:40] info: z2m: Starting zigbee-herdsman (0.50.1) [2024-07-07 21:18:41] info: zh:ember: Using default stack config. [2024-07-07 21:18:41] info: zh:ember: ======== Ember Adapter Starting ======== [2024-07-07 21:18:41] info: zh:ember:ezsp: ======== EZSP starting ======== [2024-07-07 21:18:41] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:41] info: zh:ember:uart:ash: RTS/CTS config is off, enabling software flow control. [2024-07-07 21:18:41] info: zh:ember:uart:ash: Serial port opened [2024-07-07 21:18:41] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:44] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:44] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:46] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:46] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:49] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:49] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:51] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-07 21:18:51] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-07 21:18:54] error: z2m: Error while starting zigbee-herdsman [2024-07-07 21:18:54] error: z2m: Failed to start zigbee [2024-07-07 21:18:54] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions [2024-07-07 21:18:54] error: z2m: Exiting... [2024-07-07 21:18:54] error: z2m: Error: Failed to start EZSP layer with status=HOST_FATAL_ERROR. at EmberAdapter.initEzsp (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:889:19) at EmberAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:2722:24) at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:129:29) at Zigbee.start (/app/lib/zigbee.ts:63:27) at Controller.start (/app/lib/controller.ts:139:27) at start (/app/index.js:154:5)

Ricc68 commented 2 months ago

Update: after a restart of the VM, z2m started properly. I have now enabled the debug log to catch more info about the issue.

maciekdnd commented 1 month ago

I had similar issue but on SLZB-06M, latest firmware (updated from github repo to 7..4.3 instead from controller console which is probably 7.4.1, on Z2M 1.38.0). After that I got random crashes especially on LXC Z2M reboot. Moving to 20240507 FW (7.4.1) seems more stable.

Nerivec commented 1 month ago

@Ricc68 Did the restart definitely fix this problem? If not, can you check your last log from running 1.38.0-dev for the commit hash? Example: z2m: Starting Zigbee2MQTT version 1.39.0-dev (commit #cda8927d) It will point us to one of these commits to see what's changed since.

As you can see from June 1st to June 30th in that list, 1.39.0 had only minimal changes for ember. August release (current dev) will be a big one however, it has a lot of fixes from adding support for v8, and the implementation of concurrency (should smooth things out for users with misbehaving devices).

Ricc68 commented 1 month ago

@Ricc68 Did the restart definitely fix this problem? If not, can you check your last log from running 1.38.0-dev for the commit hash? Example: z2m: Starting Zigbee2MQTT version 1.39.0-dev (commit #cda8927d) It will point us to one of these commits to see what's changed since.

As you can see from June 1st to June 30th in that list, 1.39.0 had only minimal changes for ember. August release (current dev) will be a big one however, it has a lot of fixes from adding support for v8, and the implementation of concurrency (should smooth things out for users with misbehaving devices).

@Nerivec Since I have restarted and enabled the debug log, the problem disappeared. Now I have disabled the debug: let's see if it appears again. My feeling is that it was a malfunction of the serial communication with the dongle because a simple restart without power down of HAOS was fixing it.

swissjon commented 1 month ago

I am having a similar problem. While my devices are quickly available after a reboot by mqtt, the front end can take up to 1 minute to load. I just downgraded to 1.37 and the front end load withing 5-10 seconds of starting.

I have 2 instances in docker with POE coordinators, about 100 devices on each. Same problem with both instances.

Nerivec commented 1 month ago

@swissjon That issue should already be fixed in dev/edge (caused by a quirk in Availability unearthed after fixing the startup flow in last release).

Ricc68 commented 1 month ago

@Nerivec With log at info level, z2m crashed in less than 24h:

[2024-07-22 21:38:53] error: zh:ember:ezsp: ERROR Transaction failure; status=undefined. Last: [FRAME: ID=40:"GET_NETWORK_PARAMETERS" Seq=120 Len=5]. [2024-07-22 21:38:53] error: zh:ember: !!! NCP FATAL ERROR reason=undefined. ATTEMPTING RESET... !!! [2024-07-22 21:38:53] info: zh:ember:queue: Request dispatching stopped; queue=1 priorityQueue=1 [2024-07-22 21:38:53] info: zh:ember:uart:ash: ASH COUNTERS since last clear: [2024-07-22 21:38:53] info: zh:ember:uart:ash: Total frames: RX=204, TX=249 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: DATA frames : RX=204, TX=46 [2024-07-22 21:38:53] info: zh:ember:uart:ash: DATA bytes : RX=5137, TX=1225 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: ACK frames : RX=0, TX=203 [2024-07-22 21:38:53] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: CRC errors : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Comm errors : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Length < minimum: RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Length > maximum: RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Bad controls : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Bad lengths : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Bad ACK numbers : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Out of buffers : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Retry dupes : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: Out of sequence : RX=0 [2024-07-22 21:38:53] info: zh:ember:uart:ash: ACK timeouts : RX=0 Error: at Ezsp.ezspGetNetworkParameters (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:2659:19) at Object.func (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:2791:55) at EmberRequestQueue.dispatch (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/requestQueue.ts:117:46)

Attempting to restart z2m results again in a crash as if the communication with the dongle is somehow stuck:

[2024-07-23 21:49:10] error: z2m: Error while starting zigbee-herdsman [2024-07-23 21:49:10] error: z2m: Failed to start zigbee [2024-07-23 21:49:10] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions [2024-07-23 21:49:10] error: z2m: Exiting... [2024-07-23 21:49:11] error: z2m: Error: Failed to start EZSP layer with status=HOST_FATAL_ERROR. at EmberAdapter.initEzsp (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:889:19) at EmberAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:2722:24) at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:129:29) at Zigbee.start (/app/lib/zigbee.ts:63:27) at Controller.start (/app/lib/controller.ts:139:27) at start (/app/index.js:154:5)

A soft reboot of the HAOS VM, without removing power to the dongle, fixed the issue.

Nerivec commented 1 month ago

Would need debug logs to confirm, but from what I'm seeing, I'd say the adapter is disconnecting abruptly (or at least that's what Z2M seems to perceive). Can you check dmesg to see if anything weird is happening with the USBs?

swissjon commented 1 month ago

@swissjon That issue should already be fixed in dev/edge (caused by a quirk in Availability unearthed after fixing the startup flow in last release).

Thanks. Now that my setup is back to stable, I will probably just wait for the next stable release.

Ricc68 commented 1 month ago

@Nerivec I have checked the host dmesg and there is no sign of usb problems at the time of the crash. (Two days ago) I have only found two usb resets compatible with the first restart attempt and the second successful VM restart of yesterday.

Would need debug logs to confirm, but from what I'm seeing, I'd say the adapter is disconnecting abruptly (or at least that's what Z2M seems to perceive). Can you check dmesg to see if anything weird is happening with the USBs?

@Nerivec I have checked the host dmesg and there is no sign of usb problems at the time of the crash. (Two days ago) I have only found two usb resets compatible with the first restart attempt and the second successful VM restart of yesterday.

Nerivec commented 1 month ago

You may want to check other available logs too, like the HA ones. In case something interesting pops up around the time of the crash.

You switched back from edge, so, no chance the edge instance is trying to start again (watchdog) or anything along these lines, that could interfere with the regular one? It's a bit far-fetched, and shouldn't, but just in case...

Make sure to set the Z2M log level to debug in case it happens again, so we can get more details about the error (the stuff that happens right before what you posted above).

You could also try the edge add-on again (to update it, just uninstall it, go back to the store, check for updates -top right corner dropdown-, then install it again). See if it happens there too.

Ricc68 commented 1 month ago

@Nerivec it happened again. I have checked host and VM dmesg and HA logs but nothing happened to the usb stick, the logs are not showing anything related. No, the edge z2m was uninstalled before installing the release version. I don't enable the watchdog. I will re enable the debug log of z2m and see if it happens but in my previous try with debug log enabled it did not happen for weeks and with debug log disabled it happens in less than 24 hours so my guess is that maybe is a timing issue that is mitigated by the debug log.

Ricc68 commented 1 month ago

@Nerivec I've got the crash with debug enabled finally.

[2024-07-28 23:53:00] debug: zh:controller:database: Writing database to '/config/zigbee2mqtt/database.db' [2024-07-28 23:53:00] debug: zh:ember:ezsp: ===> [FRAME: ID=101:"READ_AND_CLEAR_COUNTERS" Seq=76 Len=5] [2024-07-28 23:53:00] debug: zh:ember:uart:ash: ---> [FRAME type=DATA frmTx=4 frmRx=1] [2024-07-28 23:53:14] debug: zh:ember:ezsp: =x=> [FRAME: ID=101:"READ_AND_CLEAR_COUNTERS" Seq=76 Len=5] Error: Error: timed out after 14400ms [2024-07-28 23:53:14] error: zh:ember:ezsp: ERROR Transaction failure; status=undefined. Last: [FRAME: ID=101:"READ_AND_CLEAR_COUNTERS" Seq=76 Len=5]. [2024-07-28 23:53:14] error: zh:ember: !!! NCP FATAL ERROR reason=undefined. ATTEMPTING RESET... !!! [2024-07-28 23:53:14] info: zh:ember:queue: Request dispatching stopped; queue=1 priorityQueue=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: ASH COUNTERS since last clear: [2024-07-28 23:53:14] info: zh:ember:uart:ash: Total frames: RX=38, TX=46 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: DATA frames : RX=38, TX=8 [2024-07-28 23:53:14] info: zh:ember:uart:ash: DATA bytes : RX=1047, TX=187 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: ACK frames : RX=0, TX=38 [2024-07-28 23:53:14] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: CRC errors : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Comm errors : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Length < minimum: RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Length > maximum: RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Bad controls : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Bad lengths : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Bad ACK numbers : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Out of buffers : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Retry dupes : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: Out of sequence : RX=0 [2024-07-28 23:53:14] info: zh:ember:uart:ash: ACK timeouts : RX=0 Error: at Ezsp.ezspReadAndClearCounters (/app/node_modules/zigbee-herdsman/src/adapter/ember/ezsp/ezsp.ts:2012:19) at Object.func (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/emberAdapter.ts:844:42) at EmberRequestQueue.dispatch (/app/node_modules/zigbee-herdsman/src/adapter/ember/adapter/requestQueue.ts:117:46)

Ricc68 commented 1 month ago

@Nerivec trying to restart, CPU goes to 115% shown on the add-on dashboard, and gives another crash:

[2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1200/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1300/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1400/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1500/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1600/2500 [2024-07-29 22:22:48] debug: zh:ember:uart:ash: Waiting for RSTACK... 1700/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 1800/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 1900/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2000/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2100/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2200/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2300/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 2400/2500 [2024-07-29 22:22:49] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-29 22:22:49] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-29 22:22:49] debug: zh:ember:uart:ash: ---> [FRAME type=RST] [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 0/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 100/2500 [2024-07-29 22:22:49] debug: zh:ember:uart:ash: Waiting for RSTACK... 200/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 300/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 400/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 500/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 600/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 700/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 800/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 900/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 1000/2500 [2024-07-29 22:22:50] debug: zh:ember:uart:ash: Waiting for RSTACK... 1100/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1200/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1300/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1400/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1500/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1600/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1700/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1800/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 1900/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 2000/2500 [2024-07-29 22:22:51] debug: zh:ember:uart:ash: Waiting for RSTACK... 2100/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 2200/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 2300/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 2400/2500 [2024-07-29 22:22:52] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-29 22:22:52] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-29 22:22:52] debug: zh:ember:uart:ash: ---> [FRAME type=RST] [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 0/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 100/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 200/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 300/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 400/2500 [2024-07-29 22:22:52] debug: zh:ember:uart:ash: Waiting for RSTACK... 500/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 600/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 700/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 800/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 900/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1000/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1100/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1200/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1300/2500 [2024-07-29 22:22:53] debug: zh:ember:uart:ash: Waiting for RSTACK... 1400/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1500/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1600/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1700/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1800/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 1900/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 2000/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 2100/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 2200/2500 [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 2300/2500 [2024-07-29 22:22:54] error: zh:ember:uart:ash: ASH disconnected: ASH_ERROR_RESET_FAIL | NCP status: NO_ERROR [2024-07-29 22:22:54] info: zh:ember:uart:ash: ======== ASH NCP reset ======== [2024-07-29 22:22:54] info: zh:ember:uart:ash: ======== ASH starting ======== [2024-07-29 22:22:54] debug: zh:ember:uart:ash: ---> [FRAME type=RST] [2024-07-29 22:22:54] debug: zh:ember:uart:ash: Waiting for RSTACK... 0/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 100/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 200/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 300/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 400/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 500/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 600/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 700/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 800/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 900/2500 [2024-07-29 22:22:55] debug: zh:ember:uart:ash: Waiting for RSTACK... 1000/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1100/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1200/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1300/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1400/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1500/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1600/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1700/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1800/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 1900/2500 [2024-07-29 22:22:56] debug: zh:ember:uart:ash: Waiting for RSTACK... 2000/2500 [2024-07-29 22:22:57] debug: zh:ember:uart:ash: Waiting for RSTACK... 2100/2500 [2024-07-29 22:22:57] debug: zh:ember:uart:ash: Waiting for RSTACK... 2200/2500 [2024-07-29 22:22:57] debug: zh:ember:uart:ash: Waiting for RSTACK... 2300/2500 [2024-07-29 22:22:57] debug: zh:ember:uart:ash: Waiting for RSTACK... 2400/2500 [2024-07-29 22:22:57] error: z2m: Error while starting zigbee-herdsman [2024-07-29 22:22:57] error: z2m: Failed to start zigbee [2024-07-29 22:22:57] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions [2024-07-29 22:22:57] error: z2m: Exiting...

Nerivec commented 1 month ago

Do you have the rest of the logs (from this post)?

Bolukan commented 1 month ago

I have problems since 1.39.0: my guess: I can send commands to devices (turn light on/off) but do not receive messages back (no online/offline, no interviews, etc.). Did reboots and reinstallations but no success. I just reverted back to 1.38.0 and see what happens. First signs are ok: communication is restored.

Ricc68 commented 1 month ago

@Nerivec I am attaching the full log.

log.txt.gz

Nerivec commented 1 month ago

Something's not quite right here.

[2024-07-28 22:59:56]
[2024-07-28 23:53:00]

Zero logging in-between?

Ricc68 commented 1 month ago

@Nerivec that was grepped -i -v mqtt to remove useless mqtt info. Now this new attach is complete, not grepped, and yes, the only activity in between is the backup of the state, it seems the z2m communication with the zbdongle-e is stalled. There's definitely something not quite right there. Consider that a soft reboot of the VM restores the communication, no power down of the dongle, so the dongle did not lose the ability to communicate.

log.log.gz

Nerivec commented 1 month ago

Rebooting Z2M reboots the dongle (they sync-up during startup). But it seems everything just cuts off, not just the communication to the adapter, Z2M sending, and receiving alike, since there is no logging. Only a couple of time-based triggers remain.

Did you check all available logs (host/vm) around 2024-07-28 23:00:00 for some event that could possibly precipitate this behavior?

Ricc68 commented 1 month ago

Rebooting Z2M reboots the dongle (they sync-up during startup). But it seems everything just cuts off, not just the communication to the adapter, Z2M sending, and receiving alike, since there is no logging. Only a couple of time-based triggers remain.

Did you check all available logs (host/vm) around 2024-07-28 23:00:00 for some event that could possibly precipitate this behavior?

Checked the logs but found nothing that can possibly be related. The fact that it is happening much less frequently in debug log than in info log may be a hint?

Ricc68 commented 1 month ago

@Nerivec happened also with 1.39.1. log.zip Log attached.

Koenkk commented 1 month ago

@Ricc68

Ricc68 commented 1 month ago

@Ricc68

* when the crash happens, do you see anything happening in `sudo dmesg`?

* Could you check if the crash also happens when running on different hardware? (maybe you have some Pi lying around?)

@Koenkk I have checked the dmesg and there's nothing at the time of the crash.

Unfortunately I don't have other hardware on which run haos.

Koenkk commented 4 weeks ago

@Ricc68 there is no need to run haos, just z2m on bare metal would be enough already (docs)

Ricc68 commented 4 weeks ago

@Ricc68

* when the crash happens, do you see anything happening in `sudo dmesg`?

* Could you check if the crash also happens when running on different hardware? (maybe you have some Pi lying around?)

@Koenkk I have checked the dmesg and there's nothing at the time of the crash.

Unfortunately I don't have other hardware on which run haos.

@Ricc68 there is no need to run haos, just z2m on bare metal would be enough already (docs)

@Koenkk the problem is that I don't have any available hardware to run it. Just some spare esp8266 which is useless for the task.

Koenkk commented 4 weeks ago

You can also run it on a laptop (even on Windows)