Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
12.11k stars 1.68k forks source link

zigbee2mqtt does not start after being logged off for a while #23148

Open ar1nd4m opened 4 months ago

ar1nd4m commented 4 months ago

What happened?

My SLZB-06 was offline for a few days, when it got rebooted the zigbee2mqtt docker container managed by HA (addon) did not restart and get it online.

What did you expect to happen?

I expected that if the device is online, HA will automatically restart the docker container.

I think the right way to fix this is to make zigbee2mqtt not crash but wait - or change HA so that it does not stop retrying after a significant number of retries.

How to reproduce it (minimal and precise)

Have a SLZB-06 connected over LAN to zigbee2mqtt (running as an addon in HA).

Disconnect the power from SLZB-06 and let zigbee2mqtt fail it will keep trying for a few times and then stop. Give it about 4 hours.

After this, connect the SLZB-06 and the system does not recover back.

Zigbee2MQTT version

1.38.0 commit: unknown

Adapter firmware version

20210708

Adapter

zstack

Setup

Add-on HA

Debug log

    at Logger.info (/app/lib/util/logger.ts:177:14)
    at Znp.onPortClose (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:113:16)
    at Object.onceWrapper (node:events:632:26)
    at Socket.emit (node:events:517:28)
    at TCP.<anonymous> (node:net:350:12)
[17:12:51] INFO: Preparing to start...
[17:12:51] INFO: Socat not enabled
[17:12:53] INFO: Starting Zigbee2MQTT...
[2024-06-19 17:12:58] info:     z2m: Logging to console, file (filename: log.log)
[2024-06-19 17:12:58] info:     z2m: Starting Zigbee2MQTT version 1.38.0 (commit #unknown)
[2024-06-19 17:12:58] info:     z2m: Starting zigbee-herdsman (0.49.2)
[2024-06-19 17:12:59] info:     zh:zstack:znp: Opening TCP socket with 192.168.1.48:6638
[2024-06-19 17:13:02] info:     zh:zstack:znp: Socket error
[2024-06-19 17:13:02] error:    z2m: Error while starting zigbee-herdsman
[2024-06-19 17:13:02] error:    z2m: Failed to start zigbee
[2024-06-19 17:13:02] error:    z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-06-19 17:13:02] error:    z2m: Exiting...
[2024-06-19 17:13:02] error:    z2m: Error: Error while opening socket
    at Socket.<anonymous> (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:191:24)
    at Socket.emit (node:events:529:35)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)

/app/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:264
  var er = new ERR_STREAM_WRITE_AFTER_END();
           ^
Error: write after end
    at writeAfterEnd (/app/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:264:12)
    at DerivedLogger.Writable.write (/app/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:300:21)
    at DerivedLogger.log (/app/node_modules/winston/lib/winston/logger.js:252:14)
    at Logger.log (/app/lib/util/logger.ts:164:25)
    at Logger.info (/app/lib/util/logger.ts:177:14)
    at Znp.onPortClose (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:113:16)
    at Object.onceWrapper (node:events:632:26)
    at Socket.emit (node:events:517:28)
    at TCP.<anonymous> (node:net:350:12)
[17:13:04] INFO: Preparing to start...
[17:13:05] INFO: Socat not enabled
[17:13:06] INFO: Starting Zigbee2MQTT...
[2024-06-19 17:13:11] info:     z2m: Logging to console, file (filename: log.log)
[2024-06-19 17:13:12] info:     z2m: Starting Zigbee2MQTT version 1.38.0 (commit #unknown)
[2024-06-19 17:13:12] info:     z2m: Starting zigbee-herdsman (0.49.2)
[2024-06-19 17:13:12] info:     zh:zstack:znp: Opening TCP socket with 192.168.1.48:6638
[2024-06-19 17:13:15] info:     zh:zstack:znp: Socket error
[2024-06-19 17:13:15] error:    z2m: Error while starting zigbee-herdsman
[2024-06-19 17:13:15] error:    z2m: Failed to start zigbee
[2024-06-19 17:13:15] error:    z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-06-19 17:13:15] error:    z2m: Exiting...
[2024-06-19 17:13:15] error:    z2m: Error: Error while opening socket
    at Socket.<anonymous> (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:191:24)
    at Socket.emit (node:events:529:35)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)

/app/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:264
  var er = new ERR_STREAM_WRITE_AFTER_END();
           ^
Error: write after end
    at writeAfterEnd (/app/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:264:12)
    at DerivedLogger.Writable.write (/app/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:300:21)
    at DerivedLogger.log (/app/node_modules/winston/lib/winston/logger.js:252:14)
    at Logger.log (/app/lib/util/logger.ts:164:25)
    at Logger.info (/app/lib/util/logger.ts:177:14)
    at Znp.onPortClose (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:113:16)
    at Object.onceWrapper (node:events:632:26)
    at Socket.emit (node:events:517:28)
    at TCP.<anonymous> (node:net:350:12)
[17:13:18] INFO: Preparing to start...
[17:13:18] INFO: Socat not enabled
[17:13:20] INFO: Starting Zigbee2MQTT...
[2024-06-19 17:13:25] info:     z2m: Logging to console, file (filename: log.log)
[2024-06-19 17:13:25] info:     z2m: Starting Zigbee2MQTT version 1.38.0 (commit #unknown)
[2024-06-19 17:13:25] info:     z2m: Starting zigbee-herdsman (0.49.2)
[2024-06-19 17:13:25] info:     zh:zstack:znp: Opening TCP socket with 192.168.1.48:6638
[2024-06-19 17:13:28] info:     zh:zstack:znp: Socket error
[2024-06-19 17:13:28] error:    z2m: Error while starting zigbee-herdsman
[2024-06-19 17:13:28] error:    z2m: Failed to start zigbee
[2024-06-19 17:13:28] error:    z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-06-19 17:13:28] error:    z2m: Exiting...
[2024-06-19 17:13:28] error:    z2m: Error: Error while opening socket
    at Socket.<anonymous> (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:191:24)
    at Socket.emit (node:events:529:35)
    at emitErrorNT (node:internal/streams/destroy:151:8)
    at emitErrorCloseNT (node:internal/streams/destroy:116:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)

/app/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:264
  var er = new ERR_STREAM_WRITE_AFTER_END();
           ^
Error: write after end
    at writeAfterEnd (/app/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:264:12)
    at DerivedLogger.Writable.write (/app/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:300:21)
    at DerivedLogger.log (/app/node_modules/winston/lib/winston/logger.js:252:14)
    at Logger.log (/app/lib/util/logger.ts:164:25)
    at Logger.info (/app/lib/util/logger.ts:177:14)
    at Znp.onPortClose (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:113:16)
    at Object.onceWrapper (node:events:632:26)
    at Socket.emit (node:events:517:28)
    at TCP.<anonymous> (node:net:350:12)
Koenkk commented 4 months ago

It looks like the adapter is not accepting any connections: Error: Error while opening socket, don't think this is something that can be fixed from z2m.

ar1nd4m commented 4 months ago

Thanks for looking into the issue - I think the problem is created by the socket (as it was indeed offline for a few hours) - but it is exacerbated by the issue that zigbee2mqtt does not retry.

If you see, the last log on that entry was on 2024-06-19 17:13:28 ... however it was the latest logs I pulled up on 2024-06-23 - So atleast 4 days passed without zigbee2mqtt reloading.

I think either the HA add-on retry logic is busted or something needs to happen in zigbee2mqtt to make sure HA doesn't mark it as a permanent failure.

I looked a little bit into https://developers.home-assistant.io/docs/add-ons/configuration/ but I don't see a retry strategy listed there.

Would it be possible to not kill the docker job if the socket is not available - but instead catch that exception and retry after a fixed timeout ? That way the add-on keeps running and is self recoverable ?

Koenkk commented 4 months ago

@Nerivec recently introduced the watchdog feature in https://github.com/Koenkk/zigbee2mqtt/pull/23043, I've now added support for this for the HA addon. For this you will have to use the edge version of the addon. Docs:

ar1nd4m commented 4 months ago

Thanks - Let me try it right away! Should I close this issue and re-open another one if it doesn't work as expected (to keep your issue-list clean) ?

Koenkk commented 4 months ago

Let's keep it here