Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
11.75k stars 1.64k forks

Z2MQTT Add-on stops when losing connection to coordinator - error Failed to stop Zigbee2MQTT #16140

Open mihsu81 opened 1 year ago

mihsu81 commented 1 year ago

What happened?

The Z2MQTT Add-on stops when losing connection to the coordinator. The HA server and LilyZig coordinator are connected to the same router. After I reboot the router (takes about 30-40 seconds), within ~20 seconds the Z2MQTT Add-on stops with the below error:

Zigbee2MQTT:error 2023-01-13 11:12:21: Adapter disconnected, stopping
Zigbee2MQTT:debug 2023-01-13 11:12:21: Saving state to file /config/zigbee2mqtt/state.json
Zigbee2MQTT:info  2023-01-13 11:12:21: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload 'offline'
Zigbee2MQTT:info  2023-01-13 11:12:21: Disconnecting from MQTT server
Zigbee2MQTT:info  2023-01-13 11:12:21: Stopping zigbee-herdsman...
Zigbee2MQTT:error 2023-01-13 11:12:21: Failed to stop Zigbee2MQTT

What did you expect to happen?

The Z2MQTT add-on retries connecting to the coordinator a configurable number of times.

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.29.1-1

Adapter firmware version

20220219

Adapter

ZigStar LilyZig POE

Debug log

log.txt

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

mihsu81 commented 1 year ago

The issue is still present.

midlan commented 1 year ago

I discovered a similar issue.

After I disconnected the SLZB-06 from its ethernet cable, the zigbee2mqtt service crashed (understandably). I then reconnected the SLZB-06 gateway, but the zigbee2mqtt service stayed down. I waited at least 10 minutes, but the watchdog does not seem to work: it does not appear to try to start the service.

mihsu81 commented 1 year ago

The issue is still present in the Z2MQTT 1.31.2-1 add-on and HA 2023.6.3.

mihsu81 commented 1 year ago

The issue is still present in the Z2MQTT 1.32.1-1 add-on and HA 2023.7.3.

mihsu81 commented 1 year ago

The issue is still present in the Z2MQTT 1.32.2-1 add-on and HA 2023.8.4.

mihsu81 commented 11 months ago

The issue is still present.

mihsu81 commented 10 months ago

The issue is still present.

lhorak commented 7 months ago

I have most likely the same or similar issue. I have an ethernet coordinator (TCP). When I, for example, restart my router, zigbee2mqtt crashes and doesn't recover even though the coordinator goes back online. The only way to get things back up and running is by manually restarting the zigbee2mqtt service (mine running barebones in VM).

I wonder if this is expected behavior or a bug?

Some logs:

error 2024-01-16 21:59:38: Adapter disconnected, stopping
debug 2024-01-16 21:59:38: Saving state to file /var/lib/zigbee2mqtt/state.json
info  2024-01-16 21:59:38: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
info  2024-01-16 21:59:38: Disconnecting from MQTT server
info  2024-01-16 21:59:38: Stopping zigbee-herdsman...
error 2024-01-16 21:59:38: Failed to stop Zigbee2MQTT

mihsu81 commented 7 months ago

I set up an automation which starts the Add-On if it's not running for 1 minute and 10 seconds. Nonetheless, I hope this bug will be fixed eventually.

alias: Start Zigbee2MQTT Add-On if stopped
description: ""
trigger:
  - type: not_running
    platform: device
    device_id: 2cd20c96528ec97880b06007e39d7c06
    entity_id: binary_sensor.zigbee2mqtt_running
    domain: binary_sensor
    for:
      hours: 0
      minutes: 1
      seconds: 10
condition: []
action:
  - service: hassio.addon_start
    data:
      addon: 45df7312_zigbee2mqtt
mode: single

3vilson commented 7 months ago

Got it working on Proxmox Alpine LXC.

Create the directory:

mkdir /opt/zigbee2mqtt

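Create a script that checks whether the Zigbee2MQTT service is still running and reboots if it is not:

cat <<'EOF' >/opt/zigbee2mqtt/check_zigbee_service.sh
#!/bin/sh
# Reboot when the zigbee2mqtt OpenRC service is not reported as "started".
if rc-service zigbee2mqtt status | grep -q "started"; then
    echo "zigbee2mqtt is running."
else
    echo "zigbee2mqtt is not running. Restarting..."
    # Alternative: restart only the service instead of rebooting.
    #rc-service zigbee2mqtt restart
    reboot
fi
EOF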

Give it execute permissions:

chmod +x /opt/zigbee2mqtt/check_zigbee_service.sh

Open your crontab file for editing:

nano /etc/crontabs/root

Add a line to schedule the script to run at your desired interval. For example, to run every 5 minutes:

*/5     *   *   *   *   /opt/zigbee2mqtt/check_zigbee_service.sh

reboot
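
A variant of the check script above that restarts only the service, instead of rebooting, might look like the sketch below. It simply enables the `rc-service zigbee2mqtt restart` line that is commented out in the script above. The helper names `is_started` and `check_and_restart` and the `--run` argument are hypothetical, and the sketch assumes, like the original `grep`, that the status output contains the word "started" while the service is up. It has not been tested against this bug.

```shell
#!/bin/sh
# Sketch: restart only the zigbee2mqtt service rather than rebooting.
# is_started is a hypothetical helper: it reads status text on stdin and
# succeeds when that text contains "started" (same test as the script above).
is_started() {
    grep -q "started"
}

# check_and_restart is also a hypothetical name, used for illustration.
check_and_restart() {
    if rc-service zigbee2mqtt status 2>&1 | is_started; then
        echo "zigbee2mqtt is running."
    else
        echo "zigbee2mqtt is not running. Restarting the service..."
        rc-service zigbee2mqtt restart
    fi
}

# Only act when called with --run, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    check_and_restart
fi
```

With this variant the crontab entry would call the script with the `--run` argument.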

lhorak commented 7 months ago

Just to give an update here, I was running z2m on Proxmox Alpine LXC, I checked the build version of the Alpine package: https://pkgs.alpinelinux.org/package/edge/community/aarch64/zigbee2mqtt and found out the latest version in the repository is 1.34.0, while z2m is on 1.35.1 currently.

I prefer to have things updated, so I've migrated to LXC with z2m running under Docker. I just tested it: when unplugging the IP coordinator and plugging it back in, z2m successfully starts up on its own, so this has solved it for me (and as an added bonus I get instant updates instead of waiting for the Alpine package to get updated 🙂).

I know this does not solve the issue and brings a little overhead with running Docker instead of barebones, but I just wanted to add this here as another option that is proven to work.

mrbrdo commented 6 months ago

I have the same issue. After unplugging my ethernet coordinator, the z2m add-on crashes and does not come back up after reconnecting. This is a bug in z2m: the watchdog in HA supervisor is not designed to handle such cases, so the add-on itself must handle them. See https://github.com/home-assistant/supervisor/pull/3779, where this behavior was updated. z2m should be updated to retry the connection (indefinitely) instead of crashing. If this behavior is problematic for another use case, then it can be a configurable setting.
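
To illustrate the requested behavior (retry indefinitely, or a configurable number of times, instead of stopping), here is a minimal shell sketch of such a retry loop. It is not Zigbee2MQTT code and says nothing about how z2m would implement this internally; `retry_connect` and its arguments are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# Sketch of "retry instead of crashing". Not Zigbee2MQTT code: retry_connect
# and its parameters are hypothetical names used for illustration only.
# Usage: retry_connect <max_attempts> <delay_seconds> <command> [args...]
# A max_attempts of 0 means "retry indefinitely".
retry_connect() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Run the connection check; stop retrying as soon as it succeeds.
        if "$@"; then
            echo "connected on attempt $attempt"
            return 0
        fi
        # Give up only when a positive limit is set and has been reached.
        if [ "$max_attempts" -gt 0 ] && [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For a network coordinator, the command passed in could be a TCP reachability probe, for example `retry_connect 0 30 nc -z "$COORDINATOR_HOST" "$COORDINATOR_PORT"`, assuming an `nc` build that supports `-z`; both variables are placeholders, not real settings.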

The issue is still present with HA 2024.2.5, supervisor 2024.02.1, HAOS 12.0 and Zigbee2MQTT 1.36.0-1.

@3vilson @lhorak in my opinion what you wrote is not really relevant to this issue. The issue does not pertain to running on Proxmox and therefore this cannot be a solution. You have a different setup. It's like suggesting using ZHA instead of Z2M is a solution.

psarossy commented 4 months ago

Still an issue on HA 2024.4.3, supervisor 2024.04.0, HAOS 12.2, Zigbee2MQTT 1.36.1-1.

r01k commented 4 months ago

How come it completely stops/crashes when it can't connect to the coordinator (which is very likely to occur, especially when using network-based coordinators), instead of just checking every few minutes for coordinator availability?

dinhchinh82 commented 3 months ago

I have exactly this issue with the latest z2m version, 1.38.0. When I unplug the ethernet coordinator, z2m crashes and does not work again, even after the coordinator is plugged back in with the same IP address. It seems to hang forever until I restart z2m manually.

Here is the latest log from z2m:

[2024-06-02 17:13:21] info: zh:ember:uart:ash: ======== ASH stopped ========
[2024-06-02 17:13:21] error: zh:ember:uart:ash: Failed to init port with error Error: connect ECONNREFUSED 192.168.86.27:8888
[2024-06-02 17:13:21] error: zh:ember: Failed to reset and init NCP. Error: Failed to start EZSP layer with status=HOST_FATAL_ERROR.
[2024-06-02 17:13:21] info: zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Total frames: RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: DATA frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: DATA bytes : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Retry frames: RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: ACK frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: CRC errors : RX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Comm errors : RX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Length < minimum: RX=0
[2024-06-02 17:13:21] info: zh:ember:uart:ash: Length > maximum: RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Bad controls : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Bad lengths : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Bad ACK numbers : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Out of buffers : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Retry dupes : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: Out of sequence : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: ACK timeouts : RX=0
[2024-06-02 17:13:22] info: zh:ember:uart:ash: ======== ASH stopped ========
[2024-06-02 17:13:22] info: zh:ember:ezsp: ======== EZSP stopped ========
[2024-06-02 17:13:22] info: zh:ember: ======== Ember Adapter Stopped ========
[2024-06-02 17:13:22] error: z2m: Adapter disconnected, stopping
[2024-06-02 17:13:22] info: z2m: Disconnecting from MQTT server
[2024-06-02 17:13:22] info: z2m: Stopping zigbee-herdsman...
[2024-06-02 17:46:53] info: z2m: Disconnecting from MQTT server
[2024-06-02 17:46:53] info: z2m: Stopping zigbee-herdsman...

Bjk8kds commented 4 days ago

This happened to me too. When I unplugged the coordinator for a few seconds, Z2M kept working properly, but when it was unplugged for longer, Z2M did not start again automatically and I had to start it manually.