Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
12.04k stars 1.67k forks

Z2M aborts on socket disconnection (UZG-01) #23236

Open habitats-tech opened 3 months ago

habitats-tech commented 3 months ago

What happened?

We need to put some effort into improving Z2M error handling. I say "we" because I am putting myself on the frontline to assist in whatever capacity I can. I am a strong believer in using PoE Ethernet coordinators, so my focus is on that front.

The following are findings based on the following environment:

  1. UZG-01 always running on Ethernet with latest stable XZG firmware (at present 20240612/ESP32/MCU, 20240316/CC2652P7/Zigbee)
  2. Mosquitto under Debian 12 running in LXC under Proxmox (Linux - not Docker - install)
  3. Z2M under Debian 12 running in LXC under Proxmox (Linux - not Docker - install), always running latest stable version

High performance hardware used throughout: CPU, RAM, disk, network

The biggest issue is how Z2M handles an XZG socket disconnection: at present there is no recovery, which I believe should be fixed as a priority.

What did you expect to happen?

Z2M currently exits on socket disconnection. I believe Z2M should instead try to recover indefinitely, or for as long as a setting we control allows.

How to reproduce it (minimal and precise)

I run Z2M using npm start with debug logging. I monitor in real time, logging to the console and to a file. The Zigbee network has around 15 devices connected, all within a radius of 2m from the coordinator, of which ~33% are routers.

The fastest way to reproduce is to reboot the UZG-01 or just disconnect it from Ethernet for a few seconds.

Zigbee2MQTT version

1.38.0 and 1.39.0

Adapter firmware version

20240316

Adapter

UZG-01 on Ethernet with latest XZG FW

Setup

x86-64 Debian 12 based system container (LXC) under Proxmox 8.x (no Docker)

Debug log

[2024-07-02 13:53:19] debug: zh:zstack:unpi:parser: <-- [254,30,68,129,0,0,0,3,0,122,1,1,0,102,0,246,137,218,0,0,10,8,155,10,11,240,72,3,0,3,0,0,122,29,38]
[2024-07-02 13:53:19] debug: zh:zstack:unpi:parser: --- parseNext [254,30,68,129,0,0,0,3,0,122,1,1,0,102,0,246,137,218,0,0,10,8,155,10,11,240,72,3,0,3,0,0,122,29,38]
[2024-07-02 13:53:19] debug: zh:zstack:unpi:parser: --> parsed 30 - 2 - 4 - 129 - [0,0,0,3,0,122,1,1,0,102,0,246,137,218,0,0,10,8,155,10,11,240,72,3,0,3,0,0,122,29] - 38
[2024-07-02 13:53:19] debug: zh:zstack:znp: AREQ: <-- AF - incomingMsg - {"groupid":0,"clusterid":768,"srcaddr":31232,"srcendpoint":1,"dstendpoint":1,"wasbroadcast":0,"linkquality":102,"securityuse":0,"timestamp":14322166,"transseqnumber":0,"len":10,"data":{"type":"Buffer","data":[8,155,10,11,240,72,3,0,3,0]}}
[2024-07-02 13:53:19] debug: zh:controller: Failed to parse frame: Error: Read for '3' not available
[2024-07-02 13:53:19] debug: zh:controller: Received payload: clusterID=768, address=31232, groupID=0, endpoint=1, destinationEndpoint=1, wasBroadcast=false, linkQuality=102, frame=undefined
[2024-07-02 13:53:19] debug: zh:zstack:unpi:parser: --- parseNext []
[2024-07-02 13:53:19] debug: z2m: Received Zigbee message from 'ZB SGZB12W300Z-A0 DL24', type 'raw', cluster 'lightingColorCtrl', data '{"data":[8,155,10,11,240,72,3,0,3,0],"type":"Buffer"}' from endpoint 1 with groupID 0
[2024-07-02 13:53:19] debug: z2m: No converter available for 'TS0502B' with cluster 'lightingColorCtrl' and type 'raw' and data '{"data":[8,155,10,11,240,72,3,0,3,0],"type":"Buffer"}'
[2024-07-02 13:53:19] debug: zh:zstack:unpi:parser: <-- [254,30,68,129,0,0,0,3,255,87,1,1,0,142,0,117,188,218,0,0,10,8,46,10,10,240,72,3,0,3,3,255,87,29,207]
[2024-07-02 13:53:19] debug: zh:zstack:unpi:parser: --- parseNext [254,30,68,129,0,0,0,3,255,87,1,1,0,142,0,117,188,218,0,0,10,8,46,10,10,240,72,3,0,3,3,255,87,29,207]
[2024-07-02 13:53:19] debug: zh:zstack:unpi:parser: --> parsed 30 - 2 - 4 - 129 - [0,0,0,3,255,87,1,1,0,142,0,117,188,218,0,0,10,8,46,10,10,240,72,3,0,3,3,255,87,29] - 207
[2024-07-02 13:53:19] debug: zh:zstack:znp: AREQ: <-- AF - incomingMsg - {"groupid":0,"clusterid":768,"srcaddr":22527,"srcendpoint":1,"dstendpoint":1,"wasbroadcast":0,"linkquality":142,"securityuse":0,"timestamp":14335093,"transseqnumber":0,"len":10,"data":{"type":"Buffer","data":[8,46,10,10,240,72,3,0,3,3]}}
[2024-07-02 13:53:19] debug: zh:controller: Failed to parse frame: Error: Read for '3' not available
[2024-07-02 13:53:19] debug: zh:controller: Received payload: clusterID=768, address=22527, groupID=0, endpoint=1, destinationEndpoint=1, wasBroadcast=false, linkQuality=142, frame=undefined
[2024-07-02 13:53:19] debug: zh:zstack:unpi:parser: --- parseNext []
[2024-07-02 13:53:19] debug: z2m: Received Zigbee message from 'ZB SGZB12W300Z-A0 DL15', type 'raw', cluster 'lightingColorCtrl', data '{"data":[8,46,10,10,240,72,3,0,3,3],"type":"Buffer"}' from endpoint 1 with groupID 0
[2024-07-02 13:53:19] debug: z2m: No converter available for 'TS0502B' with cluster 'lightingColorCtrl' and type 'raw' and data '{"data":[8,46,10,10,240,72,3,0,3,3],"type":"Buffer"}'
[2024-07-02 13:53:34] info: zh:zstack:znp: Socket error
[2024-07-02 13:53:34] info: zh:zstack:znp: Port closed
[2024-07-02 13:53:34] debug: zh:controller: Adapter disconnected
[2024-07-02 13:53:34] info: zh:zstack:znp: closing
[2024-07-02 13:53:34] error: z2m: Adapter disconnected, stopping
[2024-07-02 13:53:34] debug: z2m: Saving state to file /opt/zigbee2mqtt/data/state.json
[2024-07-02 13:53:34] info: z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload 'offline'
[2024-07-02 13:53:34] info: z2m: Disconnecting from MQTT server
[2024-07-02 13:53:34] info: z2m: Stopping zigbee-herdsman...
[2024-07-02 13:53:34] debug: zh:controller:database: Writing database to '/opt/zigbee2mqtt/data/database.db'
[2024-07-02 13:53:34] info: z2m: Stopped zigbee-herdsman
[2024-07-02 13:53:34] info: z2m: Stopped Zigbee2MQTT

Koenkk commented 3 months ago

Z2M 1.39.0 introduces the watchdog which can be used for this.

Dinth commented 3 months ago

@habitats-tech if you're going to test the watchdog, could you report back how usable Z2M is? I'm already reverting back to 1.38.0, but after an update to 1.39.0 my Z2M and UZG-1 both needed a restart every 20 minutes or so.

habitats-tech commented 3 months ago

For anyone who might attempt to test this: the system will run even if you have typos in the command line. Ensure there are no typos in the command line.

Z2M_WATCHDOG=0.5,0.5,0.5,1,1,1 npm start -> exit with error after 1st attempt
Z2M_WATCHDOG=1,1,1,1,1,1 npm start -> exit with error after 1st attempt
Z2M_WATCHDOG=1,2,3 npm start -> exit with error after 1st attempt

Failed after 1st attempt. I have seen the error in red in other instances.

[2024-07-04 05:46:44] info: z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/ZB SGZB12W300Z-A0 DL24', payload '{"brightness":255,"color":{"h":null,"hue":null},"color_mode":"xy","color_temp":153,"do_not_disturb":null,"last_seen":"2024-07-04T05:46:44+04:00","linkquality":123,"state":"OFF"}'
[2024-07-04 05:47:09] info: zh:zstack:znp: Socket error
[2024-07-04 05:47:09] info: zh:zstack:znp: Port closed
[2024-07-04 05:47:09] debug: zh:controller: Adapter disconnected
[2024-07-04 05:47:09] info: zh:zstack:znp: closing
[2024-07-04 05:47:09] error: z2m: Adapter disconnected, stopping
[2024-07-04 05:47:09] debug: z2m: Saving state to file /opt/zigbee2mqtt/data/state.json
[2024-07-04 05:47:09] info: z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload 'offline'
[2024-07-04 05:47:09] info: z2m: Disconnecting from MQTT server
[2024-07-04 05:47:09] info: z2m: Stopping zigbee-herdsman...
[2024-07-04 05:47:09] debug: zh:controller:database: Writing database to '/opt/zigbee2mqtt/data/database.db'
[2024-07-04 05:47:09] info: z2m: Stopped zigbee-herdsman
[2024-07-04 05:47:09] info: z2m: Stopped Zigbee2MQTT
WATCHDOG: Waiting 0.5min before next start try.
Starting Zigbee2MQTT with watchdog (30000,30000,30000,60000,60000,60000).
[2024-07-04 05:47:39] info: z2m: Logging to console, file (filename: log.log)
[2024-07-04 05:47:39] debug: z2m: Loaded state from file /opt/zigbee2mqtt/data/state.json
[2024-07-04 05:47:39] info: z2m: Starting Zigbee2MQTT version 1.39.0 (commit #0326926)
[2024-07-04 05:47:39] info: z2m: Starting zigbee-herdsman (0.50.1)
[2024-07-04 05:47:39] debug: z2m: Using zigbee-herdsman with settings: '"{\"network\":{\"panID\":5943,\"extendedPanID\":[88,178,103,219,169,189,77,71],\"channelList\":[25],\"networkKey\":\"HIDDEN\"},\"databasePath\":\"/opt/zigbee2mqtt/data/database.db\",\"databaseBackupPath\":\"/opt/zigbee2mqtt/data/database.db.backup\",\"backupPath\":\"/opt/zigbee2mqtt/data/coordinator_backup.json\",\"serialPort\":{\"baudRate\":115200,\"path\":\"tcp://192.168.0.228:6638\"},\"adapter\":{\"concurrent\":null,\"delay\":null,\"disableLED\":false}}"'
[2024-07-04 05:47:40] debug: zh:controller: Starting with options '{"network":{"networkKeyDistribute":false,"networkKey":"HIDDEN","panID":5943,"extendedPanID":[88,178,103,219,169,189,77,71],"channelList":[25]},"serialPort":{"baudRate":115200,"path":"tcp://192.168.0.228:6638"},"databasePath":"/opt/zigbee2mqtt/data/database.db","databaseBackupPath":"/opt/zigbee2mqtt/data/database.db.backup","backupPath":"/opt/zigbee2mqtt/data/coordinator_backup.json","adapter":{"disableLED":false,"concurrent":null,"delay":null}}'
[2024-07-04 05:47:40] info: zh:zstack:znp: Opening TCP socket with 192.168.0.228:6638
[2024-07-04 05:47:43] info: zh:zstack:znp: Socket error
[2024-07-04 05:47:43] error: z2m: Error while starting zigbee-herdsman
[2024-07-04 05:47:43] error: z2m: Failed to start zigbee
[2024-07-04 05:47:43] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-07-04 05:47:43] error: z2m: Exiting...
[2024-07-04 05:47:43] error: z2m: Error: Error while opening socket
    at Socket. (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:191:24)
    at Socket.emit (node:events:531:35)
    at emitErrorNT (node:internal/streams/destroy:169:8)
    at emitErrorCloseNT (node:internal/streams/destroy:128:3)
    at processTicksAndRejections (node:internal/process/task_queues:82:21)

/opt/zigbee2mqtt/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:264
var er = new ERR_STREAM_WRITE_AFTER_END();
^
Error: write after end
    at writeAfterEnd (/opt/zigbee2mqtt/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:264:12)
    at DerivedLogger.Writable.write (/opt/zigbee2mqtt/node_modules/winston/node_modules/readable-stream/lib/_stream_writable.js:300:21)
    at DerivedLogger.log (/opt/zigbee2mqtt/node_modules/winston/lib/winston/logger.js:231:12)
    at Logger.log (/opt/zigbee2mqtt/lib/util/logger.ts:188:25)
    at Logger.info (/opt/zigbee2mqtt/lib/util/logger.ts:201:14)
    at Znp.onPortClose (/opt/zigbee2mqtt/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:113:16)
    at Object.onceWrapper (node:events:634:26)
    at Socket.emit (node:events:519:28)
    at TCP. (node:net:338:12)
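
Judging only by the log above, the comma-separated Z2M_WATCHDOG values appear to be delays in minutes that are converted to milliseconds (0.5,0.5,0.5,1,1,1 is reported as 30000,30000,30000,60000,60000,60000). The following is a minimal Python sketch of that conversion, useful for checking a value before starting Z2M. It is not the actual Z2M implementation, and the function name `parse_watchdog_delays` is hypothetical.

```python
def parse_watchdog_delays(spec: str) -> list[int]:
    """Convert a comma-separated list of minutes into delays in milliseconds.

    Assumption: the unit is minutes, as suggested by the log line
    "Waiting 0.5min before next start try" shown above.
    """
    delays_ms = []
    for part in spec.split(","):
        minutes = float(part.strip())  # raises ValueError on typos such as "1,,2"
        if minutes <= 0:
            raise ValueError(f"delay must be positive, got {part!r}")
        delays_ms.append(round(minutes * 60_000))
    return delays_ms


# The three values tried in this comment:
for spec in ("0.5,0.5,0.5,1,1,1", "1,1,1,1,1,1", "1,2,3"):
    print(spec, "->", parse_watchdog_delays(spec))
```

Rejecting malformed input up front addresses the note above that the system starts even when the command line contains typos.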

habitats-tech commented 3 months ago

> @habitats-tech if you're going to test the watchdog, could you report back how usable Z2M is? I'm already reverting back to 1.38.0, but after an update to 1.39.0 my Z2M and UZG-1 both needed a restart every 20 minutes or so.

I have found 1.39.0 more stable. From my observations, the Z2M version number has nothing to do with the disconnections. However, so far I have failed to find a pattern. I have the exact same config in 2 different locations: one does not fail at all, the other disconnects every 100s and automatically reconnects every 10s.

In one of the installations I had frequent disconnections, and after plugging and unplugging the UZG several times trying different things it stopped disconnecting. I have failed to find why.

Still in the process of investigating.

habitats-tech commented 3 months ago

I am in the process of comparing PoE Zigbee coordinators under Zigbee2MQTT as well as trying to identify and fix issues with PoE coordinators.

I assume everyone has experienced issues with UZG-01 and I have created a document which not only compares, but also provides feedback on my troubleshooting and product findings.

If anyone is interested in joining the conversation, please visit this link.

https://portal.habitats.tech/Zigbee2MQTT/Zigbee+PoE+HW+Comparison

If you can provide feedback, please use this thread, my Ko-fi page, or https://community.home-assistant.io/t/zigbee2mqtt-uzg-01-ha/749094

My aim is to help everyone improve their product, as well as to eliminate, bypass, or fix the issues almost everyone is facing with these products. Improvements are to the benefit of everyone.

habitats-tech commented 3 months ago

So far my findings with UZG-01 are not promising. UZG-01 at some point will disconnect from Zigbee2MQTT (even if it takes days), the result being Zigbee2MQTT will abort and requires manual restart. I am running Zigbee2MQTT manually so I can detect disconnections.

On the installation where UZG-01 disconnections were every 100s, reinstalling Zigbee2MQTT while changing the channel fixed the issue. I am uncertain if Zigbee2MQTT reinstallation or changing the Zigbee communications channel, or both fixed the issue.

We need a reliable way to run Zigbee2MQTT with a restart function. The one suggested does not work and I would like to propose the following simple function:

run Zigbee2MQTT using three parameters:

Dinth commented 3 months ago

Sorry, let me stop you here. Are you saying that after Z2M disconnects (whether it's UZG-1 or SLZB-06p7) you can just restart Z2M and everything is back to normal, and you don't need to simultaneously restart the coordinator?

habitats-tech commented 3 months ago

> Sorry, let me stop you here. Are you saying that after Z2M disconnects (whether it's UZG-1 or SLZB-06p7) you can just restart Z2M and everything is back to normal, and you don't need to simultaneously restart the coordinator?

Yes this is correct.

habitats-tech commented 3 months ago

I have an update. UZG-01 is the issue, not Zigbee2MQTT: UZG-01 FW update failure

thecode commented 3 months ago

Just to add more data to this thread, I have an SLZB-06 connected via a single switch to the server. At some random time (it can vary between once a day and once a week) the socket disconnects. I have made multiple tests to make sure the connection between the device and the server is ok and didn't find any problem. I left a ping running to the SLZB-06, and when I had a socket disconnection the ping was still running without errors. My docker is set to restart on failure, and upon restart it connects again immediately.

The main issue, which I think should be handled by Zigbee2MQTT, is not to crash completely upon socket disconnection, but just to log a warning/error and try to connect again. When Zigbee2MQTT crashes, it marks all the entities as unknown in Home Assistant and it takes time to get back to a normal state. If the socket reconnection happens without a full crash, it will be seamless and not noticeable, as the occurrence is very low and reconnecting to the device is very fast.
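
The "log a warning and try to connect again" behaviour described above can be sketched as a retry loop with exponential backoff. This is a minimal Python illustration of the idea only, not how Z2M or zigbee-herdsman work internally; `connect_with_retry` and all of its parameters are hypothetical names, and the address in the usage comment is the one that appears in the log earlier in this thread.

```python
import socket
import time
from typing import Callable, Optional, Tuple


def connect_with_retry(
    address: Tuple[str, int],
    *,
    max_attempts: Optional[int] = None,   # None means retry indefinitely
    base_delay: float = 1.0,              # seconds before the first retry
    max_delay: float = 60.0,              # upper bound for the backoff
    connect: Callable = socket.create_connection,
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
):
    """Try to open a TCP connection, logging a warning and retrying on failure."""
    attempt = 0
    delay = base_delay
    while True:
        attempt += 1
        try:
            return connect(address)
        except OSError as exc:
            if max_attempts is not None and attempt >= max_attempts:
                raise  # give up and let the caller decide what to do
            log(f"warning: attempt {attempt} to reach {address} failed ({exc}); "
                f"retrying in {delay:.1f}s")
            sleep(delay)
            delay = min(delay * 2, max_delay)  # exponential backoff, capped


# Example (commented out because it needs a reachable coordinator):
# sock = connect_with_retry(("192.168.0.228", 6638))
```

Leaving `max_attempts` as `None` corresponds to "recover indefinitely", while a finite value corresponds to a user-controlled limit.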

habitats-tech commented 3 months ago

I confirm the disconnections are a Zigbee2MQTT issue. After 5 days, the SLZB-06P7 disconnected its Zigbee socket (Ethernet remained connected). We definitely need a reliable way to get Z2M to automatically recover from socket disconnections.

javifly commented 3 months ago

I have the same problem: every 2 or 3 hours the Zigbee connection drops. I have removed the watchdog so that it doesn't restart by itself and I can calmly see what happens, but I can't find a solution.

I have a Zigbee2MQTT 1.39.0-1 and a SLZB-06P7.

[2024-07-18 02:51:04] info: zh:zstack:znp: Socket error
[2024-07-18 02:51:04] info: zh:zstack:znp: Port closed
[2024-07-18 02:51:04] info: zh:zstack:znp: closing
[2024-07-18 02:51:04] error: z2m: Adapter disconnected, stopping
[2024-07-18 02:51:04] info: z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/state', payload '{"state":"offline"}'
[2024-07-18 02:51:04] info: z2m: Disconnecting from MQTT server
[2024-07-18 02:51:04] info: z2m: Stopping zigbee-herdsman...
[2024-07-18 02:51:04] info: z2m: Stopped zigbee-herdsman
[2024-07-18 02:51:04] info: z2m: Stopped Zigbee2MQTT

habitats-tech commented 3 months ago

I am testing to see if socket disconnection is an issue related to zStack only, or if it also affects the ember stack. Will update once I have reliable data.

javifly commented 3 months ago

We recommend doing a hard reset after the firmware update. To do this, turn on the device with the button pressed, and release the button when the LEDs start to flash. Then configure only the time and the IP, nothing else, and test to see if it works.

asagent7 commented 2 months ago

I am facing the same issue with an SLZB-06, and doing the hard reset as mentioned above did not fix it either. Is there any other suggestion for working around this, or is reverting to an older version the only solution for now?

Edit: Older versions also don't work and fail with a similar error.

asagent7 commented 1 month ago

For me, this turned out to be an issue with a podman update which changed the default rootless network stack from slirp4netns to pasta. Reverting to slirp4netns for zigbee2mqtt container resolved this for me.

https://github.com/Koenkk/zigbee2mqtt/issues/22590#issuecomment-2179847056

habitats-tech commented 1 month ago

As of Sep 2024, using a UZG-01 as coordinator, Z2M does not disconnect unless I try to access the UZG-01 through the webUI. I have also found that disconnection issues completely disappear if I use dedicated routers (UZG-01/SLZB06x) to connect to the coordinator.

Ember coordinators also suffer disconnection issues.

Using the Z2M Linux installation method (not Docker), Z2M can be automatically restarted on disconnection using the example service file provided below (failure-detection-time 10s + restart-time 1s). Ideally, a Z2M config setting in the UI would be preferable and would make disconnections easier for everyone to handle, IMHO.

[Unit]
Description=zigbee2mqtt
After=network.target
#.... OTHER SETTINGS

[Service]
#.... OTHER SETTINGS
ExecStart=/usr/bin/node index.js
WorkingDirectory=/opt/zigbee2mqtt
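# Treat the service as failed if no keep-alive ping arrives within 10 s
# (only effective if the service sends sd_notify watchdog pings)
WatchdogSec=10s
# Restart on non-zero exit, signal, timeout, or watchdog expiry
Restart=on-failure
# Wait 1 s before restarting
RestartSec=1s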
#.... OTHER SETTINGS

[Install]
WantedBy=multi-user.target
#.... OTHER SETTINGS
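
For setups without systemd, the restart-on-failure part of the unit above can be approximated by a small supervisor loop. The Python sketch below is only an illustration under stated assumptions: it mirrors `Restart=on-failure` and `RestartSec=1s`, does not implement anything like `WatchdogSec`, and the names `restart_on_failure` and `run_once` are hypothetical.

```python
import subprocess
import time
from typing import Callable, Optional


def restart_on_failure(
    run_once: Callable[[], int],          # runs the program, returns its exit code
    restart_delay: float = 1.0,           # seconds, like RestartSec=1s
    max_restarts: Optional[int] = None,   # None means restart forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Re-run a program whenever it exits with a non-zero code.

    Returns the number of restarts performed before a clean exit
    (exit code 0) or before max_restarts was reached.
    """
    restarts = 0
    while True:
        exit_code = run_once()
        if exit_code == 0:
            return restarts               # clean stop: do not restart
        if max_restarts is not None and restarts >= max_restarts:
            return restarts               # give up
        restarts += 1
        sleep(restart_delay)


# Example (commented out; assumes the install location used in this thread):
# restart_on_failure(
#     lambda: subprocess.run(["npm", "start"], cwd="/opt/zigbee2mqtt").returncode
# )
```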

Jeppedy commented 2 weeks ago

There seem to be other issues creeping into this core issue, perhaps clouding a clear line of sight to the core problem: I'm using the UZG-01 (likely any network coordinator will have the same issue), and any time there is a network disruption, Z2M stops running because it can't see the coordinator. I assume the watchdog gives up trying to restart when it sees the coordinator device is offline, and boom, I'm dead in the water. I'll try the restart automation mentioned above, but perhaps the watchdog can keep trying for a longer period of time, especially when it's a network coordinator?

javifly commented 2 weeks ago

If you want to solve the problem, you have to put the coordinator near the Home Assistant machine, if possible on the same switch. Even on the same switch, depending on the brand, I have seen problems. I tried several switches: 2 gave me problems and 2 worked fine.

my network is ok, it doesn't lose packets or ping or anything, but for some reason it disconnects even on the same switch.

Jeppedy commented 2 weeks ago

Sounds like you do have some sort of network problem. We are all seeing a disconnect/failed reconnect when the network goes down. You have an additional problem that your network traffic may be taking mid-day breaks ;-)

AndreWillems commented 1 week ago

I've got exactly the same problem as @javifly. Previously, I had Z2M running smoothly with the SLZB-06p7 stick on a TP-Link switch. Now I have upgraded to Unify switches (several) and get this same problem: randomly, the socket connection gets lost (without any relevant info in the log files) and Z2M restarts.

I've excluded other possible causes, like power to the PoE stick, fixed/DHCP IP address, different firmware (core/zigbee), and different locations in my network, but it remains unstable.

I think the issue is within the Z2M container (and network settings from there?) or the routing in the network itself (switch settings?), but I have no clue how to configure that. Help!

Koenkk commented 1 week ago

@Jeppedy are you sure you enabled the watchdog? docs: https://www.zigbee2mqtt.io/guide/installation/15_watchdog.html#watchdog

Jeppedy commented 1 week ago

Yes, Watchdog is enabled. It continues trying to restart during the network outage, then gives up. An Automation that triggers when "the network-based coordinator comes on-line, confirms Z2M is not running, then tries to restart it" is working great.
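
The automation described above (coordinator comes online, confirm Z2M is not running, then restart it) can be sketched roughly as follows. This is an illustrative Python outline, not the actual automation; all function names are hypothetical, and the TCP probe is only attempted when Z2M is not running, on the assumption that a second connection to the coordinator's port might disturb an active session.

```python
import socket
from typing import Callable


def coordinator_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the coordinator can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def restart_if_needed(
    z2m_running: Callable[[], bool],
    coordinator_up: Callable[[], bool],
    start_z2m: Callable[[], None],
) -> bool:
    """Start Z2M only if it is stopped and the coordinator is back online.

    Returns True if a start was triggered.
    """
    if z2m_running():
        return False      # nothing to do, and do not probe the coordinator
    if not coordinator_up():
        return False      # coordinator still offline; try again later
    start_z2m()
    return True


# Example wiring (commented out; is_z2m_running and start_z2m are
# placeholders for however the process is managed on a given system):
# restart_if_needed(
#     is_z2m_running,
#     lambda: coordinator_reachable("192.168.0.228", 6638),
#     start_z2m,
# )
```

Calling `restart_if_needed` periodically, for example once a minute, keeps retrying for as long as the outage lasts instead of giving up after a fixed number of attempts.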

javifly commented 1 week ago

> I've got exactly the same problem as @javifly. Previously, I had Z2M running smoothly with the SLZB-06p7 stick on a TP-Link switch. Now I have upgraded to Unify switches (several) and get this same problem: randomly, the socket connection gets lost (without any relevant info in the log files) and Z2M restarts.
>
> I've excluded other possible causes, like power to the PoE stick, fixed/DHCP IP address, different firmware (core/zigbee), and different locations in my network, but it remains unstable.
>
> I think the issue is within the Z2M container (and network settings from there?) or the routing in the network itself (switch settings?), but I have no clue how to configure that. Help!

put the old switch back on and get rid of any doubts.

AndreWillems commented 1 week ago

hi all,

I found it! Memory... don't laugh...

I checked the HAOS console and system log and saw that the Z2M container (and sometimes another container) crashed because of 'out of memory'. This explains why there was nothing in the Z2M or SMlight logs. I did install other add-ons recently, and maybe some containers simply require more memory after an update.

You can easily check this on the command line with 'free -h'.

I increased the memory and now it has been running 100% stable for 10 hours. I conclude this is a fix, at least in my case.
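
For anyone who prefers to log memory over time rather than run 'free -h' by hand, a similar check can be sketched in Python by reading /proc/meminfo on Linux. This is a minimal illustration; the helper names are hypothetical and the 10% threshold in the usage comment is an arbitrary example, not a recommendation from this thread.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into a {field: value} mapping.

    Values are the raw numbers from the file (kB for most fields).
    """
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def available_fraction(meminfo: dict[str, int]) -> float:
    """Fraction of total memory that the kernel reports as available."""
    return meminfo["MemAvailable"] / meminfo["MemTotal"]


# Example (Linux only, commented out):
# with open("/proc/meminfo") as f:
#     frac = available_fraction(parse_meminfo(f.read()))
# if frac < 0.10:
#     print(f"warning: only {frac:.0%} of memory available")
```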

Dinth commented 1 week ago

> I've got exactly the same problem as @javifly. Previously, I had Z2M running smoothly with the SLZB-06p7 stick on a TP-Link switch. Now I have upgraded to Unify switches (several) and get this same problem: randomly, the socket connection gets lost (without any relevant info in the log files) and Z2M restarts.
>
> I've excluded other possible causes, like power to the PoE stick, fixed/DHCP IP address, different firmware (core/zigbee), and different locations in my network, but it remains unstable.
>
> I think the issue is within the Z2M container (and network settings from there?) or the routing in the network itself (switch settings?), but I have no clue how to configure that. Help!

That's an interesting observation.

Before, I was using a USB stick plugged into my server in my loft. It was certainly not perfect, and my mesh was not very stable. But when I moved to a network-connected Zigbee coordinator, the nightmare started. I have tried different coordinator units, models, chips, firmware versions and vendors, and all of them needed a reboot after some (usually short) time. Only the latest firmware made things better for me: the coordinators no longer need regular reboots. The only common element between all the hardware options I have tried is that they were all using the network. My network is rock stable, routed by pfSense with an enterprise-grade switch, and the switch diagnostics show no dropped packets.

AndreWillems commented 1 week ago

@Dinth I also used to use USB Zigbee dongles and must say this also worked pretty well, but having a PoE stick in a more central location in my house is much more convenient, and I have had a great experience with the SMLight SLZB-06. I would put this on your rock-stable network for a solid Zigbee network. As mentioned above, I solved my problem by adding memory to my HAOS virtual machine (from 1 to 2 GB); it simply crashed because it ran out of memory.

Jeppedy commented 1 week ago

+1 for the networked PoE Zigbee Controller. I use UZG-01 and it is rock solid for me

javifly commented 1 week ago

I have 4 GB in my HA machine.

Dinth commented 6 days ago

> I also used to use USB Zigbee dongles and must say this also worked pretty well, but having a PoE stick in a more central location in my house is much more convenient.

That's exactly my case too! :)

> I solved my problem by adding memory to my HAOS virtual machine (from 1 to 2 GB); it simply crashed because it ran out of memory.

Currently my HAOS VM has 16GB RAM assigned; before, it was running bare metal on an 8GB Mac mini. Plus, running out of memory would not explain why I always needed to restart both Z2M and the networked coordinator too.

AndreWillems commented 6 days ago

@Dinth, is there nothing in the HAOS log (on the console)? This pointed me in the right direction, as I didn't see anything in the HA / Z2M or SMLight log files.