Closed: pauloon closed this 1 year ago
I reinstalled Z2M all from zero and today it happened again:
2022-11-09T17:11:10.996Z zigbee-herdsman:adapter:zStack:znp:AREQ <-- AF - incomingMsg - {"groupid":0,"clusterid":61184,"srcaddr":55137,"srcendpoint":1,"dstendpoint":1,"wasbroadcast":0,"linkquality":61,"securityuse":0,"timestamp":11657102,"transseqnumber":0,"len":10,"data":{"type":"Buffer","data":[9,72,2,0,193,1,4,0,1,0]}}
2022-11-09T17:11:10.996Z zigbee-herdsman:controller:log Received 'zcl' data '{"frame":{"Header":{"frameControl":{"frameType":1,"manufacturerSpecific":false,"direction":1,"disableDefaultResponse":false,"reservedBits":0},"transactionSequenceNumber":72,"manufacturerCode":null,"commandIdentifier":2},"Payload":{"seq":49408,"dpValues":[{"dp":1,"datatype":4,"data":{"type":"Buffer","data":[0]}}]},"Command":{"ID":2,"parameters":[{"name":"seq","type":33},{"name":"dpValues","type":1011}],"name":"dataReport"}},"address":55137,"endpoint":1,"linkquality":61,"groupID":0,"wasBroadcast":false,"destinationEndpoint":1}'
Zigbee2MQTT:debug 2022-11-09 14:11:10: Received Zigbee message from 'Presença 2', type 'commandDataReport', cluster 'manuSpecificTuya', data '{"dpValues":[{"data":{"data":[0],"type":"Buffer"},"datatype":4,"dp":1}],"seq":49408}' from endpoint 1 with groupID 0
<--- Last few GCs --->
[7:0x7f45b57b73c0] 84373002 ms: Scavenge (reduce) 2042.8 (2083.1) -> 2041.9 (2083.1) MB, 2.5 / 0.0 ms (average mu = 0.105, current mu = 0.019) allocation failure
[7:0x7f45b57b73c0] 84373008 ms: Scavenge (reduce) 2043.0 (2083.1) -> 2042.1 (2083.1) MB, 2.7 / 0.0 ms (average mu = 0.105, current mu = 0.019) allocation failure
[7:0x7f45b57b73c0] 84373015 ms: Scavenge (reduce) 2043.0 (2083.1) -> 2042.1 (2083.3) MB, 2.6 / 0.0 ms (average mu = 0.105, current mu = 0.019) allocation failure
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
@Koenkk ... can you, please, help?
Thank you. Paulo.
What node version are you using?
Thanks for helping me. Not sure what you mean by "node". Sorry for my ignorance. Are you talking about NodeRed ? I'm using version 13.5.2.
Attached is a diagnostic file with some info about my system: config_entry-mqtt-5f9bbf81bfa29cb6923246ba774884c4.json.txt
Please, let me know.
Thanks, Paulo.
I mean node.js (try running `node --version` on your system)
Hi @Koenkk , I tried and it does not work:
Maybe because I'm using Hass.io and NodeRed?
Thanks, Paulo.
how are you running z2m, via the addon?
Dear @Koenkk , Yes, I've installed it via Add-On.
See if this helps...
Thanks, Paulo.
If I turn off the watchdog it stops after a few hours. It just happened again... We are in a group of more than 2,000 HA users and more people are getting this too. It looks like it affects Hass.io users...
@Koenkk ... Look! With help from some friends, I think I got the node.js version:
Will wait for your instructions.
Thanks, Paulo.
Can you provide the herdsman debug logging, from starting z2m till the error?
Hi @Koenkk ,
Of course. Is that the one I get from Settings > System > Logs > Z2M, copy and save?
Is there a way to access former logs? I'm waiting for it to crash again to get the log.
Thanks, Paulo.
I have not seen this screen before, how big is the log? (does it include everything from start)
Dear @Koenkk , This is the screen I mentioned before. I'm not sure if it includes everything. I just grabbed all of its contents so you can see: log.txt. Please let me know if there are any other ways to get the logs, so I could get the logs from the former crashes.
I'm still waiting for it to crash again, but since I'm not home these days maybe it has "cooled down" a little.
Thanks, Paulo.
These are not the herdsman debug logs (they contain `zigbee-herdsman`, like in https://github.com/Koenkk/zigbee2mqtt/issues/14853#issuecomment-1312524794). See https://www.zigbee2mqtt.io/guide/usage/debug.html#home-assistant-os-supervised-addon for how to get that logging.
Dear @Koenkk ,
I GOT ANOTHER CRASH! I copied the entire log: log2.txt
Please, let me know.
Thanks, Paulo.
Can you turn off Espelho 2 4gang and see if the error still occurs?
I will remove it and restart Z2M, ok?
Thanks, Paulo.
Ok 👍
I seem to have the same issue here (I opened an issue at https://github.com/Koenkk/zigbee2mqtt/issues/14992 before finding this one). I'm also using node version v16.17.1, so perhaps it has something to do with that indeed? I'm still trying to find the logs.
By the way, for me it happens much more often, sometimes 4 to 5 times in one evening.
Ok 👍
I got another crash today, so I guess it is not "Espelho 2 4gang"'s fault. Now I have reset it again and I'm testing once more.
Mine crashed as well. The Z2M logs gave me the following error:
Error: SRSP - AF - dataRequestExt after 6000ms
at Timeout._onTimeout (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
at listOnTimeout (node:internal/timers:559:17)
at processTimers (node:internal/timers:502:7)
I am not able to retrieve the herdsman logs using this method, because the docker container just seems to crash in my case. So when I run `docker ps`, there isn't a z2m one running after the crash.
Dear @Koenkk ,
After I removed the Espelho 2 4gang, it is still happening. Here is the last log extracted: log4.txt
Please, let me know.
Thanks, Paulo.
@pauloon does the docker container also completely stop in your case?
Hi @XanderTenBoden , I think the docker container is crashing too. One time I ran `docker ps` and didn't see it. I'm going to let it happen again and check to make sure.
Thanks, Paulo.
@pauloon What I did was turn off the watchdog and then wait for it to stop working. As soon as it did, all my zigbee devices in Lovelace turned to "unavailable", and the zigbee2mqtt add-on wasn't running anymore.
It seems like the issue has something to do with the z2m docker eating too much RAM. I've enabled the memory sensors and this is what I'm seeing in the history:
I do have a lot of devices connected, about 90. How many do you have? It might explain why the crashes are more frequent for me if you have fewer than that connected? 🤔
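The memory graphs described above can be approximated from inside any Node.js process with the built-in `process.memoryUsage()`: a resident set size (rss) or heap that climbs steadily between restarts, as in the graphs, is the classic leak signature. A minimal, generic sketch (not Zigbee2MQTT code; the sampling interval is illustrative):

```javascript
// Sample process memory periodically to spot steady growth over time.
// A leaking process shows rss/heapUsed climbing instead of plateauing.
const samples = [];

function sampleMemory() {
  const { rss, heapUsed } = process.memoryUsage();
  samples.push({ t: Date.now(), rssMB: rss / 1048576, heapMB: heapUsed / 1048576 });
  return samples[samples.length - 1];
}

const first = sampleMemory();
console.log(`rss: ${first.rssMB.toFixed(1)} MB, heap: ${first.heapMB.toFixed(1)} MB`);

// In a long-running service you would sample on a timer, e.g.:
//   setInterval(() => console.log(sampleMemory()), 60_000);
```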
@XanderTenBoden , I have 51 devices. I'm also keeping watchdog disabled, otherwise it restarts Z2M. I'll try to check the memory history, same as you did.
Thanks, Paulo.
@pauloon I also see Presença 4 and Presença 2 are spamming a lot, can you also try to remove those?
Hi @Koenkk ,
I will try turning them off, to stop the traffic. These are mmWave presence sensors, of which I have four. I know they push a lot of data from one expose called "target_distance", which reports every second, but ONLY when someone is being detected. As far as I understood, it is not possible to turn off this reporting. I added this expose to the "filters" so it reduces MQTT data (I don't use it). I'm guessing these are not the culprits, because when the crashes happened a few days ago I was traveling and they were reporting much less data.
I'll keep you posted.
Thanks, Paulo.
@Koenkk ,
Just got another crash. Very fast, this time.
Here is the log: log5.txt
I had just restarted HA and Z2M after turning off the sensors.
@XanderTenBoden , Indeed, the container crashed too. But I did not get the same memory spike that you got.
Please, let me know, Thanks, Paulo.
@pauloon how do you get these logs? When I use the `docker logs CONTAINER_ID > log.txt 2>&1` command I only get like 15 lines max.
@XanderTenBoden ,
Settings > Add-Ons > Z2M > Logs
Rgds, Paulo.
Mine stopped responding as we speak; however, the container is still running. The logs I got now are these:
[19:00:24] INFO: Preparing to start...
[19:00:25] INFO: Socat not enabled
[19:00:27] INFO: Starting Zigbee2MQTT...
Zigbee2MQTT:warn 2022-11-17 19:00:33: `permit_join` set to `true` in configuration.yaml.
Zigbee2MQTT:warn 2022-11-17 19:00:33: Allowing new devices to join.
Zigbee2MQTT:warn 2022-11-17 19:00:33: Set `permit_join` to `false` once you joined all devices.
Zigbee2MQTT:error 2022-11-17 19:00:59: Publish 'set' 'state' to 'front_door_light' failed: 'Error: Command 0x00158d00032cadb0/1 genOnOff.off({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 39508 - 1 - 6 - 6 - 11 after 10000ms)'
Zigbee2MQTT:error 2022-11-17 19:01:02: Publish 'set' 'brightness' to 'living_room_desk_led_strip' failed: 'Error: Command 0xa4c138e0bfda4062/1 genLevelCtrl.moveToLevelWithOnOff({"level":254,"transtime":0}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205))'
Zigbee2MQTT:error 2022-11-17 19:01:24: Publish 'set' 'color' to 'living_room_desk_led_strip' failed: 'Error: Command 0xa4c138e0bfda4062/1 lightingColorCtrl.moveToColor({"transtime":0,"colorx":21168,"colory":21561}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Data request failed with error: 'No network route' (205))'
Zigbee2MQTT:error 2022-11-17 20:52:08: Publish 'set' 'state' to 'office_spots' failed: 'Error: Command 0x00158d0006371aab/1 genOnOff.on({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 40507 - 1 - 148 - 6 - 11 after 10000ms)'
Zigbee2MQTT:error 2022-11-17 21:53:40: Publish 'set' 'state' to 'office_spots' failed: 'Error: Command 0x00158d0006371aab/1 genOnOff.on({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 40507 - 1 - 153 - 6 - 11 after 10000ms)'
Zigbee2MQTT:error 2022-11-17 21:54:20: Publish 'set' 'state' to 'office_spots' failed: 'Error: Command 0x00158d0006371aab/1 genOnOff.on({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 40507 - 1 - 164 - 6 - 11 after 10000ms)'
I'm looking at this error I had yesterday, and when googling it I get a result on this page that seems to crash on the exact same line (waitress.ts:64:35): https://github.com/Koenkk/zigbee2mqtt/issues/9136#issuecomment-944098300
Not sure if I'm onto something though, but perhaps it's worth mentioning?
I've got a new crash and log here:
@pauloon I seem to have a lot less debugging data. How is your Z2M debugging config set up?
Dear @XanderTenBoden ,
I guess it is because I have these parameters configured (Herdsman Debug active):
Settings > Add-Ons > Z2M > Configuration tab
And in my Z2M "configuration.yaml":
advanced:
  log_level: debug
Hope this helps.
Thanks, Paulo.
Ah, that checkbox was hidden in my case; I turned it on.
It crashed again. I do have a little bit more logging now, but since the docker container crashed again, I can only retrieve the last lines that are shown in the Z2M add-on's interface in HA:
https://tenboden.com/log-z2m-xander-18-11-2022-2.txt
@Koenkk do you notice anything strange in here?
@Koenkk @XanderTenBoden Just got another crash...
The memory showed the behaviour you described before:
Here is the log: log6.txt
The container also crashed. The mmWave Tuya presence sensors were disabled (I don't believe it is a device-specific problem).
Please, let me know if I can help anyway.
Thanks, Paulo.
It really does look like there is a bug that makes something consume memory until it crashes:
Mine just crashed 2 minutes (!) after restarting it.
5 minutes later: Aaannnd again. Basically, my Home Assistant is completely unusable by now 🥲
@Koenkk ... Please, help us. I have other fellows who are suffering the same with HassOS. Memory use keeps building up:
It is probably a glitch or bug causing this. If you need remote access to my HA to investigate, we can arrange that.
Thanks, Paulo.
@pauloon what I've also noticed is that before actually crashing the docker container, my devices stop responding to any interaction. In other words: Z2M is still 'running', but lights can't be toggled and stuff like that. Do you see the same behaviour?
I didn't notice that, but it might be possible. It took a while for me to check Z2M after noticing that it had crashed. Maybe it was "frozen" before crashing. Not sure.
Paulo.
PS: I've been talking to people and almost everyone has the watchdog on, so I guess more people are being impacted by this problem, but they just don't notice. Here, if I leave the watchdog on, I don't notice anything because it gets restarted quickly.
Thanks, Paulo.
Yes, maybe. I found out about the issue because of the non-responsiveness I mentioned before, and because some automations that send TTS messages to my Google Home tend to trigger when restarting Z2M.
Memory is approaching the limit...
Let's see if it crashes again....
Memory seems to keep building up constantly:
Now I'm running another test... I've made a fresh install of HassOS on another computer, pretty much clean, with just MQTT, Z2M and Node-RED. The memory seems to show the same behaviour, but at a much slower pace (maybe because I've added only two devices?):
Could it be an indication of a bug between HassOS and the Z2M add-on?
@pauloon it seems like I've resolved the issue here.
The issue was that I had some power plugs with consumption statistics that were pushing those statistics every second. After changing this to once every 10 seconds, my Z2M no longer seems to crash and the memory stays at a stable level.
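For reference, Zigbee2MQTT's per-device options include `debounce`, which batches rapid-fire updates so that a device's state is published at most once per interval. A hedged sketch of what such an entry in `configuration.yaml` could look like (the address and friendly name below are made up for illustration; this is one way to throttle a chatty device, not necessarily the exact change made here):

```yaml
devices:
  '0x00158d0001234567':           # illustrative IEEE address
    friendly_name: smart_plug_15  # illustrative name
    debounce: 10                  # publish state at most once per 10 s
```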
What happened?
After the last update, I've noticed Z2M stops its service with a fatal error, out of the blue.
I'm using HASS.IO, all updated, on an i5 machine with 8 GB of memory and a 128 GB SSD.
What did you expect to happen?
No response
How to reproduce it (minimal and precise)
Just leave it running. After one or two days it stops working (the service drops).
Zigbee2MQTT version
1.28.2 commit: unknown
Adapter firmware version
20220219
Adapter
SONOFF USB Dongle
Debug log
I did not have the debug log active when this happened. Posting normal log.
...
Zigbee2MQTT:info 2022-11-07 11:45:15: MQTT publish: topic 'zigbee2mqtt/Smart Plug 15', payload '{"child_lock":"UNLOCK","current":0.04,"energy":6.45,"indicator_mode":"off/on","last_seen":"2022-11-07T11:45:13-03:00","linkquality":102,"power":0,"power_outage_memory":"restore","state":"ON","update":{"state":"idle"},"update_available":false}'
<--- Last few GCs --->
[7:0x7fa21c7993c0] 61013533 ms: Mark-sweep 2044.2 (2085.3) -> 2042.2 (2085.3) MB, 2087.8 / 0.0 ms (average mu = 0.133, current mu = 0.010) allocation failure scavenge might not succeed
[7:0x7fa21c7993c0] 61015639 ms: Mark-sweep 2044.3 (2085.3) -> 2042.2 (2085.3) MB, 2082.8 / 0.0 ms (average mu = 0.074, current mu = 0.011) allocation failure scavenge might not succeed
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory