Closed merlinpimpim closed 2 months ago
To continue the conversation started here @Nerivec : https://github.com/Koenkk/zigbee2mqtt/discussions/21140#discussioncomment-8340955:
I have provided a herdsman debug log but it is very difficult to get it just before a crash which can happen at any time. In this example the crash occurred 4 or 5 minutes after the logs were extracted (and then a Z2M restarts so a new Z2M container ID...).
I have no other error
in the logs (in the Z2M interface) when this happens. See the following screenshot.
In this same screenshot we see examples of how often the problem occurs:
It is therefore not certain that the problem is due to the suspension of the USB ports.
However, the cat /sys/module/usbcore/parameters/autosuspend
command gives the value 2 (not -1).
And I haven't figured out how to change it permanently on my system (Khadas Vim1S, Fenix/Debian) including reading this:
https://unix.stackexchange.com/questions/91027/how-to-disable-usb-autosuspend-on-kernel-3-7-10-or-above
(I'm not a Linux system expert...)
In any case, the echo -1 >/sys/module/usbcore/parameters/autosuspend
command is taken into account (until the next reboot), but it doesn't change the problem.
I hope I've given you enough information to help me.
You are using silabs multiprotocol addon, right? If so, your firmware version shouldn't be 7.x.x, multiprotocol RCP firmware are currently all 4.x.x.
https://github.com/NabuCasa/silabs-firmware/tree/main/RCPMultiPAN https://github.com/darkxst/silabs-firmware-builder/tree/main/firmware_builds/skyconnect
Also make sur the settings in silabs match whatever firmware you pick (baudrate). You'll probably want to go with a 460800 one (but older nabu casa ones are 115200).
You're right because I did some manipulations in this direction just before your message and I think there's a clue, but I don't understand it any more:
I've just used https://skyconnect.home-assistant.io/firmware-update/ to flash this firmware: https://github.com/darkxst/silabs-firmware-builder/blob/main/firmware_builds/skyconnect/rcp-uart-802154-v4.4.0-skyconnect-460800.gbl (probably the same as https://github.com/NabuCasa/silabs-firmware/blob/main/RCPMultiPAN/beta/NabuCasa_SkyConnect_RCP_v4.4.0_rcp-uart-hw-802154_460800.gbl) ?
Next: I manage to start SLM in 460800 baud but in this case Z2M does not start any more including by changing my current parameters:
port: tcp://core-silabs-multiprotocol:9999
adapter: ezsp
baudrate: 115200
to:
port: tcp://core-silabs-multiprotocol:9999
adapter: ezsp
baudrate: 460800
So I put 115200 back into SLM and Z2M and everything starts fine, but in Z2M/settings/About I read "Coordinator revision":7.3.1.0 build 0
And in this configuration I always get "Adapter disconnected, stopping" error messages (obviously...)
How can I see a version 7.3.1.0 in Z2M after flashing an 4.4.0 RCP version? I'm lost....
Did you make sure to disable anything that could be using the adapter before starting the flashing procedure? (ZHA, you shouldn't have at all in your case), silabs multiprotocol addon and Z2M. I assume you didn't get any particular error from the flasher? (I've never used it since I only have a Sonoff Dongle-E).
PS: If you are not using the multiprotocol (other radios than Zigbee), I'd recommend you use an NCP firmware, with just Z2M. SLM/Zigbeed adds a layer of complexity and no major benefit if you don't have a huge network of devices.
I flashed the adapter from my PC, using the website mentioned above. So it wasn't connected to my Khadas with Home Assistant. And I didn't get any errors during the operation. When I inserted the key a second time, the flasher confirmed that I had the expected version.
Yes, I was thinking of going back to the NCP firmware to test, but we agree that if I do that, I'll have to re-pair my 25 devices one by one (bearing in mind that some of my Zigbee relays are in wall switch boxes, in an outdoor electric gate post, etc...) ? :((
Could there be another explanation for the fact that Z2M "sees" a 7.3.1 version when I'm on RCP 4.4.0 (an issue with a file in the /zigbee2mqtt directory of the Z2M addon in HA, or something else...)?
If the flasher is telling you 4.4.0 after flashing, then you're good. As for Z2M, the version would actually be reported by the silabs addon, that itself gets it from Zigbeed, not the adapter (since with multiprotocol the Zigbee stack is offloaded to the host). _I seemed to remember the SDK version being reported when I tested the addon a while back, but maybe I was wrong and it was the EmberZNet version instead, or maybe they changed that internally (bit of a mess really)._
I looked, apparently Home Assistant reverted the version of the multiprotocol addon SDK back to 4.3.1 (again), which would report 7.3.1 EmberZNet (what you're seeing). And might explain a lot of you troubles. They created "special firmware" too... https://github.com/home-assistant/addons/tree/master/silabs-multiprotocol/rootfs/root See https://github.com/home-assistant/addons/pull/3429
On the "re-pair" question, when I tested the addon a few months back, I had to re-pair everything when I switched from NCP to RCP (which also upgraded the EZSP version), but not when I switched back to NCP at the same version level as the RCP (the equivalent, as described in first link...), Z2M kept all my devices talking without issue... But I heard mixed reports on people who also tried, so... 😢
I installed the firmware you were talking about (4.3.1), I managed to start SLM (2.4.4) at 460800 baud and Z2M at 115200 baud 😊 but unfortunately I still get the famous error message 😞
For info on Z2M:
Can you try to catch the logs like you did before? There was a lot of "noise" in your previous one because of decryption failures (I think?). Hopefully these will now be gone.
PS: The Z2M baudrate won't matter in your current setup, only the SLM one will be used (since Z2M is connecting through socket to SLM).
Here's an updated file: logs9.txt A crash occurred 4 minutes after this extraction.
About the Z2M baudrate, I believe you but if I don't put these parameters in the Z2M add-on configuration, it won't start:
adapting: ezsp
baudrate: 115200
I am facing a similar issue, more details here : https://github.com/Koenkk/zigbee-herdsman/issues/910
Here is some log that lead to shutdown of zigbee2mqtt : https://github.com/Koenkk/zigbee-herdsman/files/14170047/lixee-2.log
@Nerivec feel free to ask more logs.
Ah, finally a possible lead :) And I get the impression that this phenomenon is a very French issue ;) Thank you @romarysonrier
@merlinpimpim indeed , a tiny lead. is Lixee the cause or the consequence? that is question. It seems that it is related to our specific Lixee device using EZSP, which does a lot a polling on many values.
@Nerivec looking at my logs, we see a kind of pattern, having the same sequence :
first , we do have this log line:
zlinky config: standard, xxxx_phase, false, yyyyy
from src/devices/lixee.ts , generate by "getCurrentConfig(device, options)" line 1044.
then, we have
endpoint Read 0x00158d0005d25e8c/1 liXeePrivate(["currentDate","currentIndexTarif","currentPrice","currentTarif"] .....
then , few millesecondes after we have some error log :
Unexpected packet sequence X | X+1
followed by
Unhandled frame childJoinHandler
hope its helps.
I can confirm this behaviour with my HA Yellow, Multiprotocol firmware and Z2M 1.35.2, with 1.35.1 it works. Not Lixee device in my network. But I would have to get more precise log messages to find out where exactly the error happens, the only error I have for now is "Adapter disconnected"
@jhbruhn could your HW and exact FW revision? and share some log with zigbee-herdsman debugging enable ? by running
DEBUG=zigbee-herdsman* npm start
.
Could you also list the router/endpoint device running in your zigbee network ? I higly suspect that intensive polling of non Reportable attributes is part of the root cause.
@merlinpimpim Could you also list the router/endpoint devices running in your zigbee network ? I higly suspect that intensive polling of non Reportable attributes is part of the root cause.
@Nerivec even shorter log : lixee-3.log
Also, among the seveal test case, most "Unexpected packet sequence" occurence seen are related to ACK
Disabling polling on Linky_TIC / Lixee by setting an empty string on param tic_command_whitelist is a working dirty workarround that stop unexpected shutdown of Z2M.
tic_command_whitelist: ~
But by doing so we loose the non Reportable attribute values, as described here : https://github.com/fairecasoimeme/Zlinky_TIC
EDIT : empty tic_command_whitelist disable also Reportable attributes, so it is not a evidence for any rootcause.
Looking at this sequence :
At first, a data is sent :
and was expecting a ACK : -?- waiting (2)
But Z2M never received this ACK (2)
and later while receiving uart <-- ACK (1):
we get a Unexpected packet sequence 1 | 2 +2ms
@Koenkk @Nerivec is this expected ?
Full log here : lixee-4.log
The Coordinator Type is
EZSP v12
to be specific a HA Yellow running the most recent (Gecko SDK 4.3.1) Multiprotocol firmware.
Coordinator revision is
7.3.1.0 build 0
Logs are attached here: logs.txt
(Unfortunately it seems like HA truncated the logs at the start)
The devices in my network are:
info 2024-02-06 19:45:47: Starting Zigbee2MQTT version 1.35.2 (commit #unknown)
info 2024-02-06 19:45:47: Starting zigbee-herdsman (0.33.5)
info 2024-02-06 19:45:53: zigbee-herdsman started (resumed)
info 2024-02-06 19:45:53: Coordinator firmware version: '{"meta":{"maintrel":"1 ","majorrel":"7","minorrel":"3","product":12,"revision":"7.3.1.0 build 0"},"type":"EZSP v12"}'
info 2024-02-06 19:45:53: Currently 69 devices are joined:
info 2024-02-06 19:45:53: Wohnzimmer Regallampe (0xcc86ecfffe19d137): LED1733G7 - IKEA TRADFRI LED bulb E14 600 lumen, dimmable, white spectrum, opal white (Router)
info 2024-02-06 19:45:53: Büroja Deckenlampe (0xf0d1b8000014287b): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info 2024-02-06 19:45:53: Büroja Schreibtischlampe (0x001788010b22487a): 9290024684 - Philips Hue white ambiance E27 1100lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Büroja Heizung (0xcc86ecfffec1f63f): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info 2024-02-06 19:45:53: Büroja Fenster links (0x00124b002521b10d): SNZB-04 - SONOFF Contact sensor (EndDevice)
info 2024-02-06 19:45:53: Büroja Fenster rechts (0x00124b00254e0b3a): SNZB-04 - SONOFF Contact sensor (EndDevice)
info 2024-02-06 19:45:53: Büroja Kaffeeknopf (0x04cd15fffe4a6863): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info 2024-02-06 19:45:53: Wohnzimmer Stehlampe (0x001788010cae07a2): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Wohnzimmer Deckenlampe (0xf0d1b8000013cb3f): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info 2024-02-06 19:45:53: Wohnzimmer Kugellampe (0x680ae2fffe557254): LED1937T5_E27 - IKEA TRADFRI LED bulb E27 470 lumen, dimmable, opal white (Router)
info 2024-02-06 19:45:53: Wohnzimmer STYRBAR (0x50325ffffe2ac628): E2001/E2002 - IKEA STYRBAR remote control (EndDevice)
info 2024-02-06 19:45:53: Sterni (0x50325ffffef5205f): LED2003G10 - IKEA TRADFRI LED bulb E26/27 1100/1055/1160 lumen, dimmable, white spectrum, opal white (Router)
info 2024-02-06 19:45:53: Wohnzimmer Heizung (0xcc86ecfffec87edd): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info 2024-02-06 19:45:53: Wohnzimmer Fenster (0x00124b00234e98ef): SNZB-04 - SONOFF Contact sensor (EndDevice)
info 2024-02-06 19:45:53: Wohnzimmer SYMFONISK (0x943469fffe65989c): E1744 - IKEA SYMFONISK sound controller (EndDevice)
info 2024-02-06 19:45:53: Lichterkette (0xcc86ecfffe89ee37): E1603/E1702/E1708 - IKEA TRADFRI control outlet (Router)
info 2024-02-06 19:45:53: Flur Deckenlampe (0xf0d1b8000013ad20): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info 2024-02-06 19:45:53: Bewegungsmelder (0x00124b002502f639): SNZB-03 - SONOFF Motion sensor (EndDevice)
info 2024-02-06 19:45:53: Bürojo Heizung (0xcc86ecfffec39bb9): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info 2024-02-06 19:45:53: Bürojo Lichtknopf (0x003c84fffe259547): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info 2024-02-06 19:45:53: Bürojo Fenster (0x00124b002518a8cc): SNZB-04 - SONOFF Contact sensor (EndDevice)
info 2024-02-06 19:45:53: Schlafzimmer Lichtleiste (0xb4e3f9fffea55396): ICPSHC24-10EU-IL-1 - IKEA TRADFRI driver for wireless control (10 watt) (Router)
info 2024-02-06 19:45:53: Schlafzimmer Lichtschalter (0x04cd15fffe0f9437): E1743 - IKEA TRADFRI ON/OFF switch (EndDevice)
info 2024-02-06 19:45:53: Schlafzimmer Vorhangschalter (0x94deb8fffe0f3a98): E1743 - IKEA TRADFRI ON/OFF switch (EndDevice)
info 2024-02-06 19:45:53: Schlafzimmer Heizung links (0xcc86ecfffef4d241): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info 2024-02-06 19:45:53: Schlafzimmer Fenster links (0x00124b002521b14d): SNZB-04 - SONOFF Contact sensor (EndDevice)
info 2024-02-06 19:45:53: Schlafzimmer Fenster rechts (0x00124b00251d357c): SNZB-04 - SONOFF Contact sensor (EndDevice)
info 2024-02-06 19:45:53: Schlafzimmer Heizung rechts (0x847127fffe0a5cb4): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info 2024-02-06 19:45:53: Badezimmer Heizung (0xcc86ecfffec20a8a): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info 2024-02-06 19:45:53: Küche Deckenlampe (0xf0d1b8000013cb94): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info 2024-02-06 19:45:53: Küche Esstisch Hängelampe 1 (0x001788010ba70b7e): 9290024684 - Philips Hue white ambiance E27 1100lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Küche Esstisch Hängelampe 2 (0x001788010b1f7e51): 9290024684 - Philips Hue white ambiance E27 1100lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Küche Lichtleiste (0x001788010b4783f2): 8718699703424 - Philips Hue white and color ambiance LightStrip plus (Router)
info 2024-02-06 19:45:53: Küche Wandlampe (0xf0d1b8000013cc06): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info 2024-02-06 19:45:53: Küche Heizung (0xcc86ecfffef4d631): SEA801-Zigbee/SEA802-Zigbee - Saswell Thermostatic radiator valve (EndDevice)
info 2024-02-06 19:45:53: Küche STYRBAR (0x50325ffffe2ac673): E2001/E2002 - IKEA STYRBAR remote control (EndDevice)
info 2024-02-06 19:45:53: Küche SYMFONISK (0x943469fffe5fbe4a): E1744 - IKEA SYMFONISK sound controller (EndDevice)
info 2024-02-06 19:45:53: Küche Lichtknopf (0x2c1165fffecfb8c1): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info 2024-02-06 19:45:53: Küche Esstischschalter (0x0017880109a9ee05): 929002398602 - Philips Hue dimmer switch (EndDevice)
info 2024-02-06 19:45:53: Schlafzimmer Deckenlampe (0xf0d1b8000014c5f1): 4058075168572 - LEDVANCE SMART+ lamp E27 tuneable white (Router)
info 2024-02-06 19:45:53: Sudokuknopf (0x70ac08fffedcf453): E1812 - IKEA TRADFRI shortcut button (EndDevice)
info 2024-02-06 19:45:53: Bürojo Arbeitslampe Schein 1 (0x001788010cac1700): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Bürojo Arbeitslampe Schein 2 (0x001788010cad2b03): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Bürojo Arbeitslampe Schein 3 (0x001788010cae0767): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Bürojo Arbeitslampe Direkt (0x2c1165fffe85ef45): LED2003G10 - IKEA TRADFRI LED bulb E26/27 1100/1055/1160 lumen, dimmable, white spectrum, opal white (Router)
info 2024-02-06 19:45:53: Büroja Schreibtisch (0xa4c1387eb0418c52): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info 2024-02-06 19:45:53: Wohnzimmer Subwoofer (0xa4c1382197f8815d): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info 2024-02-06 19:45:53: Schlafzimmer Lichterschalter (0x142d41fffe3b900e): E1743 - IKEA TRADFRI ON/OFF switch (EndDevice)
info 2024-02-06 19:45:53: Flur Trockner (0xa4c138a4bc29642d): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info 2024-02-06 19:45:53: Bürojo Schreibtisch (0xa4c1385540f20e2b): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info 2024-02-06 19:45:53: Flur Waschmaschine (0xa4c138d87a63c833): TS011F_plug_1 - TuYa Smart plug (with power monitoring) (Router)
info 2024-02-06 19:45:53: Wohnzimmer Kuglibus (0x001788010bfbac59): 8719514301481 - Philips Hue Filament Globe Ambiance E27 (Router)
info 2024-02-06 19:45:53: Bürojo Bastelleuchte Schein 1 (0x001788010cac19ed): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Bürojo Bastelleuchte Schein 2 (0x001788010cad16c7): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Bürojo Bastelleuchte Schein 3 (0x001788010b9aa2c6): 8719514328242 - Philips Hue white ambiance E27 800lm with Bluetooth (Router)
info 2024-02-06 19:45:53: Bürojo Bastelleuchte Direkt (0x50325ffffee76dff): LED2004G8 - IKEA TRADFRI LED bulb E26/E27 800/806 lumen, dimmable, white spectrum, clear (Router)
info 2024-02-06 19:45:53: Bürojo Schalter (0x6c5cb1fffe2c29ac): E2201 - IKEA RODRET wireless dimmer/power switch (EndDevice)
info 2024-02-06 19:45:53: Büroja Thermometer (0xa4c13884a645bced): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Küche Thermometer (0xa4c1381a2f1411d1): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Wohnzimmer Thermometer (0xa4c1387ac22fdbea): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Bürojo Thermometer (0xa4c1380cb5161457): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Filament Thermometer (0xa4c13829ce9c4039): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Flur Thermometer (0xa4c138f649eca602): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Schlafzimmer Thermometer (0xa4c1389bd33d2b4d): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Draußen Thermometer (0xa4c138742dff2382): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Badezimmer Thermometer (0xa4c138bd9299f1b6): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Wohnzimmer Lichtleiste (0xf0d1b800001bd287): 4058075208339 - LEDVANCE Flex 3P multicolor (Router)
info 2024-02-06 19:45:53: Serverschrank Thermometer (0xa4c13866cfe09f7c): LYWSD03MMC - Custom devices (DiY) Xiaomi temperature & humidity sensor with custom firmware (EndDevice)
info 2024-02-06 19:45:53: Flur Bewegungsmelder (0xd87a3bfffe015855): E2134 - IKEA VALLHORN wireless motion sensor (EndDevice)
@merlinpimpim Could you also list the router/endpoint devices running in your zigbee network ? I higly suspect that intensive polling of non Reportable attributes is part of the root cause.
Hello,
Here are (all) my devices:
Aqara MCCGQ11LM
Aqara DJT11LM
LiXee ZLinky_TIC
Neo NAS-AB06B2
Neo NAS-AB02B2
NodOn SIN-4-1-20
Nous E2
Nous A1Z
Nous A1Z
Nous A1Z
OSRAM AB3257001NJ
SONOFF ZBMINIL2
SONOFF SNZB-04
SONOFF SNZB-02D
SONOFF SNZB-02D
SONOFF SNZB-02D
SONOFF SNZB-02D
TuYa TS0601_soil
TuYa TS0207_repeater
TuYa TS0207_repeater
Xiaomi WXKG01LM
Xiaomi WSDCGQ01LM
Xiaomi GZCGQ01LM
If it helps...
After checking, the common link between my devices and those of @jhbruhn (excluding Lixee) are SonOff SNZB-xxx devices, including SNZB-04s in particular.
I'm still looking into this, but my list of tasks is growing faster than my two eyes can follow 😅 I doubt Sonoff devices are the cause, I've got about 20 of them, I'd have caught the bug too!
I've fixed a couple of problems raised from your original log @merlinpimpim, but not particularly linked to LiXee...
@jhbruhn By any chance do you have any older logs from 1.35.1 or before? Older versions were doing internal restarts (without reaching Z2M level "Adapter disconnected"), that you might not have noticed, but they could still have been happening.
Absolutely, I am running 1.35.1 right now and also noticed that devices sometimes do not react in that version. I will try to collect some logs about that behaviour the next time I notice it.
But the problems with 1.35.2 occur right after starting Z2M, making it unusable, that does not happen with 1.35.1. Maybe the problem is different than that?
Then likely different yes, and (without seen the error can't say for sure) there's a good chance that is fixed already. I'm just looking for a couple of other buggers before I create a PR.
Disabling polling on Linky_TIC / Lixee by setting an empty string on param tic_command_whitelist is a working dirty workarround that stop unexpected shutdown of Z2M.
tic_command_whitelist: ~
But by doing so we loose the non Reportable attribute values, as described here : https://github.com/fairecasoimeme/Zlinky_TIC
EDIT : empty tic_command_whitelist disable also Reportable attributes, so it is not a evidence for any rootcause.
@Nerivec : I think @romarysonrier is on to something because since I did what he said, Z2M hasn't rebooted for 2 and a half hours, which is a record since the 1.35.2 update (Z2M currently reboots between 1 and 8 times an hour for me...).
Parameter tic_command_whitelist
changed from "all" to empty:
The issue with this workaround is that no more attributes are exposed. So it's as if the device doesn't exist and/or serves no purpose:
However, this shows that the issue is indeed with Lixee for our cases.
I'm going to leave it until tomorrow morning to confirm that it didn't restart overnight. To be continued.
Glancing at the github for that device, tomorrow (if leaving it like that worked) you might want to try enabling only the values that interest you in that (long) list. I believe that's it there in the first column, and see if that improves stability.
On my side , I tried yesterday :
if (targ.length) {
into
if (false && targ.length) {
this way is Reportable Attributes from tic_command_whitelist
are still being updated , but not polling is done at all.And then i did those testing cases :
Unexpected packet sequence
messagesThus , I will retry test case B ( all attributes belonging to the same cluster = 0x702) for a longer time frame, to gather more log and clues. But is the lixee device still send updated to all Reportable attribute from all cluster (haElectricalMeasurement+haMeterIdentification+lXeePrivate+seMetering).
@Nerivec : So it so mean that the abnormal behaviour is also trigger when using only Reportable attribute update
My return today:
I updated Z2M to version 1.35.3 last night, leaving tic_command_whitelist
empty: no crashes in 15 hours! 😀
So the culprit is clearly identified.
Now I've added some Reportable attributes (I'm in three-phase), I haven't restarted Z2M, but I've done a Reconfigure Device as shown in the screenshot:
Error 2024-02-07 16:23:04Failed to configure 'GARAGE - Compteur électrique - LiXee', attempt 1 (Error: ConfigureReporting 0x00158d0005d2a627/1 haElectricalMeasurement([{"attribute":"totalApparentPower","minimumReportInterval":60,"maximumReportInterval":900,"reportableChange":1}], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 104 - 2820 - 7 after 10000ms) at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:63:23) at EZSPAdapter.sendZclFrameToEndpointInternal (/app/node_modules/zigbee-herdsman/src/adapter/ezsp/adapter/ezspAdapter.ts:486:47) at processTicksAndRejections (node:internal/process/task_queues:95:5) at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:35:20) at Request.send (/app/node_modules/zigbee-herdsman/src/controller/helpers/request.ts:79:20) at Endpoint.configureReporting (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:669:28) at async Promise.allSettled (index 15) at Object.configure (/app/node_modules/zigbee-herdsman-converters/src/devices/lixee.ts:1020:14) at Configure.configure (/app/lib/extension/configure.ts:120:13) at Configure.onMQTTMessage (/app/lib/extension/configure.ts:55:21))
- No reboot for 1 hour (I'll wait a few more hours to be sure)
no reboot for 3 hours :)
@Nerivec i found out that zigbee-herdsman ezrp uart driver doesn't support NAK message properly.
Here is a proposal for handleDATA()
diff --git a/src/adapter/ezsp/driver/uart.ts b/src/adapter/ezsp/driver/uart.ts
index 1755a0e..f07b26d 100644
--- a/src/adapter/ezsp/driver/uart.ts
+++ b/src/adapter/ezsp/driver/uart.ts
@@ -44,6 +44,7 @@ export class SerialDriver extends EventEmitter {
private sendSeq = 0; // next frame number to send
private recvSeq = 0; // next frame number to receive
private ackSeq = 0; // next number after the last accepted frame
+ private rx_reject = false;
private waitress: Waitress<EZSPPacket, EZSPPacketMatcher>;
private queue: Queue;
@@ -210,6 +211,24 @@ export class SerialDriver extends EventEmitter {
debug(`<-- DATA (${frmNum},${frame.control & 0x07},${reTx}): ${frame}`);
+ if(this.recvSeq != frmNum){
+ if(reTx || this.rx_reject){
+ debug(`Unexpected DATA packet sequence ${frmNum} | ${this.sendSeq} : rejecting packet`);
+ }
+ else {
+ debug(`Unexpected DATA packet sequence ${frmNum} | ${this.sendSeq} : Sending NAK & & rejecting packet`);
+ this.writer.sendNAK(this.recvSeq);
+ this.rx_reject=true;
+ }
+ return;
+ }
+ else{
+ if (!reTx && this.rx_reject){
+ debug(`DATA packet sequence ${frmNum} : clear reject`);
+ this.rx_reject=false;
+ }
+ }
+
this.recvSeq = (frmNum + 1) & 7; // next
debug(`--> ACK (${this.recvSeq})`);
@@ -272,6 +291,7 @@ export class SerialDriver extends EventEmitter {
let code;
this.sendSeq = 0;
this.recvSeq = 0;
+ this.rx_reject = false;^M
debug(`<-- RSTACK ${frame}`);
ps : inspired from https://www.silabs.com/documents/public/user-guides/ug101-uart-gateway-protocol-reference.pdf
But zigbee-herdsman doesn't support NAK itsef : handleACK doesn't do retransmission. don't know yet if really necessary.
@merlinpimpim Regarding the root cause : My best assumption is that on my slow CPU (Rasberry pi zero) + all the MQTT updates required by the huge number of attributes sent or polled by the lixee device is so long to handle/execute by NodeJS that the EZSP device trigger retransmission, that current implementation doesn't handle properly.
@merlinpimpim what is your CPU running nodejs ? any source of latencies between Nodejs and the ESZP device ?
I'm using HA + Z2M + Skyconnect on a Khadas Vim1S (https://www.khadas.com/vim1s).
@romarysonrier I'm looking into a more permanent solution to fix these sequence & related issues (that's not the only one). We'll see what I can come up with, needs some automated tests and all... 😉
@Nerivec i made this PR proposal https://github.com/Koenkk/zigbee-herdsman/pull/911/files , which is far from perfect , but still improving the situation with NAK messages on rx side.
On my case (Z2M running an a single core raspbery pi zero) , i think the root cause is now clear , at least for me: 1./ When lixee join the Zigbee network, it will generate on Z2M :
2./ all those updates trigger a huge amount of MQTT updates (quiet big dataset because lots of attributes count) + log writes ( if log_level=info), which generate a peak of CPU usage. 3./ then zigbee-herdsman EZSP UART driver is reading too late message from EZSP firmware (not sending ACK in the required timeframe) 4./ then , at some point, EZSP firmware is lost with seq number and refuses any new TX request with a fatal error. 5./ then zigbee-herdsman EZSP report a faltal error and Z2M shutdown.
So @merlinpimpim if you doing this :
if (targ.length) {
into
if (false && targ.length) {
log_output:
- file
log_level: warn
You should see some improvement , as you lower the amount of CPU required to handle lixee device .
For me , It is running with those settings (+ disabling frontend) since 8 hours without a single Z2M shutdown. The main drawback of this workarround, is lost of non Reportable attributes.
@romarysonrier : I've made the changes in the Lixee.js file and it seems to work after resetting the Reportable attributes to "All". No crashes for 2 hours 😁but I get this type of message in the logs from time to time:
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 liXeePrivate(["message2"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 164 - 65382 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["averageRmsVoltageMeasurePeriodPhB"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 176 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["rmsCurrentPhB"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 177 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["rmsVoltagePhB"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 179 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haElectricalMeasurement(["rmsVoltagePhC"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 180 - 2820 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 seMetering(["meterSerialNumber"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 181 - 1794 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 seMetering(["siteId"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 182 - 1794 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haMeterIdentification(["availablePower"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 183 - 2817 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haMeterIdentification(["powerThreshold"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 184 - 2817 - 1 after 10000ms)
Failed to read zigbee attributes: Error: Read 0x00158d0005d2a627/1 haMeterIdentification(["softwareRevision"], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 10236 - 1 - 185 - 2817 - 1 after 10000ms)
@merlinpimpim "Failed to read zigbee attributes" are only generated by polling / read from Z2M in lixee.ts. Could you double check if you really have disabled polling ?
BTW, my setup is still up and running after 21 hours
Hi guys, I am having the same error, I have a Lixee in historic mode (polling heavy) and the sonoff dongle Plus with the 7.3.1 firmware.
Since the 1.35.3-1 update of z2m, it would crash only a few seconds after start.
I want to know if switching the Linky from Historic to Standard mode would change anything? (Normally it should just be a call to the power provider) I have just started to build a smart home (prior to moving house in a few days) and I will definitely ask for it there.
Edit: in the meantime I simply unplugged the Lixee and z2m held for 24h max. But then failed again with the same issue. It is not only the Lixee then? I have only a few zigbee devices otherwise so far (Phillips hue bulbs and switch, a heiman Smart plug)
@remidebette : switching the Linky from Historic to Standard mode=> more attributes => more mqtt update => more cpu load => more issue What is the cpu running Z2M ? serial over usb or network ?
Not sure to understand the question, I have a Sonoff USB dongle Plus EFR32MG21 plugged on the raspberry pi 3+ that hosts my home assistant So it should be serial over USB?
I did not configure the dongle firmware to do multipan RCP (more precisely I had tested it at some point but I then rolled back the firmware to NCP 7.3.1)
yes your dongle does serial over USB communication. Your setup based on a low end CPU (raspberry pi 3+) , , that may explain why you also encounter the issue.
Could you list you zigbee devices ? Could you try that fix :
if (targ.length) {
into
if (false && targ.length) {
log_output:
- file
log_level: warn
This setup is running since 4 days on my side ( but i lost few non reportable attributes, the ones taged as read only in https://github.com/fairecasoimeme/Zlinky_TIC/tree/master?tab=readme-ov-file#subscription-optarif-values)
@merlinpimpim "Failed to read zigbee attributes" are only generated by polling / read from Z2M in lixee.ts. Could you double check if you really have disabled polling ?
BTW, my setup is still up and running after 21 hours
It's not that simple because to do it I have to connect to the Z2M container and make the change to the lixee.js file. This is a non-permanent modification because every time Z2M restarts the modification is lost (Z2M can restart without any link to Lixee, for example Home Assistant, HACS, Z2M version upgrades, etc.).
However, I've already had Z2M restart because of Lixee even though I think I've made the change in the current Z2M container... Maybe I'll wait for the PRs to be deployed in production now.
Anyway, by limiting the reportable attributes to the 6 I'm interested in and setting the top parameter to 2, I haven't had a single reboot in 24 hours.
it seems that @mildis is the most active contributor lixee.ts, would be great to have some help...
@Koenkk : I would be great to implement a new feature to lower CPU usage + merge https://github.com/Koenkk/zigbee-herdsman/pull/911 :
the main benefits would be to decrease the amount of attributes updated by polling ( Reportable attributes doesn't need polling), and then to reduce the number of MQTT updates sent to HA.
Hi @romarysonrier, Proud you think of me for this but unfortunately I don't get that much about Z2M internals. I've just poke around Lixee as a end-user that could help making it more complete. Maybe @fairecasoimeme has a better understanding of the whole picture…
Just in case, without any changes to the converter or other files, if I delete the Lixee from Z2M 1.35.3 and start again from scratch, I get this error message (4th attempt, but same message every time):
Error 2024-02-18 22:01:05Failed to configure '0x00158d0005d2a627', attempt 4 (Error: ConfigureReporting 0x00158d0005d2a627/1 haElectricalMeasurement([{"attribute":"totalApparentPower","minimumReportInterval":60,"maximumReportInterval":900,"reportableChange":1}], {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 30499 - 1 - 135 - 2820 - 7 after 10000ms) at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:63:23) at EZSPAdapter.sendZclFrameToEndpointInternal (/app/node_modules/zigbee-herdsman/src/adapter/ezsp/adapter/ezspAdapter.ts:486:47) at processTicksAndRejections (node:internal/process/task_queues:95:5) at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:35:20) at Request.send (/app/node_modules/zigbee-herdsman/src/controller/helpers/request.ts:79:20) at Endpoint.configureReporting (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:669:28) at async Promise.allSettled (index 15) at Object.configure (/app/node_modules/zigbee-herdsman-converters/src/devices/lixee.ts:1020:14) at Configure.configure (/app/lib/extension/configure.ts:120:13) at Configure.onMQTTMessage (/app/lib/extension/configure.ts:55:21))
Hey @merlinpimpim, sorry I've been a little buried in code lately 😉 Did you try to force pairing to a specific device (instead of just "all" when you "permit join")? Try the coordinator first, if it's in range, otherwise the nearby routers one by one. You may have a router that's picking it up and making troubles... Since it's a timeout, seems a message is getting lost somewhere (or not being sent to the right place...).
@romarysonrier In case you have a test environment where you could put the LiXee up against ember, that'd be great (though don't go breaking your house 😄). This device clearly is a big spammer, which is one of the items missing from my "TODO tests" list!
@Nerivec: Sorry, I was busy myself 😅 By the way, before Z2M 1.36.0, it wasn't possible to try out your proposal because of this bug: https://github.com/Koenkk/zigbee-herdsman/pull/938 However, with the latest version of Z2M and after deleting LiXee from Z2M and starting a new pairing from scratch by forcing pairing with the nearest router (the coordinator is too far from the electricity meter), the good news is that it seems to have worked without getting the fatal message ("Adapter disconnected, stopping") for 48 hours! 😁 (@romarysonrier: I've left all the default settings, without touching anything, in the "Settings (specific)" tab).
This issue is stale because it has been open 180 days with no activity. Remove stale label or comment or this will be closed in 30 days
What happened?
I regularly get this type of message with my adapter: "Adapter disconnected, stopping". In this case, Z2M restarts.
(Continuation of the conversation started here: https://github.com/Koenkk/zigbee2mqtt/discussions/21140#discussioncomment-8340955)
What did you expect to happen?
Correct the problem by finding out why it happens
How to reproduce it (minimal and precise)
Nothing, just wait and read the logs
Zigbee2MQTT version
1.35.2
Adapter firmware version
7.3.1.0 build 0
Adapter
Skyconnect (Multiprotocol)
Setup
HA supervised, Khadas Vim1S (Docker)
Debug log
logs.txt