Open stephenmahood opened 6 months ago
OK this gets funny. I stopped the main z2m container (on the same machine) and now the error is gone from the z2m-test container. I started it again and the error is back.
Looks like the main instance is locking that adapter too. No idea how, in its configuration the device mapping is done using the /dev/serial/by-id/... link. By the way, I tried that in the z2m-test container config as well (instead of /dev/ttyACM1:/dev/ttyACM0), no difference.
Now I am really confused.
I managed to get it working by:
Either something is broken or I know even less about containers than I thought. My understanding is/was that each container is completely separated / isolated (so /dev/ttyACM0 in container1 is a different thing from /dev/ttyACM0 in container2, which is a different thing from... you get the idea). Why container2 (z2m-test) would think that its own /dev/ttyACM0 is the same as the /dev/ttyACM0 of container1 (z2m-prod) is beyond me.
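A note on the device mapping: a container device mapping passes a host device node through to the container, so two containers that map the same host node talk to the same physical adapter, whatever name it has inside each container. One way to check which host node each /dev/serial/by-id link resolves to is sketched below. This is a minimal Python sketch, not something from the thread; the function name `resolve_by_id` is hypothetical, and it assumes it is run on the host rather than inside a container.

```python
import os


def resolve_by_id(by_id_dir="/dev/serial/by-id"):
    """Map each by-id symlink name to the device node it resolves to.

    Returns an empty dict if the directory does not exist
    (for example when no USB serial adapter is plugged in).
    """
    if not os.path.isdir(by_id_dir):
        return {}
    return {
        name: os.path.realpath(os.path.join(by_id_dir, name))
        for name in sorted(os.listdir(by_id_dir))
    }


if __name__ == "__main__":
    # Print one line per adapter, e.g. "<by-id name> -> /dev/ttyACMx"
    for name, target in resolve_by_id().items():
        print(f"{name} -> {target}")
```

If two by-id names resolve to the nodes you expect, the mapping in each container config can be compared against this output.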
Nevertheless, I now have the secondary instance up and running. I will enable debug logging, re-power two sensors, pair them to the new one, and wait for them to go offline.
_I guess you could temporarily disable log rotation while you need more history with log_rotation: false, if the machine can handle it._
I'd assume the firmware is affected, since there's no way 2023 had a firmware released a few weeks ago (it's probably 7.1.x or 7.2.x if Ikea didn't use old stuff), but can't be sure that's the specific issue here... I don't think Parasolls have updates yet. https://ww8.ikea.com/ikeahomesmart/releasenotes/releasenotes.html
Can you test a Parasoll with Z2M and disable availability for it, see if you can still get it to report state a few days later?
_I guess you could temporarily disable log rotation while you need more history with log_rotation: false, if the machine can handle it._
Done, thanks for the tip. This instance shouldn't generate a lot of traffic as it only has two devices.
I'd assume the firmware is affected, since there's no way 2023 had a firmware released a few weeks ago (it's probably 7.1.x or 7.2.x if Ikea didn't use old stuff), but can't be sure that's the specific issue here... I don't think Parasolls have updates yet. https://ww8.ikea.com/ikeahomesmart/releasenotes/releasenotes.html
Indeed, no mention of Parasolls in recent times.
Can you test a Parasoll with Z2M and disable availability for it, see if you can still get it to report state a few days later?
I think I tried this a few weeks ago, made no difference. At the moment, I am running Z2M with all default options (apart from the trick with /dev/ttyACM1 described above). Maybe let me see if we can reproduce the issue from the other instance, and then tweak further?
Sounds good. Can you also compare the settings for the device between ZHA and Z2M (binding/reporting/etc.)? And how the device is connected in both cases (directly to coordinator or through a router)?
I missed it in your docker config, but can you use the latest dev for the Z2M image in your test setup? It has a whole lot of improvements for ember this month, better to test on it.
Quick update: ~22 hours later, all 4 sensors (2 in ZHA, 2 in Z2M-test instance) are still working. Important: I noticed availability reporting is 'off' in Z2M-test instance (assuming it's a default). Doesn't seem to hurt the functionality, but will wait for another day or two to see if they still go offline or we narrow this down to the availability reporting.
Sounds good. Can you also compare the settings for the device between ZHA and Z2M (binding/reporting/etc.)? And how the device is connected in both cases (directly to coordinator or through a router)?
Both are on fully default settings (short of the ttyACM1 "fix" and debug logging for the z2m-test instance). All devices are connected directly to the respective coordinators (they are the only devices in the respective networks, no routers).
I missed it in your docker config, but can you use the latest dev for the Z2M image in your test setup? It has a whole lot of improvements for ember this month, better to test on it.
Can do, after I give the current setup a shot.
@9shearer if it keeps working on ZHA and fails again on Z2M, would you mind making a sniff of both from the working situation till failure? (https://www.zigbee2mqtt.io/advanced/zigbee/04_sniff_zigbee_traffic.html)
Update 12 hours later (so roughly 34 hours since I stood up the Z2M-test instance with all default settings, and availability checker OFF): the two sensors in Z2M-test instance are now misbehaving (they still blink on open/close, but state doesn't get reflected in Z2M).
By contrast, the two sensors in ZHA are still working fine (this is ~3.5 days since their installation). State changes are reflected in ZHA, and battery levels haven't decreased significantly.
This tells me two things:
I'll now look into how to run the dev image for z2m-test per @Nerivec 's advice. Not sure if sniffer works with my adapters (ZBDongle-E) and it certainly looks like a little bit more work. :)
Slightly more detailed look into logs for 'problem_sensor_A':
I have the full log file (50 MB), but not sure if it brings any additional value. Relevant section with the initial breakdown of problem_sensor_A extracted below.
Upon posting the above, I realized the problem_sensor_A started re-announcing itself two nights ago (very early on 23rd), not last night (24th), yet it seemed to work fine yesterday evening (23rd late). This is interesting...
# grep contact log.log | grep problem_sensor_A
[2024-07-22 21:46:52] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_A', payload '{"ac_status":false,"battery":90,"battery_defect":false,"battery_low":false,"contact":false,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false,"update":{"state":"idle"}}'
[2024-07-22 21:53:55] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_A', payload '{"ac_status":false,"battery":90,"battery_defect":false,"battery_low":false,"contact":false,"linkquality":212,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false,"update":{"state":"idle"}}'
[2024-07-23 19:25:36] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_A', payload '{"ac_status":false,"battery":90,"battery_defect":false,"battery_low":false,"contact":true,"linkquality":200,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false,"update":{"state":"idle"}}'
21:45-22.00 on 22nd is when I stood up the instance and plugged in the sensors. The sensor was open (contact:false) since then until yesterday 19.25 when I tested it. The same (with more changes) for problem_sensor_B, which went into the announce loop on 23rd at 01:28:
# grep contact log.log | grep problem_sensor_B
[2024-07-22 21:48:30] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_B', payload '{"ac_status":false,"battery":96,"battery_defect":false,"battery_low":false,"contact":false,"linkquality":152,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false}'
[2024-07-22 21:48:37] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_B', payload '{"ac_status":false,"battery":96,"battery_defect":false,"battery_low":false,"contact":true,"linkquality":156,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false}'
[2024-07-22 21:58:07] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_B', payload '{"ac_status":false,"battery":96,"battery_defect":false,"battery_low":false,"contact":true,"linkquality":164,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false,"update":{"state":"idle"}}'
[2024-07-23 15:42:07] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_B', payload '{"ac_status":false,"battery":95,"battery_defect":false,"battery_low":false,"contact":true,"linkquality":192,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false,"update":{"state":"idle"}}'
[2024-07-23 19:24:26] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_B', payload '{"ac_status":false,"battery":95,"battery_defect":false,"battery_low":false,"contact":false,"linkquality":200,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false,"update":{"state":"idle"}}'
[2024-07-23 19:24:35] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_B', payload '{"ac_status":false,"battery":95,"battery_defect":false,"battery_low":false,"contact":true,"linkquality":192,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false,"update":{"state":"idle"}}'
[2024-07-23 21:59:10] info: z2m:mqtt: MQTT publish: topic 'z2m-test/problem_sensor_B', payload '{"ac_status":false,"battery":95,"battery_defect":false,"battery_low":false,"contact":true,"linkquality":192,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false,"update":{"state":"idle"}}'
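The grep output above can be boiled down to just the contact changes. The following is a minimal Python sketch, assuming the log lines have exactly the "MQTT publish" shape shown above; the names `PUBLISH_RE` and `contact_changes` are hypothetical helpers, not part of Zigbee2MQTT.

```python
import json
import re

# Matches Zigbee2MQTT "MQTT publish" log lines in the format shown above.
PUBLISH_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] info: z2m:mqtt: MQTT publish: "
    r"topic '(?P<topic>[^']+)', payload '(?P<payload>.*)'$"
)


def contact_changes(lines, topic):
    """Yield (timestamp, contact) each time the contact value changes."""
    last = None
    for line in lines:
        m = PUBLISH_RE.match(line.strip())
        if not m or m.group("topic") != topic:
            continue
        try:
            payload = json.loads(m.group("payload"))
        except json.JSONDecodeError:
            continue  # skip lines whose payload is not valid JSON
        contact = payload.get("contact")
        if contact is not None and contact != last:
            yield m.group("ts"), contact
            last = contact


if __name__ == "__main__":
    # Assumes the log file is named log.log, as in the grep commands above.
    with open("log.log", encoding="utf-8") as fh:
        for ts, contact in contact_changes(fh, "z2m-test/problem_sensor_B"):
            print(ts, "contact =", contact)
```

Repeated publishes with an unchanged contact value are dropped, so the output shows only the moments the sensor state actually flipped in Z2M.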
Nothing between 2024-07-23 01:11:50 and 2024-07-23 01:12:09?
Can you confirm the device is doing this re-announce loop in ZHA or not? Z3-compliant end devices are supposed to do a rejoin on reboot but looks like ZHA is still using the old options on this. Still, it shouldn't be a problem either way, since the rejoin goes fine. Although the device definitely shouldn't be looping the message like this, it's too fast to be from subsequent reboots (though since they loop the OTA stuff too, I figure they mishandled a few things in that firmware). I guess, remains to answer why that device seems to be rebooting. Will have to sniff the traffic here, to get more info. Something could end up erasing the configuration done by Z2M on the device, and mess up the reporting (yet the device still blinks normally).
For the sniffer, take a look at the updated docs. ember-zli should work just fine with your dongle-e (works with mine). Setup is mostly just Wireshark, then execute a command with ember-zli to start the sniff (works on Windows too).
Nothing between 2024-07-23 01:11:50 and 2024-07-23 01:12:09?
Nothing whatsoever. I went back and checked manually in the log file on the machine - no entries at all in that interval.
Can you confirm the device is doing this re-announce loop in ZHA or not? Z3-compliant end devices are supposed to do a rejoin on reboot but looks like ZHA is still using the old options on this.
Not sure I get it - this is the log from Z2M rather than from ZHA. In ZHA, identical devices seem to be working fine for 3+ days now (unfortunately that's the biggest data sample I have), so I would reckon the issue isn't happening. Unfortunately, I don't see anything in HomeAssistant logs related to ZHA (most likely it is because I am not doing debug level logging).
Still, it shouldn't be a problem either way, since the rejoin goes fine. Although the device definitely shouldn't be looping the message like this, it's too fast to be from subsequent reboots (though since they loop the OTA stuff too, I figure they mishandled a few things in that firmware). I guess, remains to answer why that device seems to be rebooting. Will have to sniff the traffic here, to get more info. Something could end up erasing the configuration done by Z2M on the device, and mess up the reporting (yet the device still blinks normally).
It does work for some time, until this rejoin loop kills the battery. This is, I think, the issue - the device starts rejoining for some reason, but then even if the rejoin is apparently successful, it tries again and again every 17 seconds.
For the sniffer, take a look at the updated docs. ember-zli should work just fine with your dongle-e (works with mine). Setup is mostly just Wireshark, then execute a command with ember-zli to start the sniff (works on Windows too).
I'll try to find some time for this as well. Still didn't get to running z2m dev.
I hope this is relevant (sorry if not). I have 4 of the Parasolls on one z2m network, and a mixed bag of Ikea devices (no Parasolls) and others on another z2m network. On the Parasoll network I get repeated:
Accepting joining not in blocklist device '0x048727 ....
It's not aggressive (i.e. not enough to flatten the batteries in a very short space of time), but it is very regular (seems to be every 10 mins) and does not happen only after a network restart. I rather thought this would happen once after a system restart (or a change in route on the network), but not continuously?
On the network with no Parasolls, you get a flurry of the above after a network restart (as you would expect) but you don't see them again.
Just seems slightly odd behaviour of the Parasolls and possibly symptomatic of the problem here ?
@bonzo-dog If you can get a sniffer running too (see links above), let me know what you find on that Parasoll network. Seems related indeed. I'd say these devices are rebooting and mishandling the rejoin (Z2M enforces it, ZHA doesn't appear to be from what I can tell), but we'd need to know why it is rebooting in the first place if possible.
Can either of you try something? Pull out the battery on a Parasoll, wait a few, then put it back in, and see if there is a proper (single) rejoin done by the device.
Will have a looksee at the sniffing (never sniffed a zigbee network)
Forgot to put in previous post:
Zigbee2MQTT version [1.39.0]
Coordinator type EZSP v13
Coordinator revision 7.4.1.0 build 0
As per request (hope this helps)
Rebooted z2m for a clean start
Let it settle down for a few mins (i.e. let sensors rejoin naturally)
Pulled battery on a Parasoll
Left it 4 mins (in dead state)
Re-inserted battery (and left contact sensor alone - i.e. no manual open/close events)
End result from battery insertion time (grep result for that sensors address) was:
[2024-07-24 16:54:04] debug: zh:ezsp: Device join request received: 10010 048727fffe49eaac
[2024-07-24 16:54:04] debug: zh:controller: Device '0x048727fffe49eaac' joined
[2024-07-24 16:54:04] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-24 16:54:04] debug: zh:controller: Device '0x048727fffe49eaac' accepted by handler
[2024-07-24 16:54:04] debug: zh:controller: Not interviewing '0x048727fffe49eaac', completed 'true', in progress 'false'
[2024-07-24 16:54:05] debug: zh:ezsp: ZDO Device announce: 10010, 048727fffe49eaac
[2024-07-24 16:54:05] debug: zh:ezsp: Device join request received: 10010 048727fffe49eaac
[2024-07-24 16:54:05] debug: zh:controller: Device '0x048727fffe49eaac' joined
[2024-07-24 16:54:05] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-24 16:54:05] debug: zh:controller: Device '0x048727fffe49eaac' accepted by handler
[2024-07-24 16:54:05] debug: zh:controller: Not interviewing '0x048727fffe49eaac', completed 'true', in progress 'false'
[2024-07-24 16:54:05] debug: zh:controller:endpoint: ZCL command 0x048727fffe49eaac/2 ssIasZone.defaultRsp({"cmdId":0,"statusCode":0}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":3,"writeUndiv":false})
[2024-07-24 16:54:05] debug: zh:controller:endpoint: Request Queue (0x048727fffe49eaac/2): send defaultRsp request immediately (sendPolicy=undefined)
[2024-07-24 16:54:05] debug: zh:ezsp: sendZclFrameToEndpointInternal 0x048727fffe49eaac:10010/2 (0,0,1), timeout=10000
[2024-07-24 16:54:22] debug: zhc:ota:common: Checking if an update is available for '0x048727fffe49eaac' (PARASOLL Door/Window Sensor)
[2024-07-24 16:54:22] debug: zhc:ota:common: Is new image available for '0x048727fffe49eaac' (PARASOLL Door/Window Sensor), current '{"fieldControl":1,"manufacturerCode":4476,"imageType":12919,"fileVersion":16777241}'
[2024-07-24 16:54:23] debug: zhc:ota:common: Update available for '0x048727fffe49eaac' (PARASOLL Door/Window Sensor): NO
[2024-07-24 16:54:23] debug: zh:controller:endpoint: CommandResponse 0x048727fffe49eaac/1 genOta.queryNextImageResponse({"status":152}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":1,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false})
[2024-07-24 16:54:23] debug: zh:controller:endpoint: Request Queue (0x048727fffe49eaac/1): send request
[2024-07-24 16:54:23] debug: zh:ezsp: sendZclFrameToEndpointInternal 0x048727fffe49eaac:10010/1 (0,0,1), timeout=10000
[2024-07-24 17:04:21] debug: zh:ezsp: Device join request received: 10010 048727fffe49eaac
[2024-07-24 17:04:21] debug: zh:controller: Device '0x048727fffe49eaac' joined
[2024-07-24 17:04:21] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-24 17:04:21] debug: zh:controller: Device '0x048727fffe49eaac' accepted by handler
[2024-07-24 17:04:21] debug: zh:controller: Not interviewing '0x048727fffe49eaac', completed 'true', in progress 'false'
[2024-07-24 17:04:22] debug: zh:ezsp: ZDO Device announce: 10010, 048727fffe49eaac
[2024-07-24 17:04:22] debug: zh:ezsp: Device join request received: 10010 048727fffe49eaac
[2024-07-24 17:04:22] debug: zh:controller: Device '0x048727fffe49eaac' joined
[2024-07-24 17:04:22] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-24 17:04:22] debug: zh:controller: Device '0x048727fffe49eaac' accepted by handler
[2024-07-24 17:04:22] debug: zh:controller: Not interviewing '0x048727fffe49eaac', completed 'true', in progress 'false'
Note that after 10 mins it's rejoining again.
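To put a number on the rejoin interval, the "Accepting joining" lines can be grouped into bursts and the gaps between bursts measured. Below is a minimal Python sketch under stated assumptions: the log line format is exactly as shown above, and joins logged within `burst_seconds` of each other (a hypothetical parameter, defaulting to 5) count as one rejoin. The names `JOIN_RE` and `rejoin_gaps` are illustrative only.

```python
import re
from datetime import datetime

# Matches the info-level join lines in the format shown above.
JOIN_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] info: z2m: "
    r"Accepting joining not in blocklist device '(?P<addr>0x[0-9a-f]+)'"
)


def rejoin_gaps(lines, addr, burst_seconds=5):
    """Return the seconds between the starts of successive rejoin bursts.

    Join lines logged within burst_seconds of the previous join line are
    treated as part of the same burst (the log above shows two per rejoin).
    """
    gaps = []
    burst_start = None
    previous = None
    for line in lines:
        m = JOIN_RE.match(line.strip())
        if not m or m.group("addr") != addr:
            continue
        ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
        if previous is None or (ts - previous).total_seconds() > burst_seconds:
            if burst_start is not None:
                gaps.append(int((ts - burst_start).total_seconds()))
            burst_start = ts
        previous = ts
    return gaps
```

Fed the four info lines from the excerpt above (16:54:04, 16:54:05, 17:04:21, 17:04:22), this yields a single gap of 617 seconds, i.e. a little over 10 minutes.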
Interesting. Mine - if I catch them in the reboot loop, but before they exhaust the battery - come back perfectly fine on a battery removal/reinsert.
@9shearer When you say fine, are they not looping at all, or doing a short loop like bonzo-dog's?
Can you two give the versions of firmware reported by your devices?
Parasoll Firmware is 1.0.19
(Don't think Ikea has ever released an update for these as they are quite new)
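For reference, the fileVersion in the OTA debug line earlier in the thread (16777241) is a 32-bit integer, which is 0x01000019 in hex. Reading the last byte as the hex digits "19" would line up with the reported 1.0.19, but that mapping is an assumption and is not confirmed anywhere in this thread. A minimal Python sketch of the arithmetic (both helper names are hypothetical):

```python
def file_version_hex(file_version):
    """Return the 32-bit OTA fileVersion as an 8-digit hex string."""
    return format(file_version, "08x")


def file_version_bytes(file_version):
    """Split the 32-bit fileVersion into four bytes, most significant first."""
    return tuple(file_version.to_bytes(4, "big"))


if __name__ == "__main__":
    value = 16777241  # fileVersion from the zhc:ota:common debug line
    print(file_version_hex(value))
    print(file_version_bytes(value))
```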
Just to expand my situation with these:
[2024-07-24 17:04:21] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-24 17:04:22] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-24 17:14:33] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-24 17:14:33] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-24 17:14:33] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac' *** end of battery removal test ***
[2024-07-25 05:33:49] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac' *** Ikea Bulb/Router turned off ***
[2024-07-25 05:33:49] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 05:44:00] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 05:44:00] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 05:54:11] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 05:54:11] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:04:22] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:04:22] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:14:33] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:14:33] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:24:44] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:24:44] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:34:55] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:34:55] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:45:06] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 06:45:06] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
For the sake of completeness, I will redo the battery removal test later, with no Ikea Bulb/Router on the network, to see if it behaves differently
update: did just that - removed Ikea Bulb/Router from network, removed sensor battery, left 4 mins and replaced sensor battery (at 07:13).
Outcome was :
[2024-07-25 07:13:03] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 07:13:04] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 07:24:24] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 07:24:24] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 07:34:22] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 07:34:22] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 07:44:20] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 07:44:20] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 07:54:18] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 07:54:18] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:04:16] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:04:16] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:12:38] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:12:38] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:24:26] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:24:27] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:34:24] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:34:25] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:44:22] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:44:22] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:54:20] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 08:54:20] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 09:04:18] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
[2024-07-25 09:04:18] info: z2m: Accepting joining not in blocklist device '0x048727fffe49eaac'
So seemingly :
would expect this to happen once as the route has changed
A route change should not trigger a rejoin, it just finds a new route.
A route change should not trigger a rejoin, it just finds a new route.
Bad assumption of mine that was.
Well, I tried to "help" with a Wireshark capture, but unfortunately failed. I have 2 adapters: a Sonoff Zigbee Dongle Plus (my main network) and a newer Sonoff ZB Dongle E (the Parasoll network). So I borrowed the ZB Dongle E from my Parasoll network to use as a sniffer, put a Parasoll on the Dongle Plus network, and... the Parasoll on the Dongle Plus network doesn't repeatedly rejoin. The "battery remove" test results in 1 rejoin on power-up, and that's it; it's nice and stable. Typical. (And if I had read the above posts thoroughly before trying this, I would have realised this is already known. Oh well.)
I could try to do it the other way round of course, adapter wise, but my nerves are not up to that. (might be possible with the CC2652P chip, but seems a high risk of a one way trip)
As such, I am sorry, but no wireshark cap. from me :(
Just trying to get my head around what I am observing
Given the following:
Sonoff P dongle <==> Parasoll [stable and works as anticipated]
Sonoff E Dongle <==> Parasoll [constant rejoins every 10 mins]
Sonoff E Dongle <==> Ikea Bulb (Router) <==> Parasoll [stable and works as anticipated]
is it too simplistic just to point the finger at the E Dongle firmware ? (am currently going through E Dongle firmware versions to see if it makes any difference)
Quick note with a few interesting findings:
the two devices did the rejoin loop for quite a long time (started around 1 am on 23rd, stopped around 5 pm on 25th)
in all this interval, the state changes weren't being reflected in z2m (opening/closing the window had no effect in the state json)
there were no rejoin attempts since 5 pm yesterday, BUT this morning, when I opened one of the respective windows (on problem_sensor_B), the device tried to rejoin
even after that rejoin attempt, the state wasn't being updated in z2m (window open, but contact was still 'true')
after this, I restarted z2m test instance to have a fresh log file
removing / reinserting the battery in problem_sensor_B fixed the functionality: the device rejoined the network as expected
the state changes are now properly reflected
problem_sensor_A is still stuck in an undefined state (no rejoin loop messages anymore, but no state updates either, although the sensor does blink when opening/closing); there was no rejoin attempt upon opening/closing that window (unlike for sensor_B)
Very interesting - if the value is right - it appears that the 2.5 days of rejoin loops on problem_sensor_B had a negligible impact on problem_sensor_B's battery (it went from 95% to 92%). Could it be that the disabling (or rather not enabling) of availability reporting prevents battery drain during the rejoin loops?
In the meanwhile, the other two sensors which are plugged into ZHA are still working fine.
General info applicable to all setups:
Just trying to get my head around what I am observing
Given the following:
Sonoff P dongle <==> Parasoll [stable and works as anticipated]
Sonoff E Dongle <==> Parasoll [constant rejoins every 10 mins]
Sonoff E Dongle <==> Ikea Bulb (Router) <==> Parasoll [stable and works as anticipated]
is it too simplistic just to point the finger at the E Dongle firmware ? (am currently going through E Dongle firmware versions to see if it makes any difference)
I wouldn't necessarily say so, as the same dongle model, with the same firmware, seems to work fine in ZHA (but not in Z2M). Of course, we have no way of telling if there are any subtle build/quality differences between seemingly-identical devices (assuming not). I would rather add to your hypothesis the Z2M element:
@bonzo-dog Would you be able to test the "ZHA + Sonoff E + Parasoll + indirect connect" setup (assuming you are using Home Assistant in the first place)? Also, how long did you wait with the Parasoll on the Dongle-P network? In my experience, these take anywhere between 12 and 36 hours to start misbehaving.
@9shearer
Well ...I have never used HA ... my systems are z2m, mosquitto and my own client software (using the mosquitto client libs) - I have never used anything else. However, I will have a look at HA and see if it would be possible to set up a quick test system here.
re the Parasoll on the Dongle-P network ...
Interestingly, we differ in symptoms (and is why I probably should have just started my own Issue rather than littering this one, even though my drivel might be pertinent ....)
My Parasolls (on the Dongle-E network) just exhibit the consistent rejoins problem, and I am fixated on this as I don't like what it will do to the battery life (the contact sensor system will be deployed a long way away from me and I don't want unnecessary battery problems)
Strangely, however, I don't see your other problems at all - no unresponsive sensors, no dropping off the network and not rejoining, no loop of death seemingly depleting the battery. And that is quite odd.
So in answer to your question, I have not left a Parasoll on the Dongle-P network longer than a couple of hours, as I can see within that timeframe if "my" problem still exhibits itself. However, that is easily fixed - I will transplant a Parasoll onto my Dongle-P network and leave it on there for a few days and see how behaves. Anything interesting/abnormal, I will let you know ....
(btw: just cycled through all the Dongle-E firmware versions from the itead sonoff firmware github - it made no difference unfortunately, but it was a vain hope)
(btw: just cycled through all the Dongle-E firmware versions from the itead sonoff firmware github - it made no difference unfortunately, but it was a vain hope)
You might want to try also the darkxst firmwares (if you haven't already): https://github.com/darkxst/silabs-firmware-builder and see if the sensors still face "your" issue. I do believe they are somewhat related - may be different symptoms of the same underlying problem - so I guess keeping these observations together is well worth it (and maybe helping @Koenkk / @Nerivec / anyone else with the necessary know-how to fix it).
For further experimenting, I have just flashed the ITEAD 7.4.3.0 firmware (instead of the corresponding darkxst one) to my z2m-test instance. Firmware link: https://github.com/itead/Sonoff_Zigbee_Dongle_Firmware/blob/master/Dongle-E/NCP_7.4.3/ncp-uart-sw_EZNet7.4.3_V1.0.0.gbl
First observations:
Interesting messages for problem_sensor_A after reconnection:
@9shearer do you have some feedback from using latest dev? (if the device is doing loop rejoins when it's supposed to do just one -after a reboot/battery pull-, and if it still triggers random loop rejoins after that)
problem_sensor_B went into the reconnect loop less than 2h after being reconnected. This is something new, it was usually something in the region of 12-24h.
z2m-test instance restarted now with the zigbee2mqtt:latest-dev image, dongle still with ITEAD 7.4.3.0 firmware. Let's see how that goes.
Is this relevant ? (the "Unhandled frame" bit)
[2024-07-26 15:40:17] debug: zh:ezsp:ezsp: <== Frame: 029401230002011acee8724afeff27870404
[2024-07-26 15:40:17] debug: zh:ezsp:ezsp: <== 0x23: {"_cls_":"childJoinHandler","_id_":35,"_isRequest_":false,"index":2,"joining":1,"childId":52762,"childEui64":{"type":"Buffer","data":[4,135,39,255,254,74,114,232]},"childType":4}
[2024-07-26 15:40:17] debug: zh:ezsp:driv: Unhandled frame childJoinHandler
Happens when one of my 10 min rejoins happens
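For cross-referencing with the other log lines, the childEui64 buffer and childId in the childJoinHandler frame above can be turned into the usual hex form. This is a minimal Python sketch, assuming the JSON buffer is already in display order (the raw frame bytes appear to carry the same address reversed); the helper names are hypothetical.

```python
def eui64_from_buffer(data):
    """Format a list of 8 byte values as a 0x-prefixed EUI-64 string."""
    if len(data) != 8:
        raise ValueError("expected 8 bytes")
    return "0x" + bytes(data).hex()


def short_address_hex(child_id):
    """Format a 16-bit network address as a 0x-prefixed 4-digit hex string."""
    return format(child_id, "#06x")


if __name__ == "__main__":
    # Values copied from the childJoinHandler debug line above.
    print(eui64_from_buffer([4, 135, 39, 255, 254, 74, 114, 232]))
    print(short_address_hex(52762))
```

The buffer decodes to 0x048727fffe4a72e8, which is a different address from the 0x048727fffe49eaac used in the battery removal test, so this frame belongs to another device on that network.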
(btw: I pulled the latest z2m dev, and I know it's not a release, and this is probably not the place to mention it, and I am not complaining !!, but every sensor state change now results in 2 identical "z2m:mqtt: MQTT publish: topic" messages being sent to, in my case, mosquitto i.e. my system, which hangs off the other side of mosquitto, is now presented with concurrent duplicates of open/closed events.)
@bonzo-dog That message is specific to ezsp, it doesn't implement the callback, but should be fine, it's just a notification.
@bonzo-dog
(btw: I pulled the latest z2m dev, and I know it's not a release, and this is probably not the place to mention it, and I am not complaining !!, but every sensor state change now results in 2 identical "z2m:mqtt: MQTT publish: topic" messages being sent to, in my case, mosquitto i.e. my system, which hangs off the other side of mosquitto, is now presented with concurrent duplicates of open/closed events.)
could you open a new issue for this with the debug log attached?
See this on how to enable debug logging.
could you open a new issue for this with the debug log attached?
I will do that later. (It struck me later that as well as pulling the z2m dev branch, I had also changed the Dongle-E firmware from 7.4.1 -> 7.4.3, so I need to check if the dongle firmware change caused this ...) I didn't know whether to raise Issues based on observations on dev branch code (a lack of etiquette knowledge on my part), but now I know. Thanks :)
I didn't know whether to raise Issues based on observations on dev branch code (a lack of etiquette knowledge on my part), but now I know. Thanks :)
Definitely! This way we can prevent bugs from slipping into the release (which is every 1st of the month).
Unfortunately, after ~24h (didn't look at the logs proper yet), both sensors connected to the z2m-dev instance (running the latest-dev as of yesterday) have stopped reporting state updates into z2m. The devices still blink when door/window is being opened/closed, but the state changes aren't reflected in Z2M (the test instance). I can pull the logs if that helps.
Question: is there any preferred/recommended firmware? My test instance currently runs ITEAD's 7.4.3.0. Is it worth reflashing the Dongle-E to darkxst's version (or any other, if recommended)?
The other two sensors, plugged into ZHA a few days ago, are still running fine.
Use darkxst's firmware, it's the most tested one.
I pulled the latest z2m dev, and I know it's not a release, and this is probably not the place to mention it, and I am not complaining !!, but every sensor state change now results in 2 identical "z2m:mqtt: MQTT publish: topic" messages
Well, that was total garbage - I am sorry about that. Starting from a fresh state, I rolled the Sonoff E back to 7.4.1, scrubbed my z2m install, and reinstalled the 1.39.0 release to check everything was back to normal before I debug logged the dev branch (to raise an Issue), and I am still getting 2 MQTT publishes per sensor event. It did not do this before. So ..... the only thing that changed is that during this testing I used the Sonoff E as a Sniffer (turned out to be a pointless exercise, but it did work rather well). Hence, I have to assume that the Sonoff-E is now duplicating messages to z2m due to me using it as a Sniffer a day or so ago. All I can try is to work out how to do an NVRAM wipe of the Sonoff and hence, hopefully, get it back to normal .... I must admit that I really wish I had bought another Sonoff-P rather than the E :(
Quick update: with both sensors in the "hanging" state, I reflashed the dongle with darkxst's 7.4.3.0 firmware and restarted z2m-test.
problem_sensor_A: completely dead, no reaction whatsoever to opening/closing.
problem_sensor_B: seemingly dead, but opening/closing the door caused two rejoin events, about 35 seconds apart. There's no loop, it seems, but that may have ended already. The sensor wasn't working in the sense of updating states in z2m or anything - it just triggered these rejoin events.
Interesting that one device does it, and the other one won't. I will now replace batteries in both, which will reinitialize them, and we see how long they last on the network.
@9shearer
That could be one difference between you and me (and the different observations). Are you now using 1.2V rechargeables or 1.5V alkalines? (Sorry - I tried to search for this, but there are too many instances of "battery" above.)
Some IKEA stuff is very picky about battery voltage (especially the motion sensors), and they seem to design around 1.2V cells. Hence, I only use 1.2V rechargeables. Just wondered if you were using alkalines?
(btw: As per previous comment, my sensor on the Dongle-P network is quite happy after almost 48 hrs - behaving itself nicely)
@bonzo-dog I have two different rechargeables in the two sensors I am using for testing:
For clarity, wherever I used "battery" I mean "rechargeable battery" (after running through a pack of 20 alkalines in 3 days, I decided I should invest in rechargeables before ending up broke :) ).
Oh - I thought we might have found a difference to explain our different observations, but fail on that one :( Back to head scratching ....
I'm watching here from the beginning since I do have the same issue. Just want to make sure to not waste too much energy with the adapter. I'm using the CC2652RB and experiencing the same issues as @9shearer, at least on a surface level without herdsman debug logging activated.
@bonzo-dog The sniffing would have no impact on z2m/mqtt. Did you create a new issue for this with a debug log attached?
@Nerivec
No, I didn't as:
It seemed very wasteful of your time & effort to raise an Issue that cannot be of z2m's making ....
Anyway, in the meantime, I have just "fixed it", i.e. back to single Events being sent again. I gave up trying to find a method to effectively reset the Dongle-E (i.e. erase its non-volatile settings to start from fresh), so I blew Router Firmware into it (assumption was this would effectively reset any config stored due to change of mode/memory map). Then let it run, and then re-blew the Coordinator firmware. All back to as it was now....
And I am going to buy a Dongle-P for these sensors - I really don't like this Dongle-E anymore :)
@9shearer
That could be one difference between you and me (and the different observations). Are you now using 1.2V rechargeables or 1.5V alkalines? (Sorry - I tried to search for this, but there are too many instances of "battery" above.)
Some IKEA stuff is very picky about battery voltage (especially the motion sensors), and they seem to design around 1.2V cells. Hence, I only use 1.2V rechargeables. Just wondered if you were using alkalines?
(btw: As per previous comment, my sensor on the Dongle-P network is quite happy after almost 48 hrs - behaving itself nicely)
I used 1.5V alkaline, 1.2V 1000 mAh rechargeable and IKEA LADDA 1.2V 750 mAh, all on a Sonoff Dongle P.
All drop off the network, but the rechargeables drop off earlier (~24h on rechargeables vs several days on alkaline). For one sensor I had a rejoin loop very frequently (several times a minute; I haven't checked the actual timing, but it could be every 10 seconds).
Not quite sure if this is linked, but since I added the Parasoll sensors to the network I have to reboot the add-on once or twice a week since all devices turn to unavailable. The reboot typically brings back the whole network (except the Parasoll sensors).
All drop off the network, but the rechargeables drop off earlier (~24h on rechargeables vs several days on alkaline). For one sensor I had a rejoin loop very frequently (several times a minute; I haven't checked the actual timing, but it could be every 10 seconds).
I also used alkaline batteries in the past and they'd last for a couple of days until stuff started dropping off. Not sure if it was the battery, the sensor, or this issue here. Out of curiosity, is your rejoin loop every 17 seconds? This is what happens with mine.
As I am the odd one out here, I wonder if we have different batches of the Parasolls (i.e. a component change in a different batch run means problems...). On the label after "Made in China" I have "2352 50050", which would normally mean 2023/Week 52. Are yours any newer or older?
From what I could gather, on zstack firmware, these devices can misbehave if paired directly to the coordinator (the suggestion is to pair through a router). Can anyone confirm a behavior change after this (make sure it actually is on a router after connecting it)?
@bonzo-dog for future reference, ember-zli can do all kinds of reset procedures (leave network, reset zigbee tokens and full NVM3 clear on dongle-e).
Out of curiosity, is your rejoin loop every 17 seconds? This is what happens with mine.
Sorry, but I can't tell; this happened once on my network, and when I just put a battery in, it behaved normally. But I'd bet it was more frequent, because I couldn't dismiss the rejoin info as fast as it popped up again.
As I am the odd one out here, I wonder if we have different batches of the Parasolls (i.e. a component change in a different batch run means problems...). On the label after "Made in China" I have "2352 50050", which would normally mean 2023/Week 52. Are yours any newer or older?
All of mine (three) are "2402 50050"
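To compare batches, the label codes can be decoded the same way for everyone. This is only a sketch, assuming the first four digits are a YYWW date code as suggested above; whether IKEA counts ISO weeks is not confirmed, and the function name is just for illustration.

```python
import datetime


def parse_date_code(label):
    """Split a 'YYWW NNNNN' label into (year, week, Monday of that week).

    Assumes the first group is a two-digit year followed by a two-digit
    ISO week number, which is a guess based on the labels quoted here.
    """
    code = label.split()[0]
    year = 2000 + int(code[:2])
    week = int(code[2:])
    monday = datetime.date.fromisocalendar(year, week, 1)
    return year, week, monday


for label in ("2352 50050", "2402 50050"):
    print(label, "->", parse_date_code(label))
```

Under that assumption the two batches mentioned here would be only about two weeks apart.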
From what I could gather, on zstack firmware, these devices can misbehave if paired directly to the coordinator (the suggestion is to pair through a router). Can anyone confirm a behavior change after this (make sure it actually is on a router after connecting it)?
I can give it a try later on and double check, one is most probably connected directly to the coordinator, but the two others are too far away and surrounded by Ikea Bulbs, Plugs and Repeaters so I doubt they bind directly to the coordinator.
All of mine (three) are "2402 50050"
Interesting. If @9shearer's sensors are also of a newer build date than mine, and as Ikea very helpfully put the build date on the outside of their boxes, I'll see if they have any of those (or indeed more recent) on the shelf and, if so, get one.
I honestly think mine would last many weeks, battery wise, but that is nowhere near as long as they should (which is a problem for me because of where my sensors will ultimately end up). And I would like to understand why my system is behaving quite differently to yours - it's bugging me.
What happened?
I've 14 IKEA Parasoll sensors connected to my zigbee network.
The sensors are going offline in Zigbee2MQTT after they should have checked in for the availability check.
My availability settings are set to advanced, 10 min timeout for active devices and 120 mins for passive.
The sensors all have new IKEA LADDA batteries which are the 1.2V type based on other known issues.
The same issue with going offline doesn't appear to happen with ZHA so the issue doesn't appear to be device related
What did you expect to happen?
No response
How to reproduce it (minimal and precise)
No response
Zigbee2MQTT version
1.37.0
Adapter firmware version
20221226
Adapter
SONOFF Zigbee Dongle-P
Setup
Add-on within Home Assistant within Proxmox VM on Intel NUC
Debug log
log1.log
Example:
[2024-05-10 10:32:54] debug: z2m: Passive device 'Back Bedroom Right Window' was last seen '2.00' hours ago.
[2024-05-10 10:32:54] debug: z2m: MQTT publish: topic 'zigbee2mqtt/Back Bedroom Right Window/availability', payload '{"state":"offline"}'
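With 14 sensors it may be easier to pull these events out of the log with a script than by eye. A rough Python sketch follows, assuming the debug line format is exactly as in the example above (other Zigbee2MQTT versions may word it differently); the function name and the 120-minute default simply mirror the passive timeout described in this report.

```python
import re

# The two example lines from the debug log above.
LOG = """\
[2024-05-10 10:32:54] debug: z2m: Passive device 'Back Bedroom Right Window' was last seen '2.00' hours ago.
[2024-05-10 10:32:54] debug: z2m: MQTT publish: topic 'zigbee2mqtt/Back Bedroom Right Window/availability', payload '{"state":"offline"}'
"""

# Pattern inferred from the example line only, not from the Zigbee2MQTT source.
LAST_SEEN = re.compile(
    r"^\[(?P<ts>[^\]]+)\] debug: z2m: Passive device '(?P<name>.+)' "
    r"was last seen '(?P<hours>[\d.]+)' hours ago\."
)


def passive_timeouts(log_text, timeout_minutes=120):
    """Return (timestamp, device, hours) for passive devices past the timeout."""
    hits = []
    for line in log_text.splitlines():
        match = LAST_SEEN.match(line)
        if match and float(match["hours"]) * 60 >= timeout_minutes:
            hits.append((match["ts"], match["name"], float(match["hours"])))
    return hits


for ts, name, hours in passive_timeouts(LOG):
    print(f"{ts}: {name} last seen {hours:.2f} h ago")
```

Running it over the full log1.log should show which sensors hit the timeout and when, which makes it easier to line them up with the rejoin events.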