Open stephenmahood opened 6 months ago
2402 on mine - but as I have a lot of them, and this is the only box left, some might be older.
You can read it on the sensor itself
A mix of 2402 and 2351, among the 4 that I am using for testing (2 in ZHA, 2 in Z2M).
OK - that is very useful. We have a nice mix of build dates spread across people with slight battery problems (me) and people with serious problems (you), so I think we can safely rule out hardware build variants as a possible root cause of the battery drain. And we have already eliminated battery type (1.5V/1.2V) as a possible cause.
More assumptions/musings/recap ....
So .... assuming no hardware issues as per above, the devices that are draining batteries are probably not deep sleeping properly. Something must be keeping them awake (an awful lot in your cases), and hence the battery is draining very quickly (within 2 days or so). The eventual 17-second rejoin loop might just be the sensor falling into a low-voltage brownout loop, i.e. battery is low, reboots, joins, ... battery is low, reboots, joins ... and so on until the battery is completely exhausted and it dies.
So, what stimulus could cause the sensor not to deep sleep properly?
z2m can't poll / ping a sleeping device (as it will be asleep), so for battery devices it just notices a lack of check-ins as a means of detecting Availability, hence it cannot be responsible for keeping the device awake.
My system's sensors rejoin every 10 mins with a Dongle-E (direct connection), but they don't with an indirect connection to the Dongle-E, and they don't with a Dongle-P. This really is not good at all, but it is not frequent enough to drain a battery in 2 days (although it will affect my battery life quite significantly). However, it is a symptom of something not quite right: the sensor must not be completely happy with my network for it to do this (I have never observed this with any other Zigbee device).
But you have a much more severe problem than that, so I am back to the old question: why are your systems behaving so differently to mine, despite using a common adapter and the same z2m version? i.e. what else could be influencing the sensors' behaviour ...
Out of curiosity, could you possibly run
grep -R "z2m: Accepting joining not in blocklist device" *
on your z2m logs?
Don't really need to see the results - just a "none" or "lots every x mins" would be interesting
(@9shearer - as another update to a post 3 days ago about moving networks on one of my sensors, the Parasoll I transplanted to my main Dongle-P network is behaving itself perfectly - no unnecessary rejoining, and open/close events are fine)
Hey @bonzo-dog , I really like your train of thought. First and foremost - no such messages in my logs.
I would disagree though with the rejoin loop being caused by a low battery situation - if caught early enough and re-plugged, the respective sensors still show 40-60% battery. If that were the cause, then they wouldn't rejoin properly on re-plugging the battery.
Trying to consolidate my observations (and possibly others', maybe we can find some common element) into a spreadsheet: https://lite.framacalc.org/0mat14qdxb-a94v . Feel free to add, maybe this helps @Koenkk, @Nerivec and the rest of the z2m team.
From what I could gather, on zstack firmware, these devices can misbehave if paired directly to the coordinator (the suggestion is to pair through a router). Can anyone confirm a behavior change after this (make sure it actually is on a router after connecting it)?
I double checked all three and they are all connected via router, two of them via IKEA LED1924G9 bulb and one of them (which I thought might be bound directly to the router) is connected via Tuya WZ-M100-W presence sensor.
(@9shearer - as another update to a post 3 days ago about moving networks on one of my sensors, the Parasoll I transplanted to my main Dongle-P network is behaving itself perfectly - no unnecessary rejoining, and open/close events are fine)
That's really odd.. on my Dongle-P I have the same misbehavior as everybody here on Dongle-E
Out of curiosity, could you possibly run
grep -R "z2m: Accepting joining not in blocklist device" *
on your z2m logs ?
EDIT:
Oddly, for the sensors in question I have very few log entries about rejoining. Also, the rejoin loop I saw in the UI is not in the logs. I'll keep an eye on whether this changes over the next few days; I have had them powered off since that loop, because of the whole-network drop-off I wrote about earlier.
Here are the logs including the timestamp
grep -R '0x048727fffe97ecf6' * | grep -i 'join'
2024-05-13.23-07-03/log.log:[2024-05-14 22:25:45] info: z2m: Device '0x048727fffe97ecf6' joined
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:11] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:12] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:27] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:27] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:30] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:30] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:30] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:30] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:30] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:30] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:31] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:31] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:31] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:32] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:32] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:49] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:49] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:55] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:56:55] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:07] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:07] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:24] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:24] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:41] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:41] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:42] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:42] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:42] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:42] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:43] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:43] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:45] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:45] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:45] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:45] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:45] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:57:45] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:01] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:01] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:01] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:01] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:03] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:03] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:03] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:03] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:19] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:19] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:37] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:37] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:37] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:37] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:38] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-17.13-56-08/log.log:[2024-07-17 13:58:38] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-27.02-50-49/log.log:[2024-07-29 10:20:52] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
2024-07-27.02-50-49/log.log:[2024-07-29 10:20:52] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'
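The timestamps in the grep output above can be turned into gaps between join bursts, which makes the loop period easier to see. A minimal sketch in JavaScript (Node.js), assuming the log line format shown above. The helper names `parseJoinTimes` and `burstGaps` are made up for illustration and are not part of z2m, and the 5-second threshold for separating bursts is an arbitrary assumption:

```javascript
// Hypothetical helper, not part of zigbee2mqtt: extracts the timestamps (in
// seconds) of "Accepting joining" lines for one device from grep output.
function parseJoinTimes(lines, ieee) {
  const re = /\[(\d{4})-(\d{2})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\] info: z2m: Accepting joining not in blocklist device '([^']+)'/;
  const times = [];
  for (const line of lines) {
    const m = re.exec(line);
    if (m && m[7] === ieee) {
      // Treat the log time as UTC; only the differences matter here.
      times.push(Date.UTC(+m[1], m[2] - 1, +m[3], +m[4], +m[5], +m[6]) / 1000);
    }
  }
  return times;
}

// Gaps (in seconds) between consecutive joins that are at least minGap apart,
// i.e. the pauses between bursts rather than the duplicates inside a burst.
function burstGaps(times, minGap = 5) {
  const gaps = [];
  for (let i = 1; i < times.length; i++) {
    const gap = times[i] - times[i - 1];
    if (gap >= minGap) gaps.push(gap);
  }
  return gaps;
}

// A few lines copied from the output above.
const sample = [
  "2024-07-17.13-56-08/log.log:[2024-07-17 13:56:32] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'",
  "2024-07-17.13-56-08/log.log:[2024-07-17 13:56:49] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'",
  "2024-07-17.13-56-08/log.log:[2024-07-17 13:56:49] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'",
  "2024-07-17.13-56-08/log.log:[2024-07-17 13:56:55] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'",
  "2024-07-17.13-56-08/log.log:[2024-07-17 13:57:07] info: z2m: Accepting joining not in blocklist device '0x048727fffe97ecf6'",
];

console.log(burstGaps(parseJoinTimes(sample, "0x048727fffe97ecf6"))); // [ 17, 6, 12 ]
```

Feeding it the full output instead of the sample should show most gaps clustering in the 15 to 18 second range.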
I would disagree though with the rejoin loop being caused by a low battery situation
The one time I had the rejoin loop was indeed on a completely fresh battery. And I just checked: the 1,2V IKEA batteries show ~1,35V after a full charge and ~0,75V after two days in the sensors, so the drain is not just something the sensor/software reports, it is really there.
Trying to consolidate my observations (and possibly others', maybe we can find some common element) into a spreadsheet: https://lite.framacalc.org/0mat14qdxb-a94v .
Just added my observations, yet again - I seem to be the only one having issues with the Dongle-P
First and foremost - no such messages in my logs.
The sensors are conspiring to confuse me ... & it's working... Thanks for looking.
Have added my observations to the spreadsheet - great idea to consolidate it all in one place.
I would disagree though with the rejoin loop being caused by a low battery situation
A bad guess of mine - @rollofdeath has just disproved that theory as well ..
rollofdeath
That's really odd.. on my Dongle-P I have the same misbehavior as everybody here on Dongle-E
I just don't get this difference ... there has got to be something obvious I am not seeing ....
And I just checked the 1,2V IKEA batteries show ~1,35V after full charge and ~0,75V after two days in the sensors
Just to add insult to injury, my sensors are 19 days old, had freshly charged Laddas installed, and have not been charged since. I just pulled one and measured it: 1.32V. And the quiescent current draw when sleeping is 15µA, which is absolutely great. As such, depending of course on the number of open/close events and taking into account the 25mA draw when the radio is active, these things could/should potentially last a number of years ...
edit: yet more unsubstantiated ramblings .... Of course, if the RF part of the sensor got (almost) stuck on transmit at 25mA, that would give around 30 hrs runtime on 750mAh Laddas.... Add a bit of transmit downtime to that (i.e. the ratio of tx to waiting for a response) and that could easily become 2 days .... Question is ... why would they do that? What could they be sending in a repeat loop without getting a response that they like?
@Koenkk @Nerivec The above is the first rambling theory of mine that might fit some real world observations. Have you seen anything like that before?
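The runtime figures in this post can be reproduced with a simple average-current estimate. A rough sketch in JavaScript, assuming a 750 mAh cell, 25 mA with the radio active and 15 µA asleep (the numbers quoted above). The function name `runtimeHours` and the 60% duty cycle are illustrative assumptions, and self-discharge and cutoff voltage are ignored:

```javascript
// Back-of-the-envelope runtime estimate. Ignores self-discharge, cutoff
// voltage and regulator losses, so treat the results as rough upper bounds.
function runtimeHours(capacitymAh, activemA, sleepmA, dutyCycle) {
  // dutyCycle = fraction of time the radio is active (0..1)
  const averagemA = activemA * dutyCycle + sleepmA * (1 - dutyCycle);
  return capacitymAh / averagemA;
}

const CAPACITY_MAH = 750; // Ladda capacity as quoted in the thread
const ACTIVE_MA = 25;     // radio active
const SLEEP_MA = 0.015;   // 15 µA quiescent

// Radio stuck on transmit all the time: 30 hours.
console.log(runtimeHours(CAPACITY_MAH, ACTIVE_MA, SLEEP_MA, 1));
// Radio active 60% of the time (assumed ratio): roughly 50 hours, about 2 days.
console.log(runtimeHours(CAPACITY_MAH, ACTIVE_MA, SLEEP_MA, 0.6));
// Never waking up at all: the theoretical ceiling, expressed in years.
console.log(runtimeHours(CAPACITY_MAH, ACTIVE_MA, SLEEP_MA, 0) / 24 / 365);
```

The gap between the first two results and the last one is why a radio that stays awake is the prime suspect: almost any realistic sleep/wake pattern should land at months or years, not days.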
Looking at the small data sample that we have in the spreadsheet:
Am I missing or misinterpreting something?
I have added a 'user' column in the sheet so that we can track these easier, hope that's ok for everyone. I guess line 6 is from @RollOfDeath ?
I just don't get this difference ... there has got to be something obvious I am not seeing ....
Seems I am the odd one now with Dongle-P and this behavior
I have added a 'user' column in the sheet so that we can track these easier, hope that's ok for everyone. I guess line 6 is from @RollOfDeath ?
That's me, I added my name as well.
- wondering if the user who created line 6 could also test disabling availability checking for these devices?
I can disable it - can I do it for specific devices only? If so, can anybody point me to where? I would leave one on availability check and disable it for the other two, to see if the one drops off early.
@9shearer
Haven't looked for the OTA requests in the logs ... will do so and report back.
update to that:
The answer is no (sort-of), but I do have one flurry of 10 minute repeats of
z2m: Device 'Sensors/Contact/IKEA/Contact_Sensor_3' requested OTA
that lasted about 2 hours. Trouble is, it's a debug-level report, and I really cannot be sure how much debug-level logging I have had on, and at what times. Will redo that test (at the appropriate log level), let it sit for a day and see what happens.
I too could enable the Availability check on my systems just to be sure it has no effect (can't see how it would, but stranger things have been known). Again, will report back with the outcome of that.
Am I missing or misinterpreting something?
If you are, I can't see it either :)
For reference, since Ikea should be using EmberZNet on their devices, here are a few relevant configuration files that would likely be included:
Of course, Ikea may have altered the values, or their behaviors entirely (possibly differently on different firmware builds too). (They obviously did something to scramble the OTA requesting...)
Seems to be a whole soup of "fixes" for various setups in ZHA: https://community.home-assistant.io/t/ikea-parasoll-vallhorn-devices-not-working-zha/668852
From what I could find, some also have a physical contact issue with the battery (as in, it's easy enough for the battery to lose contact, hence could trigger undesired reboots, assuming the contact is coming-and-going with open/close vibrations).
@9shearer @RollOfDeath
re availability enabling
What specific Availability settings have you used to date? Just the global enablement, or a device specific one? If the latter, can you post your device specific availability setting (want to replicate your settings as much as possible)
@9shearer @RollOfDeath
re availability enabling
What specific Availability settings have you used to date? Just the global enablement, or a device specific one? If the latter, can you post your device specific availability setting (want to replicate your settings as much as possible)
Looks like it's only possible to do it globally from the frontend. To enable / disable: gear symbol - Settings - Availability. In my production setup (which seems to work fine for many other devices, including Ikea ones): Availability (advanced) is selected, with 30 for Active devices, and 1500 for Passive devices.
It is also seemingly possible to control this more granularly (per-device) by editing the config file - enabling it globally, then disabling for specific devices. https://www.zigbee2mqtt.io/guide/configuration/device-availability.html
From what I could find, some also have a physical contact issue with the battery (as in, it's easy enough for the battery to lose contact, hence could trigger undesired reboots, assuming the contact is coming-and-going with open/close vibrations).
I added a small piece of two-sided tape on the interior of the battery compartment door, so that it presses the battery firmly into the location. Apart from movement vibrations, the plastic may also dilate / contract with the temperature, sunlight etc. That was my very first thought about the cause of the issue, wish it were that simple to solve. :)
@9shearer
Haven't looked for the OTA requests in the logs ... will do so and report back.
update to that:
The answer is no (sort-of), but I do have one flurry of 10 minute repeats of
z2m: Device 'Sensors/Contact/IKEA/Contact_Sensor_3' requested OTA
that lasted about 2 hours. Trouble is, it's a debug-level report, and I really cannot be sure how much debug-level logging I have had on, and at what times. Will redo that test (at the appropriate log level), let it sit for a day and see what happens.
That's it. On my (working) sensors, it happens consistently (5 requests every 10 minutes) for each of them (and yes, visible only with debug logging enabled). When it stops happening, it's a sign that the sensor has gone "bad". Interestingly enough, the "no OTA available" responses are quite rare.
Was celebrating a bit too soon. One of my two sensors on the latest-dev instance has started misbehaving (roughly 27 hours since being plugged in). The lights still flash, but the state is not reported into z2m, and the rejoin loop is happening.
Something interesting that I noticed is that I can actually generate a rejoin attempt by opening/closing the sensor. In other words, it doesn't (only) happen automatically every 17 seconds, but also when a sensor event (open/close) happens. Also saw this in the log (it was there before, but I didn't notice it) - does it have any relevance?
zh:ember:ezsp: <=== [CBFRAME: ID=98:"undefined" Seq=41 Len=13]
zh:ember:ezsp: <=x= Ignored unused/unknown [CBFRAME: ID=98:"undefined" Seq=41 Len=13]
Finally, one more theory: the rejoin attempt appears to end at
z2m: Retrieving state of 'problem_sensor_B' after reconnect
Is it possible that z2m tries to read the state, this doesn't happen within the expected time (for whichever reason), which causes the sensor to believe it needs to rejoin again, and thus triggering the next iteration of the loop 15-17 seconds later?
@9shearer @RollOfDeath
I honestly think I might be losing the plot now, because this makes no sense whatsoever.
You know my repeated bleating about "z2m: Accepting joining not in blocklist device"?
(and I have 4 days worth of logs of them ....)
Well, they have stopped.
A couple of possibilities:
So I turned Availability back off, and they didn't return.
I really can't believe it could have that effect (I am doubting what I think I am seeing), but I suppose the good news is that my system is now behaving much more like yours.
I have turned Availability back on, and I am going to leave it well alone for 2 days and see if the batteries go flat....
@9shearer
Also saw this in the log (it was there before, but I didn't notice it) - does it have any relevance?
It's a frame that's been removed in v8. Z2M never made use of it, so it never did anything.
Is it possible that z2m tries to read the state...
No, retrieval of the state (2 seconds after announce) is purely for Z2M, the sensor doesn't care if you do it or not (that is unless the device crashes when it receives the read for some reason...). Also, from what I saw in the various logs, the rejoin attempt is not rejected or anything, so, from Z2M's, but more importantly, the adapter's point of view, the sensor is on the network and OK after the first attempt.
but also when a sensor event (open/close) happens.
Sounds like the device is not keeping track of its parent when it goes to sleep, and is forced to rejoin when it wakes up (and somehow, that gets scrambled into multiple attempts). We'd need a sniff to confirm/know more though.
In my production setup (which seems to work fine for many other devices, including Ikea ones): Availability (advanced) is selected, with 30 for Active devices, and 1500 for Passive devices.
Exact same here. The default (I believe it was 10) for active devices was too noisy for some of my bulbs.
It is also seemingly possible to control this more granularly (per-device) by editing the config file - enabling it globally, then disabling for specific devices. https://www.zigbee2mqtt.io/guide/configuration/device-availability.html
That's cool! I've disabled it for the noisy "rejoin" sensor and for the other one close to the coordinator. The one I left on availability true is 3 m away in a straight line from the noisy one, so these would be the most comparable sensors in terms of mounting position, and both are also bound to the same router.
No, retrieval of the state (2 seconds after announce) is purely for Z2M, the sensor doesn't care if you do it or not (that is unless the device crashes when it receives the read for some reason...). Also, from what I saw in the various logs, the rejoin attempt is not rejected or anything, so, from Z2M's, but more importantly, the adapter's point of view, the sensor is on the network and OK after the first attempt.
Well yes, but this doesn't seem to work. The sensor blinks (so I assume the state - if that means the open/closed state - is detected correctly 'internally'), but this never reaches z2m. Is this some form of Schrödinger's sensor? Meaning it's both on the network (from the coordinator's viewpoint) and not on it (from its own perspective, hence the rejoins)?
but also when a sensor event (open/close) happens.
Sounds like the device is not keeping track of its parent when it goes to sleep, and is forced to rejoin when it wakes up (and somehow, that gets scrambled into multiple attempts). We'd need a sniff to confirm/know more though.
Question: in a message like the one below, does "parent = 0" mean the coordinator is the parent (which would be accurate), or does it mean there's no parent?
zh:ember:ezsp: ezspTrustCenterJoinHandler(): callback called with: [newNodeId=xxx], [newNodeEui64=yyyy], [status=STANDARD_SECURITY_SECURED_REJOIN], [policyDecision=NO_ACTION], [parentOfNewNodeId=0]
Update on the symptoms I noticed (I think I listed it a few tens of posts up, here's for the refresh):
Node ID zero (0x0000) is always the coordinator indeed, per ZigBee spec.
An invalid/absent node ID should show up as 65535 (0xffff). That happens with that handler when the status is DEVICE_LEFT, for example.
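The two special values mentioned above can be captured in a small helper when reading `parentOfNewNodeId` out of the logs. A minimal sketch in JavaScript. The function name `describeParent` is hypothetical and not part of zigbee-herdsman, and labelling every other value as a router assumes that only routers and the coordinator can act as parents:

```javascript
// Hypothetical helper for interpreting parentOfNewNodeId in the
// ezspTrustCenterJoinHandler log lines; not part of zigbee-herdsman.
const COORDINATOR_NODE_ID = 0x0000; // node ID zero is always the coordinator
const NULL_NODE_ID = 0xffff;        // invalid/absent node ID (65535)

function describeParent(parentNodeId) {
  if (parentNodeId === COORDINATOR_NODE_ID) return "coordinator";
  if (parentNodeId === NULL_NODE_ID) return "none (invalid/absent)";
  // Any other value is assumed to be the short address of a router.
  return "router 0x" + parentNodeId.toString(16).padStart(4, "0");
}

console.log(describeParent(0));      // coordinator
console.log(describeParent(65535));  // none (invalid/absent)
console.log(describeParent(0x1a2b)); // router 0x1a2b
```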
@Nerivec suggested to test with a minimal converter, could you guys try with:
Could you check if the issue is fixed with the following external converter:
- save it next to configuration.yaml as ext_converter.js
- add it to configuration.yaml:
    external_converters:
      - ext_converter.js
- start Z2M and check that the device description now ends with CUSTOM (this indicates the external converter has been loaded correctly)
@Nerivec @Koenkk
Thanks for the minimal converter - I have installed it and it is under test here. It is interesting that you have dropped the clusters down to the bare minimum. As mine were again behaving a bit strangely last night, I was looking at database.db, and there were some odd values in there. A single example: for my 3 sensors, I had "checkinInterval" values of 3300, 3300 and 43200. I was wondering where those specific values came from, and why there was a difference between 3 identical devices. They seemed unrelated to any config values on the system. As such, I was musing to myself whether something was being pushed to the devices that was corrupting their config and hence causing erratic behaviour (which could well explain why my observations differ from those of @9shearer and @RollOfDeath). Anyway, the database.db values now make sense after using the minimal converter and a reboot/re-pair, and the sensors are back to operating normally - will see how long that lasts. Will report back any strange behaviour.
Done as requested @Koenkk , the sensors now show with a description of "... CUSTOM" in z2m.
First observation - this (kind of) fixes the 5x OTA request by disabling OTA (I think - seeing "supports_ota":false on initialization). Not a big loss, and less clutter in the logs. I now only see one entry of 'device ... requested OTA' followed shortly by a 'responded to OTA request... NO_IMAGE_AVAILABLE'.
One observation for my fellow testers ...... My definition of a test is starting with a known state of the devices. 4 quick presses of the Parasoll button will force a rejoin (good). It does NOT, however, erase the previous clusters/attributes (will expand on this if anyone is interested). So .... if anyone knows how to properly reset a Parasoll, please let me know. :) (The 10 sec long press as per the manual does nothing for me.) Otherwise, changing the pan_id/ext_pan_id does work to force a clean start, but you don't want to be doing that on a "proper" system (unless you are happy rejoining all of your devices). My Parasolls were not "clean" - they are now, so I have started the test again ...
Just to expand on the above post: the minimal_converter above will reduce the attributes reported, apparently as long as the Parasoll has been completely cleansed of the previous "settings" (what I would call a "clean" Parasoll). Specifically, the "battery" percentage attribute should no longer be reported.
Now, my 3 Parasolls that remained on my test network, despite being "reset" with the 4 button presses, and obviously using the minimal converter, continued to report the "battery" percentage attribute , namely
MQTT publish: topic 'zigbee2mqtt/Sensors/Contact/IKEA/Contact_Sensor_2', payload '{"ac_status":false,"battery":91,"battery_defect":false,"battery_low":false,"contact":false,"linkquality":224,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false}'
This was not obeying the minimal_converter, and hence imho the Parasolls were not completely reset of the previous config.
However, my 4th Parasoll, that was temporarily on another network, when rejoined to my test network (ie pan_id change), immediately did as it should, namely:
MQTT publish: topic 'zigbee2mqtt/Sensors/Contact/IKEA/Contact_Sensor_3', payload '{"ac_status":false,"battery_defect":false,"battery_low":false,"contact":true,"linkquality":176,"restore_reports":false,"supervision_reports":false,"tamper":false,"test":false,"trouble":false}'
Note the correct lack of the "battery" percentage attribute, which the minimal_converter has removed
So, methinks my Parasolls, if remaining on the same network, do not reset to an "out of the box" clean start when rejoining it, even after a 4 button press "reset", and hence who knows what might still be hanging around from any previous misbehaviour that could potentially affect a test...
@bonzo-dog I'm wondering if that's not a leftover value somewhere in z2m, i.e. whatever gets pushed to the MQTT is a 'union' of the two sets of values (what is "leftover" in z2m and/or - if newer - what gets reported from the sensor). My two sensors who are now undergoing testing report respectively 89% and 95% - with freshly-charged batteries inserted at the beginning of the test, ~36 hours ago. I believe these were the last values reported (indeed, by the same physical sensors, but with different batteries) before I applied the custom config.
That aside, both sensors are still working and reporting state changes properly, ~36 hours into the test. Not calling it a win yet, would need another 2-3 days to note a tangible difference from the previous behaviour.
Re the battery percentage reporting: I really thought I had manually cleansed all traces of the sensors from my system before making that assumption (i.e. pointing my finger solely at the Parasoll), but I admit I cannot be 100% sure of that, so you might well be right there. My Parasolls too are quite happy at the moment - none of my (albeit minor) problems have reappeared, so I will leave them on soak for a few more days ....
More or less on the 48h mark - both sensors are still working. We might be on to something.
You can probably try to re-enable each line (remove the // in the external converter), one at a time, to figure out which one is the culprit.
I'd start with the last, since OTA appears messed up on that device...
You can also add a number or something after CUSTOM in the description, to be sure the modified external converter was properly detected after restarting z2m and factory resetting/re-pairing the device.
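The one-at-a-time procedure described above can be written down as a small test plan. This is only a sketch in JavaScript: `planExperiments` is a hypothetical helper and not part of z2m, the list of optional lines is taken from the converter posted later in this thread, and the numbering after CUSTOM is just one possible convention:

```javascript
// Optional extend lines from the converter in this thread, in file order.
const optionalLines = [
  "addCustomClusterManuSpecificIkeaUnknown()",
  "bindCluster({cluster: 'genPollCtrl', clusterType: 'input'})",
  "identify({isSleepy: true})",
  "battery()",
  "ikeaOta()",
];

// Hypothetical planner: one experiment per optional line, last line first,
// each with a numbered description so the loaded variant is easy to spot.
function planExperiments(lines) {
  return [...lines].reverse().map((line, step) => ({
    description: `PARASOLL door/window sensor CUSTOM ${step + 1}`,
    enabled: line,
    // Text of the optional extend entries with exactly one line uncommented.
    extend: lines.map((l) => (l === line ? `${l},` : `// ${l},`)),
  }));
}

for (const exp of planExperiments(optionalLines)) {
  console.log(`${exp.description}: enable ${exp.enabled}`);
}
```

Each experiment still needs a z2m restart and a factory reset/re-pair of the sensor before its result means anything.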
60h into it - still no issues. I'll enable this in the production instance (1.39.0 stable rather than latest-dev) with a few more sensors.
When removing the comments (//) one by one as suggested by @Nerivec, make sure to factory reset/re-pair the device after that.
Did it for 4 more sensors - re-pairing went (eventually) OK, although 3 out of 4 required several attempts (also happened before).
I tried uncommenting the battery line (as that's a useful thing to have), but seeing an error in z2m logs -
error: z2m: Failed to load external converter file 'ext_converter.js' (battery is not defined)
For now, it's still running with the default (linked above) ext_converter.js, and with 1.39.0 stable.
Replace the first line:
const { iasZoneAlarm } = require("zigbee-herdsman-converters/lib/modernExtend");
with these two:
const { ikeaOta, addCustomClusterManuSpecificIkeaUnknown } = require ("zigbee-herdsman-converters/lib/ikea");
const { iasZoneAlarm, battery, identify, bindCluster } = require("zigbee-herdsman-converters/lib/modernExtend");
Then you should be able to do any combination.
Hey guys, I haven't added the external converter yet, but I also want to update you on my sensors. After disabling the availability check, the drain doesn't seem to occur (all sensors are stable at 92% right now). Also, all sensors seem to work fine.
The one where I haven't disabled the availability check had one bad reading after ~5h (it showed open when the door was closed), but other than that, the two with availability disabled and the one with it enabled are working without issues so far.
72+ hours since setting up the external converter - both sensors are working fine. :) I'm cautiously optimistic that this converter "isolated out" the issue.
Updated the external converter as suggested by @Nerivec above and enabled the battery reporting. Re-paired problem_sensor_B, the other one re-configured itself automatically on open/close. Both report the battery now as 92% (which I think is realistic, as they usually report 93-95% when a fresh battery is plugged in). If 1% battery lasts through 3 days, then I am super happy. I see from the spreadsheet that @bonzo-dog has just done exactly the same, so we now have two environments to test (one with 1.39.0 stable, the other with 1.39.0 dev).
In the meantime, I have applied this converter to my 'main' network and updated z2m there to 1.39.1 stable. In case anyone needs the current form (including battery % reporting), here it is - based on @Nerivec's guidance:
const { ikeaOta, addCustomClusterManuSpecificIkeaUnknown } = require("zigbee-herdsman-converters/lib/ikea");
const { iasZoneAlarm, battery, identify, bindCluster } = require("zigbee-herdsman-converters/lib/modernExtend");

const definition = {
    zigbeeModel: ["PARASOLL Door/Window Sensor"],
    model: "E2013",
    vendor: "IKEA",
    description: "PARASOLL door/window sensor CUSTOM",
    extend: [
        // addCustomClusterManuSpecificIkeaUnknown(),
        // bindCluster({cluster: 'genPollCtrl', clusterType: 'input'}),
        iasZoneAlarm({ zoneType: "contact", zoneAttributes: ["alarm_1"] }),
        // identify({isSleepy: true}),
        battery(),
        // ikeaOta(),
    ],
};

module.exports = definition;
Things appear to be generally fine, although I did see:
I'll give this a full week before confirming everything is working fine on the general setup. @Nerivec / @Koenkk as I understand it, the converter disables some of the clusters (SpecificIkea..., genPollCtrl, identify, OTA). Do you have a closer idea as to what is causing the underlying misbehaviour?
I'd guess the most likely ones to cause battery drain would be Ota and maybe genPollCtrl... addCustomClusterManuSpecificIkeaUnknown shouldn't do anything aside from adding a cluster definition. There is also no harm in not having it; it will just show 1 numeric cluster in the frontend's development tab.
As mentioned, my guess is ikeaOta(), since this device appears to have a broken/non-standard handling of OTA requests (shouldn't be spamming like it does after it receives a "no image available"). One of you will have to add it back, to see if the device starts misbehaving again, then we'll know for sure.
My bets are also on bindCluster({cluster: 'genPollCtrl', clusterType: 'input'}) causing the issue, but let's see. Good to see progress here! 😄
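The "add it back and see" test works best when exactly one suspect is re-enabled per run. Here is a minimal sketch of such a plan in plain JavaScript, with no z2m dependency: it only prints the `extend` lines to paste into the external converter. `SUSPECTS`, `BASELINE`, `extendLines` and `testPlan` are hypothetical helper names, not part of zigbee-herdsman-converters; the extend calls themselves are copied from the converter posted earlier in this thread.

```javascript
// Extends that the minimal converter comments out (copied from the thread).
const SUSPECTS = [
    "addCustomClusterManuSpecificIkeaUnknown()",
    "bindCluster({cluster: 'genPollCtrl', clusterType: 'input'})",
    "identify({isSleepy: true})",
    "ikeaOta()",
];

// Extends that stay enabled in every run.
const BASELINE = [
    'iasZoneAlarm({zoneType: "contact", zoneAttributes: ["alarm_1"]})',
    "battery()",
];

// Build the body of the `extend: [...]` array with one suspect active.
function extendLines(enabled) {
    const suspects = SUSPECTS.map((s) => (s === enabled ? `    ${s},` : `    // ${s},`));
    const baseline = BASELINE.map((s) => `    ${s},`);
    return [...suspects, ...baseline];
}

// One run per suspect, so a relapse points at a single extend.
function testPlan() {
    return SUSPECTS.map((s) => ({ reEnabled: s, extend: extendLines(s) }));
}

for (const run of testPlan()) {
    console.log(`Re-enable ${run.reEnabled}:`);
    console.log(run.extend.join("\n"));
}
```

Each run needs its own observation window (the thread suggests problems show up within 1-3 days), and whether a re-pair is needed between runs is a separate question this sketch does not answer.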
Just an update on my test system. Parasolls seem quite happy now. No obvious oddities in the logs. No abnormal battery drain. They do infrequently send 2 concurrent messages for an open/close event, but it is not every time, so it is probably just what they do, rather than a symptom of something wrong. (previously they were doing this every time, so they are much better behaved now) They "check in" about every 18 hours (with a repeat of the last status), which is also fine. I will have to move my system soon (was bought for usage elsewhere), however I will pick up another Parasoll in the next few days and experiment with it here to see if I can stimulate their bad behaviour again...
I know we are talking about “PARASOLL” specifically, but my BADRING Water Leakage Sensor and VALLHORN Wireless Motion Sensor are all behaving the same way as described here.
All my Ikea devices have the latest firmware. Other Ikea devices that are mains or USB powered all work fine. (I have a total of 21.) Neither battery charge nor battery type has any effect, so I am very confident that neither is the core issue. Even after a device reported 100% battery (I installed a new one) and was re-paired to the coordinator, it still fell off within 24 hours.
Everything was working for over 37 days till I upgraded z2m to 1.39.1-1 last week.
I do not want to use a “converter”; I would like to stay with the “out of the box settings” for now. I most certainly do not want to turn off the availability check. At this stage I am in no hurry; until now I have just been testing devices and functionality. I have about 50 Zigbee devices on two floors on 2 XZG PoE Coordinators (Latest 20240707 Firmware and Zigbee 20240710).
Is there anything I can do to help while staying with the out of the box experience? Provide some logs perhaps? Or is the root cause well understood now?
As I said, I am new to all this, and not to talk smack: I am liking Zigbee, and there is no way I would be into Zigbee without z2m. But I am really surprised that an IEEE-standardized technology that is 20 years old would have so many issues. It’s like Ethernet before 3Com and Cisco...
I think we might have celebrated a bit too early. I have plugged all my Parasolls back to the main network a few days back. The situation has definitely improved with the custom converter, but not fully resolved. In these 4-5 days I found several (<10) situations where the sensor would either be fully unresponsive (not blinking on open/close) or not transmitting the state to z2m (blinking on open/close, but not sending the state changes into z2m). In both cases, removing/reinserting the battery solved the issue. To my recollection, this hasn't repeated for the same sensor, so maybe it's just one of those "reboot once" situations. Considering the number of sensors I have, and the prior behaviour, this is definitely an improvement (all sensors used to misbehave after 1-3 days), but it's still not 100% reliable.
Is there a way to extract the "connection history" of a particular EndDevice (i.e. what router was it connected to, since when and until when)? I am starting to suspect something else might be at play here (a bad/unreliable router or similar).
@ortofan to my very little knowledge, while Zigbee is standardised in IEEE 802.15.4, the standard offers considerable latitude to implementers and there is no "Zigbee certification" per se. This keeps the price down and availability high, but at the cost of these occasional tweaks / hacks needed to make everything work together (especially across different major manufacturers). I think zigbee2mqtt is doing a fantastic job as an "abstraction layer" which alleviates / eliminates most of these "implementation choices" made by manufacturers. While not perfect, it is the best tool we have in this space and it is also free. I have some BADRING sensors as well, but those seemed to stay reliably on the network for months. Will test to see if they actually do. :)
@9shearer
As with most of my posts, this isn't going to be terribly helpful to you, but... I transplanted my test system (NetB) to its off-site place 3 days ago, and my Parasolls there have been behaving themselves perfectly. However, I did notice one quite strange thing that is similar to your problem, so thought it was worth mentioning. I have a spare Parasoll now that I keep here for experimentation. It was joined to my primary home network (NetA), i.e. not NetB. As I wanted to do some coverage tests (my off-site system has some quite long distances to cover), I picked up the spare Parasoll and took it with me, so that I could walk around the site with it in hand, seeing if the open/close worked and checking the LQI.
So, I arrived on site, installed NetB, and NetB and "its" Parasolls just worked (no reason why they shouldn't, of course).
Now, I obviously needed to reset the spare Parasoll to get it on NetB for testing, but... What was quite odd was the fact that the spare Parasoll (which was transported with its battery in) was completely unresponsive. No light when opening/closing, and it wouldn't respond to Reset button presses. I really thought the battery was dead, but I measured the voltage and it was fine. 20 mins later, after multiple battery removals and Reset presses, I managed to get it back to life, and it was fine after that.
So I have now seen one completely unresponsive Parasoll as well, seemingly as a result of being left powered and taken away from its joined network for a few hours (yes, another assumption of mine). So maybe, under certain circumstances, they can lock up with network connectivity issues and not recover gracefully...
Recently, I bought a new Parasoll, and after adding it to the network it did not behave the same as the one I already had. To be more precise, it did not work at all: the device was not reporting anything. After some playing around I found that this is similar to issue #22184 (no water leak reporting from Badring). If I understand the changes in https://github.com/Koenkk/zigbee-herdsman-converters/commit/c5b17c4bdd62e40c5442cb2c05db26c495f551f8#diff-dc7af00758eabcf6565a2ba07001f55682678d3cd88524080d5cbf3ccb954976L1320 correctly, this could be a regression from https://github.com/Koenkk/zigbee-herdsman-converters/pull/7220, as the binding of the ssIasZone cluster is suddenly missing.
I used the following external converter, which made my device work out of the box right after pairing:
const {deviceEndpoints, battery, identify, iasZoneAlarm, bindCluster} = require('zigbee-herdsman-converters/lib/modernExtend');
const {addCustomClusterManuSpecificIkeaUnknown, ikeaOta} = require('zigbee-herdsman-converters/lib/ikea');

const definition = {
    zigbeeModel: ['PARASOLL Door/Window Sensor'],
    model: 'E2013',
    vendor: 'IKEA of Sweden',
    description: 'PARASOLL door/window sensor',
    extend: [
        addCustomClusterManuSpecificIkeaUnknown(),
        deviceEndpoints({"endpoints":{"1":1,"2":2}}),
        bindCluster({cluster: 'ssIasZone', clusterType: 'input', endpointNames: ["2"]}),
        iasZoneAlarm({zoneType: 'contact', zoneAttributes: ['alarm_1']}),
        identify({isSleepy: true}),
        battery(),
        ikeaOta(),
    ],
};

module.exports = definition;
I do not have deep understanding, but https://github.com/Koenkk/zigbee-herdsman-converters/pull/7866 could probably help here.
@baierjan, does applying the new converter (instead of the basic one listed some posts up) require again resetting / re-pairing all sensors, or would it work out of the box (after a z2m restart, of course)?
I am not an expert here, but since the converter adds a custom binding, which is afaik only done during the initial configuration after pairing, I would guess you still need to re-pair the device if you want the binding to be done automatically. I do not think you are forced to re-pair, though. Maybe you can get away with the reconfigure option in z2m, which can be initiated from the controller?
I'll give this a try and see how it goes. In any case, I think reconfiguring from the frontend still requires taking the sensor out of its socket and pressing the reset button once to "wake up", so not a huge difference. I'll try re-pairing everything.
Since I applied the basic converter mentioned above, I had a few episodes of sensors silently misbehaving (they still blink, but the state changes aren't reflected in z2m) and eventually dropping off the network altogether. Far fewer than before (say, ~10 in two weeks vs. ~10 in two days with the default configuration), but still not stable enough.
During my tests, I was able to wake the device by opening/closing the door with the sensor; still a manual action but a bit quicker than removing it from the socket.
I think something is still broken. Since applying the new converter, restarting z2m and re-pairing all my sensors (Sunday afternoon), I already had two sensors going missing within ~24h (Monday afternoon/evening). One was completely dead, the other one blinking, but not sending state to z2m. In both cases, removing and reinserting the battery solved the issue.
Well, finally, one of my sensors has the exact same symptoms as yours, @9shearer. The system I deployed elsewhere has been behaving itself pretty well so far (14 days with 4 Parasolls on a Dongle-E with the minimal converter of a couple of weeks ago). But... today one sensor exhibited the "...one blinking, but not sending state to z2m..." problem. Battery removal and re-insert resumed normal operation.
I am starting to suspect I (and potentially others) might be facing a compound problem:
Problem 1: Dodgy repeaters in the network, possibly causing (at least for a while) the "blinking, but not sending state to z2m" issue.
Question 1: @bonzo-dog, are you using any repeaters in your network mentioned just above (reading the post I'd reckon not)?
Question 2: is it possible to have an "orphan repeater" situation in a Zigbee network? More specifically, a device listed as a repeater/router, which somehow gets disconnected from the coordinator, but still has end devices pairing to it. What happens with the end devices in this case?
Problem 2: some issue with these sensors. I am starting to suspect the OTA feature even more.
My config at the moment: @baierjan's converter but with the Ota cluster commented out, and I have removed a set of smartplugs which were meant to function as repeaters. I realize this is probably not the best troubleshooting method ("change one thing, then test, then change another thing - never change multiple things at once"), but my patience with these devices is pretty worn out at this point. So far (~36 hours), everything seems to work fine.
What happened?
I've 14 IKEA Parasoll sensors connected to my zigbee network.
The sensors are going offline in Zigbee2MQTT after they should have checked in for the availability check.
My availability settings are set to advanced, 10 min timeout for active devices and 120 mins for passive.
The sensors all have new IKEA LADDA batteries which are the 1.2V type based on other known issues.
The same issue with going offline doesn't appear to happen with ZHA so the issue doesn't appear to be device related
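To illustrate how a "last seen" timeout interacts with a sleepy sensor, here is a minimal sketch. It assumes the passive check is a plain comparison of the time since the last message against the configured timeout; `isPassiveOffline` is a hypothetical helper, not Zigbee2MQTT's actual code, and the 25-hour value is only an example of a timeout longer than the sensor's check-in interval.

```javascript
const MINUTE_MS = 60 * 1000;

// Assumed rule: a passive device is offline once its silence exceeds the timeout.
function isPassiveOffline(lastSeenMs, nowMs, timeoutMinutes) {
    return nowMs - lastSeenMs > timeoutMinutes * MINUTE_MS;
}

const lastSeen = Date.parse("2024-05-10T08:32:54Z");
const now = Date.parse("2024-05-10T10:32:55Z"); // just over 2 hours of silence

console.log(isPassiveOffline(lastSeen, now, 120));     // true: 120 min timeout exceeded
console.log(isPassiveOffline(lastSeen, now, 25 * 60)); // false: 25 h timeout not reached
```

Under that assumption, a window sensor that is not opened for two hours and otherwise only checks in every several hours would be marked offline with a 120-minute passive timeout, even if it is still joined and healthy.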
What did you expect to happen?
No response
How to reproduce it (minimal and precise)
No response
Zigbee2MQTT version
1.37.0
Adapter firmware version
20221226
Adapter
SONOFF Zigbee Dongle-P
Setup
Add-on within Home Assistant within Proxmox VM on Intel NUC
Debug log
log1.log
Example:
[2024-05-10 10:32:54] debug: z2m: Passive device 'Back Bedroom Right Window' was last seen '2.00' hours ago.
[2024-05-10 10:32:54] debug: z2m: MQTT publish: topic 'zigbee2mqtt/Back Bedroom Right Window/availability', payload '{"state":"offline"}'
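With 14 sensors, pulling the availability changes out of a debug log by hand gets tedious. Below is a minimal sketch that extracts them, assuming the log lines look exactly like the example above and that the base topic is `zigbee2mqtt`. `AVAILABILITY_RE` and `availabilityEvents` are hypothetical names, and other z2m versions may format these lines differently.

```javascript
// Matches availability publishes in the debug-log format shown above.
const AVAILABILITY_RE =
    /\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] debug: z2m: MQTT publish: topic 'zigbee2mqtt\/(.+?)\/availability', payload '(\{.*?\})'/g;

// Return one {time, device, state} record per availability publish found.
function availabilityEvents(logText) {
    const events = [];
    for (const m of logText.matchAll(AVAILABILITY_RE)) {
        events.push({ time: m[1], device: m[2], state: JSON.parse(m[3]).state });
    }
    return events;
}

const sample =
    "[2024-05-10 10:32:54] debug: z2m: Passive device 'Back Bedroom Right Window' was last seen '2.00' hours ago.\n" +
    "[2024-05-10 10:32:54] debug: z2m: MQTT publish: topic 'zigbee2mqtt/Back Bedroom Right Window/availability', payload '{\"state\":\"offline\"}'";

console.log(availabilityEvents(sample));
```

Filtering the result on `state === "offline"` and grouping by `device` would show which sensors drop most often and at what times.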