Closed deviantintegral closed 2 months ago
This would fix #20045 for z-stack adapters (but the issue is present in ember too so if someone wants to have a look at it, I can test it.)
I fixed that locally, and there's other differences in the fixtures I haven't investigated yet to see if they make sense or are actual regressions.
CC @Nerivec for the Ember part
Seems like something that should be moved higher up the chain, not duplicated in each adapter (it's a regular ZDO req)?
Seems like something that should be moved higher up the chain, not duplicated in each adapter (it's a regular ZDO req)?
So you're thinking of something like doing this at https://github.com/Koenkk/zigbee-herdsman/blob/1ff1dc77bd5637c05f2d8865cbbb850371c3a645/src/controller/model/endpoint.ts#L292 and perhaps exposing some new public methods on the individual adapter classes?
We're going to move the ZDO requests out of adapters, so the nwk address request would become a function of Device
and then somewhere in the logic, we'd call it to fix the scenario behind this issue.
Seems like something that would be served better as a reaction to an event, instead of forcing it during a failed request. Maybe when a message from an unknown device is received? Do you still have a log from the ZBMini when Z2M doesn't recognize it anymore? To see what it still tries to communicate to Z2M. Can the behavior be replicated identically (with any device) by manually changing a nwk address in the database, then starting Z2M, so the actual address doesn't match what Z2M knows?
@Nerivec if it's useful, with Vimar devices I can pretty much replicate the issue every time I power them off.
@saveriol interesting, could you provide the debug log?
See this on how to enable debug logging.
@Koenkk yes of course! Here's what I done:
2024-09-15 09:47:06 enabled debug log (log attached here: "log of poweroff") 2024-09-15 09:48:20 powered off all 14592.0 devices (they have "Luce" as prefix in the description) 2024-09-15 09:50:00 powered on all 14592.0 devices 2024-09-15 10:00:00 these devices are marked as offline in Z2M devices tab:
Luce bagno (0x9E45)
Luce specchio bagno (0x81F6)
Luce corridoio (0x33AA)
Luce terrazzo studio (0xA250)
Luce specchio bagno piccolo (0xFEC5)
Luce camera (0x2CAB)
Luce bagno piccolo (0x0E63)
Luce terrazzo sala (0x06F8)
Luce sala TV (0xE23B)
Luce studio (0x553E)
some other 14592.0 are back online after power on:
Luce terrazzo camera (0xB519)
Luce cucina (0x872F)
Luce sala divano (0xC3CE)
2024-09-15 10:04:00 I start the "recovery" procedure of going around the home and toggling devices and looking in the logs for "Data is from unknown device with address" so I can replace the IDs in the database
cp database.db database.db.backup.20240915
#stop Z2M at 10.15.00
#Luce bagno
sed --in-place 's/\"nwkAddr\":40517,/\"nwkAddr\":16349,/' database.db
#Luce specchio bagno
sed --in-place 's/\"nwkAddr\":33270,/\"nwkAddr\":17548,/' database.db
#Luce corridoio
sed --in-place 's/\"nwkAddr\":13226,/\"nwkAddr\":25327,/' database.db
#Luce terrazzo studio
sed --in-place 's/\"nwkAddr\":41552,/\"nwkAddr\":11746,/' database.db
#Luce specchio bagno piccolo
sed --in-place 's/\"nwkAddr\":65221,/\"nwkAddr\":26338,/' database.db
#Luce camera
sed --in-place 's/\"nwkAddr\":11435,/\"nwkAddr\":46119,/' database.db
#Luce bagno piccolo
sed --in-place 's/\"nwkAddr\":3683,/\"nwkAddr\":63743,/' database.db
#Luce terrazzo sala
sed --in-place 's/\"nwkAddr\":1784,/\"nwkAddr\":2131,/' database.db
#Luce sala TV
sed --in-place 's/\"nwkAddr\":57915,/\"nwkAddr\":35491,/' database.db
#Luce studio
sed --in-place 's/\"nwkAddr\":21822,/\"nwkAddr\":41487,/' database.db
#start Z2M at 10.15.35 (log attached here: "log after fix")
2024-09-15 10:16:46 all devices are back online. Downloading both logs log after fix.log log of poweroff.log
I think we should do a ZDO network address request by ieeeAddr here: https://github.com/Koenkk/zigbee-herdsman/blob/master/src/controller/controller.ts#L722, need to wait a bit for the ZDO refactor is done, as this request is currently not supported yet (https://github.com/Koenkk/zigbee-herdsman/pull/1187)
@Koenkk That's what I was thinking too, but there is the issue of close-by networks using default network config that trigger this codepath like crazy. Obviously, it's a "bad setup" example, but adding the ZDO request on top would add to the useless spamming. Been a bit busy to look into it in more details though. Like you said, the ZDO refactor should make this easier wherever it ends up being triggered, and will fix it for all adapters at once.
@Nerivec I was thinking to do it only once for every unknown networkAddress
It's not unknown as far as Z2M is concerned though, it's just no longer matching that of the network (no way to know unless we fail a request, then do the ZDO to see if the address changed). Also you need the EUI64 to do the ZDO (which is not available from everywhere in the code). It could be done more efficiently from availability (on offline, send ZDO to confirm if really offline), but that would require it to be enabled for devices with that issue... I'll run some tests once the ZDO refactor is over, see if I can force Z2M to falter like with that ZBMINI.
Excellent, thanks for integrating this into the other PR. I've switched over to the dev branch and will report if I see anything odd from the refactor.
This follows up from discussion at https://github.com/Koenkk/zigbee2mqtt/discussions/15874#discussioncomment-10542315.
With this PR, when my zbmini changes network addresses herdsman can fix it up. This will work well with availability checks. It only works in response to a command from zigbee2mqtt, and not if you trigger a message by flipping the switch itself. If anyone has ideas on making that work too, I'd love to hear them!
This fails two tests, so it's possible this actually introduces a bug. I'm going to run with this on my local network for a few days and see if anything odd happens.