Koenkk / zigbee-herdsman

A Node.js Zigbee library
MIT License
454 stars 278 forks source link

Emit nwk event on concentratorIndCb #992

Closed deviantintegral closed 1 month ago

deviantintegral commented 1 month ago

I ran into another situation where zigbee2mqtt had a stale network address, similar to https://github.com/Koenkk/zigbee2mqtt/discussions/15874, but with https://www.zigbee2mqtt.io/devices/43076.html#enbrighten-43076 instead of the Sonoff ZBMINI.

I think this was triggered when I had circuits off as I was installing a new relay. I flipped the breakers several times in the process, so many devices were restarting over a 1 hour period.

I turned on debug logging and sniffed traffic, and saw the switch sending updates that weren't making it up the stack to zigbee2mqtt. It would respond to group broadcasts.

https://github.com/Koenkk/zigbee-herdsman/blob/0a0621159d366d81d3ef8ba5da548104ad4dd3fc/src/controller/controller.ts#L632-L638 is where the messages were in herdsman, but at that point none of the data in the method had the IEEE address to check against.

As well, I could see the device sending many route request and link status packets which seemed much higher than normal. While I could see what looked to be On / Off events triggered at the switch in Wireshark, for some reason it wasn't able to decrypt them normally.

SCR-20240325-tndz

In Herdsman, I saw that there were many messages being sent with the command concentratorIndCb. That data contained both the IEEE and the short address, so it seemed like a good place to check for network address changes.

With this PR, the switch came back online shortly after starting zigbee2mqtt. The "Many-to-one" route requests stopped being spammed.

What I'm not sure is:

  1. I haven't found yet a clear reference on what concentratorIndCb is. So, I'm not sure if this is the right place to improve or fix this.
  2. If this is, we could perhaps collapse this else statement with the one above.
deviantintegral commented 1 month ago

Tests are failing due to decreasing coverage - no new failures introduced. I can update tests once I have confirmation the change itself is OK.

Koenkk commented 1 month ago

Nice finding! Once tests are updated this can be merged.

deviantintegral commented 1 month ago

I added some comments explaining why we're mixing and matching commands and events, and added a test. My original question about collapsing the if doesn't matter because the packet fields are different for the two commands.

Aside, this was the first time I'd encountered "concentrator" and was having trouble finding good upstream docs for it. I think this chat does a reasonably OK job of explaining it. I totally thought it was hallucinating the callback and that it was mixing up JavaScript, but that does appear to be the case. That gave me enough breadcrumbs to find some zstack code that referenced it.

https://chat.openai.com/share/2ac8ee11-31f4-48bc-9f6d-5fb3f3b3c874

Nerivec commented 1 month ago

@Koenkk Looking at this and the mentioned discussion, I'm wondering if this is the famous address conflict that you guys are seeing like this with zStack, except it has a more "generic" callback than ember and just reports as "a new route from concentrator"? According to the spec, a network address change is not really supposed to happen otherwise.

Network short addresses SHALL be chosen at random. The random address assigned SHALL conform to the NIST testing regimen described in reference [B10]. When a device joins the network using MAC association, its parent SHALL choose a random address that does not already appear in any entry in the parent’s NIB. Under stochastic addressing, once a device has been assigned an address, it has no reason to relinquish that address and SHOULD retain it unless it receives an indication that its address is in conflict with that of another device on the network. Furthermore, devices MAY self-assign random addresses under stochastic addressing and retain them, as in the case of joining a network using the rejoin command frame (see section 3.6.1.6.1.2). The Zigbee coordinator, which has no parent, SHALL always have the address 0x0000.

Note: That kind of problem should be entirely solvable when we implement the full ZDO spec, since we can then trigger a proper network address request by IEEE when a device gets in that state to figure out if it's really gone, or if it just changed network address and "we missed it".

Koenkk commented 1 month ago

when a device gets in that state to figure out if it's really gone, or if it just changed network address and "we missed it".

This is what zstack already does: https://github.com/Koenkk/zigbee-herdsman/blob/aac0732de2d3f0f88df5526adc54b54c607d4917/src/adapter/z-stack/adapter/zStackAdapter.ts#L473

Koenkk commented 1 month ago

Thanks!