dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License
1.88k stars 483 forks source link

deCONZ will not send commands to end device connected to RaspBee #4551

Open ebaauw opened 3 years ago

ebaauw commented 3 years ago

Describe the bug

I have a Eurotronic Spirit TRV connected directly to the RaspBee. The light on the node blinks green to indicate it's polling the RaspBee. Yet, deCONZ won't send any messages to the TRV, claiming the node queue is full. I tried power cycling the TRV (that usually remedies connection issues), resetting the RaspBee (using GCFFlasher), power cycling the RaspBee (by power cycling the Pi), but nothing seems to remedy the situation.

Steps to reproduce the behavior

It's a bit intermittent, but it seems to happen every time one of my TRVs selects the RaspBee as parent. This is on my second network, with only 8 Eurotronic Spirit TRVs, a FYRTUR blind, a TRÅDFRI open/close remote, and a TRÅDFRI repeater. The FYRTUR seems to prefer the RaspBee as parent, the others seem to prefer the TRÅDFRI repeater. But sometimes they switch parent.

The FYRTUR has been stable ever since I created the second network and moved the devices over. The TRVs are stable as long as they remain connected to the repeater. I've seen this issue with three or four different TRVs so far, every time they would connect to the RaspBee.

Interestingly, lastseen on the ZHAThermostat resource is updated; state.lastupdated isn't, suggesting the TRV isn't sending reports, or the RaspBee isn't forwarding them to deCONZ.

Expected behavior

Obviously, deCONZ and the TRV should communicate.

Screenshots

Screenshot 2021-03-09 at 22 44

Environment

deCONZ Logs

Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:797 Delay APS request diff = 1
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:799 APS-DATA.request id: 38, addrmode: 0x02, addr: 0xb64b, profile: 0x0104, cluster: 0x0000, ep: 0x01 -> 0x01 queue: 2 len: 5 tx.options 0x04
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:800 Delay APS request diff = 4
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:801 Delay APS request diff = 5
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:802 APS-DATA.request id: 39, addrmode: 0x02, addr: 0xb64b, profile: 0x0104, cluster: 0x0000, ep: 0x01 -> 0x01 queue: 3 len: 5 tx.options 0x04
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:804 Delay APS request diff = 8
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:805 Delay APS request diff = 9
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:806 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:807 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:808 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:809 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:810 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:811 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:02:812 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:02:814 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:02:815 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:02:816 reject aps request to enddevice node queue is full (4)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:014 poll node 00:0d:6f:ff:fe:27:66:58-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:015 Poll light node Trådfri Repeater
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:094 Delay APS request diff = 298
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:095 Delay APS request diff = 299
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:096 Delay APS request diff = 300
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:412 Delay APS request diff = 56
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:413 Delay APS request diff = 57
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:633 enqueue event config/localtime for /config/
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:635 Delay APS request diff = 279
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:636 Delay APS request diff = 280
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:637 rule event /config/localtime: 22:52:02.633 -> 22:52:03.633 (1000ms)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:827 poll node 00:0d:6f:ff:fe:9e:00:aa-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:03:934 Delay APS request diff = 51
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:04:159 Delay APS request diff = 276
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:04:355 poll node 00:0d:6f:ff:fe:27:66:58-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:04:357 Poll light node Trådfri Repeater
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:04:634 enqueue event config/localtime for /config/
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:04:639 rule event /config/localtime: 22:52:03.633 -> 22:52:04.634 (1001ms)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:04:950 APS-DATA.request id: 61, addrmode: 0x03, addr: 0x000d6ffffe276658, profile: 0x0000, cluster: 0x0031, ep: 0x00 -> 0x00 queue: 4 len: 2 tx.options 0x04
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:007 APS-DATA.confirm id: 61, status: 0x00 SUCCESS
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:008 APS-DATA.confirm request id: 61 -> confirmed, timeout 2043192
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:044 APS-DATA.indication srcAddr: 0x9dc7, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8031, lqi: 255, rssi: -45
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:045   asdu: e0000a03031e0401ffff2e2100072b9201008d1500395e120002781e0401ffff2e2100062b9201008d1500eccb120002781e0401ffff2e2100a42e9201008d150088691200029c
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:046 APS-DATA.indication request id: 61 -> finished
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:047 APS-DATA.request id: 61 erase from queue
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:194 poll node 00:0d:6f:ff:fe:9e:00:aa-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:633 enqueue event config/localtime for /config/
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:635 rule event /config/localtime: 22:52:04.634 -> 22:52:05.633 (999ms)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:717 poll node 00:0d:6f:ff:fe:27:66:58-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:05:717 Poll light node Trådfri Repeater
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:06:580 poll node 00:0d:6f:ff:fe:9e:00:aa-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:06:634 enqueue event config/localtime for /config/
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:06:636 rule event /config/localtime: 22:52:05.633 -> 22:52:06.634 (1001ms)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:07:134 poll node 00:0d:6f:ff:fe:27:66:58-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:07:135 Poll light node Trådfri Repeater
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:07:633 enqueue event config/localtime for /config/
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:07:637 rule event /config/localtime: 22:52:06.634 -> 22:52:07.633 (999ms)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:07:830 APS-DATA.request id: 77, addrmode: 0x03, addr: 0x000d6ffffe276658, profile: 0x0000, cluster: 0x0031, ep: 0x00 -> 0x00 queue: 4 len: 2 tx.options 0x04
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:024 poll node 00:0d:6f:ff:fe:9e:00:aa-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:088 APS-DATA.indication srcAddr: 0x9dc7, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8031, lqi: 255, rssi: -45
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:089   asdu: e4000a06031e0401ffff2e210090f4e3feff6f0d00e18e1200026d1e0401ffff2e210051d29201008d1500047e120002501e0401ffff2e2100862e9201008d15003a7f1200026f
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:091 APS-DATA.indication request id: 77 -> finished? not confirmed
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:104 APS-DATA.confirm id: 77, status: 0x00 SUCCESS
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:105 APS-DATA.confirm request id: 77 -> finished, timeout 8831544
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:150 aps request id: 77 finished, erase from queue
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:600 poll node 00:0d:6f:ff:fe:27:66:58-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:601 Poll light node Trådfri Repeater
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:634 Daylight now: nightStart, status: 230, daylight: 0, dark: 1
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:635 enqueue event config/localtime for /config/
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:08:637 rule event /config/localtime: 22:52:07.633 -> 22:52:08.635 (1002ms)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:09:458 poll node 00:0d:6f:ff:fe:9e:00:aa-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:09:633 enqueue event config/localtime for /config/
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:09:636 rule event /config/localtime: 22:52:08.635 -> 22:52:09.633 (998ms)
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:10:035 poll node 00:0d:6f:ff:fe:27:66:58-01
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:10:036 Poll light node Trådfri Repeater
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:10:525 APS-DATA.confirm id: 36, status: 0xF0 TRANSACTION_EXPIRED
Mar 09 22:52:10 pi3 deCONZ[817]: 22:52:10:525 APS-DATA.confirm id: 36 status: transaction expired (9231328 seconds)
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:634 enqueue event config/localtime for /config/
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:636 rule event /config/localtime: 22:52:09.633 -> 22:52:10.634 (1001ms)
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:709 APS-DATA.request id: 93, addrmode: 0x03, addr: 0x000d6ffffe276658, profile: 0x0000, cluster: 0x0031, ep: 0x00 -> 0x00 queue: 3 len: 2 tx.options 0x04
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:798 APS-DATA.indication srcAddr: 0x9dc7, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8031, lqi: 255, rssi: -45
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:798   asdu: e7000a09011e0401ffff2e2100aa009efeff6f0d00fe7a12000230
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:799 APS-DATA.indication request id: 93 -> finished? not confirmed
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:799 void deCONZ::zmNode::setFetched(deCONZ::RequestId, bool) fetched item: 8, node: 0x9DC7
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:818 APS-DATA.confirm id: 93, status: 0x00 SUCCESS
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:819 APS-DATA.confirm request id: 93 -> finished, timeout 8831544
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:870 aps request id: 93 finished, erase from queue
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:10:924 poll node 00:0d:6f:ff:fe:9e:00:aa-01
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:11:084 APS-DATA.confirm id: 37, status: 0xF0 TRANSACTION_EXPIRED
Mar 09 22:52:16 pi3 deCONZ[817]: 22:52:11:085 APS-DATA.confirm id: 37 status: transaction expired (7798840 seconds)

Additional context

All TRVs are on firmware 18181120, dated 20190408.

I've replaced the batteries on the TRV gone MIA with professional-grade lithium AA batteries (1.7V at 100%). I'm half suspecting the consumer-grade IKEA AA batteries; they drop voltage pretty quickly (under 1.4V at 80%, according to my battery tester), and the TRVs don't seem to like that.

ebaauw commented 3 years ago

Looking at the sniffer, the TRV and the RaspBee seem to be in a Rejoin Request/Response loop.

The TRV continuously sends:

IEEE 802.15.4 Data, Dst: 0x0000, Src: 0xb64b
    Frame Control Field: 0x8861, Frame Type: Data, Acknowledge Request, PAN ID Compression, Destination Addressing Mode: Short/16-bit, Frame Version: IEEE Std 802.15.4-2003, Source Addressing Mode: Short/16-bit
    Sequence Number: 159
    Destination PAN: 0x476f
    Destination: 0x0000
    Source: 0xb64b
    [Extended Source: Jennic_00:01:92:2b:01 (00:15:8d:00:01:92:2b:01)]
    [Origin: 63]
    TI CC24xx-format metadata: FCS OK
ZigBee Network Layer Command, Dst: 0x0000, Src: 0xb64b
    Frame Control Field: 0x1a09, Frame Type: Command, Discover Route: Suppress, Security, Destination, Extended Source Command
    Destination: 0x0000
    Source: 0xb64b
    Radius: 1
    Sequence Number: 147
    Destination: dresden-_ff:ff:01:04:1e (00:21:2e:ff:ff:01:04:1e)
    Extended Source: Jennic_00:01:92:2b:01 (00:15:8d:00:01:92:2b:01)
    ZigBee Security Header
    Command Frame: Rejoin Request
        Command Identifier: Rejoin Request (0x06)
        Capability Information: 0x00

And the RaspBee responds:

IEEE 802.15.4 Data, Dst: 0xb64b, Src: 0x0000
    Frame Control Field: 0x8861, Frame Type: Data, Acknowledge Request, PAN ID Compression, Destination Addressing Mode: Short/16-bit, Frame Version: IEEE Std 802.15.4-2003, Source Addressing Mode: Short/16-bit
    Sequence Number: 14
    Destination PAN: 0x476f
    Destination: 0xb64b
    Source: 0x0000
    [Extended Source: dresden-_ff:ff:01:04:1e (00:21:2e:ff:ff:01:04:1e)]
    [Origin: 22]
    TI CC24xx-format metadata: FCS OK
ZigBee Network Layer Command, Dst: 0xb64b, Src: 0x0000
    Frame Control Field: 0x1a09, Frame Type: Command, Discover Route: Suppress, Security, Destination, Extended Source Command
    Destination: 0xb64b
    Source: 0x0000
    Radius: 1
    Sequence Number: 97
    Destination: dresden-_ff:ff:01:04:1e (00:21:2e:ff:ff:01:04:1e)
    Extended Source: dresden-_ff:ff:01:04:1e (00:21:2e:ff:ff:01:04:1e)
    ZigBee Security Header
    Command Frame: Rejoin Response
        Command Identifier: Rejoin Response (0x07)
        Address: 0x0000
        Status: Success (0x00)

It it supposed to respond with address 0x0000? Shouldn't that be the address of the end device?

ebaauw commented 3 years ago

And somehow, the TRV reselected the repeater as parent and seems to be working normally. I don't see anything causing that the sniffer log, but suddenly the RaspBee sends a bind request to the TRV (packet 2946), who answers with an ACK, and after that the TRV is polling the repeater.

Here's the log (filtered on MAC address of the TRV): TRV.pcapng.gz; please PM me for the network key.

manup commented 3 years ago

It it supposed to respond with address 0x0000? Shouldn't that be the address of the end device?

Command Frame: Rejoin Request
        Command Identifier: Rejoin Request (0x06)
        Capability Information: 0x00

I wonder why capabilities is 0x00. I think it should be 0x80, which means allocate short address as usual for end-devices. Is 0x00 also shown in the Node Descriptor MAC capabilities?

Mar 09 22:52:02 pi3 deCONZ[817]: 22:52:02:806 reject aps request to enddevice node queue is full (4)

This means that the outgoing queue already has 3 pending APS requests for which no confirm has been received yet. Once they run in the timeout they will be removed. I'm not sure where the case went wrong, either the TRV not sending the ACK which would results in the confirm in the firmware. But the green line suggests that the TRV is indeed registered correctly as end-device to the coordinator as parent.

ebaauw commented 3 years ago

I wonder why capabilities is 0x00. I think it should be 0x80, which means allocate short address as usual for end-devices.

Dunno. If you check the sniffer log, the RaspBee sends some responses with the end-device NWK address (packets 127, 129, 130), but the preceding request also has info 0x00.

Is 0x00 also shown in the Node Descriptor MAC capabilities?

No, that's 0x80.

Once they run in the timeout they will be removed

How long before that timeout occurs?

I'm not sure where the case went wrong, either the TRV not sending the ACK which would results in the confirm in the firmware

If would seem the APS layer on the TRV is dormant waiting for a particular Rejoin Response? As long as they're connected to the TRÅDFRI repeater, my Spirits are rock solid as well. Somehow, the repeater must do something different than the RaspBee when the Spirit tries to rejoin the network.

But the green line suggests that the TRV is indeed registered correctly as end-device to the coordinator as parent.

And it's also polling the coordinator (sending Data Request messages) in between the Rejoin Request messages.

What other info can I provide? Would it help to power-cycle (another) TRV and capture the rejoin dialogue with the TRÅDFRI repeater?

manup commented 3 years ago

Would it help to power-cycle (another) TRV and capture the rejoin dialogue with the TRÅDFRI repeater?

Yes this would be interesting for comparison.

I've compared the NWK Rejoin command from one of my other sniffer logs, here a end-device indicated the same 0x80 mac capabilities as shown in the Node Descriptor and Device Announce commands.

image

However the description for the Allocate Short Address flag in the Zigbee specification:

"This field will have a value of 1 in implementations of this specification, indicating that the joining device must be issued a 16-bit network address, except in the case where a device has self-selected its address while using the NWK rejoin command to join a network for the first time in a secure manner. In this case, it shall have a value of 0."

And the parent handling:

"If the nwkAddrAlloc attribute of the NIB does not have a value of 0x00, the allocate address sub-field of the capabilities information field of the rejoin request command frame payload may have a value of 0 indicating a self-assigned or pre-existing network address. In this case, as is the case with all NWK command frames, the 16-bit network address in the source address field of the NWK header, in combination with the 64-bit IEEE address from the source IEEE address field of the network header should be checked for address conflicts as described in section 3.6.1.9. If an address conflict is discovered, a new, and non-conflicting, address shall be chosen for the joining device and shall be placed in the network address field of command frame payload of the outgoing rejoin response command frame. Otherwise, the contents of the source address field of the incoming rejoin request command frame shall be placed in the network address field of the command frame payload of the outgoing rejoin response command frame."

If I understand this correctly the 0x00 capabilities is okay and the Response from the RaspBee firmware is wrong. I need to check what happens in the firmware sources.

ebaauw commented 3 years ago

I read that three times, and I’m still not sure if I understand it correctly. But basically, I read: when capability info is 0x00 (meaning the device is requesting to reuse a previously allocated NWK address), the address in the response should match the requested address.

Thanks for looking into this!

manup commented 3 years ago

The specification is always a fun read :) but yes that is also what I understand from it.

github-actions[bot] commented 3 years ago

As there has not been any response in 21 days, this issue has been automatically marked as stale. At OP: Please either close this issue or keep it active It will be closed in 7 days if no further activity occurs.

github-actions[bot] commented 3 years ago

As there has not been any response in 28 days, this issue will be closed. @ OP: If this issue is solved post what fixed it for you. If it is not solved, request to get this opened again.