dresden-elektronik / deconz-rest-plugin

deCONZ REST-API plugin to control ZigBee devices
BSD 3-Clause "New" or "Revised" License
1.88k stars 485 forks source link

0x26240500 routing issues? #806

Closed ebaauw closed 5 years ago

ebaauw commented 5 years ago

I fear there's something going wrong with the routing in 0x26240500. I'm seeing red nodes again, and my two innr RB 162 bulbs don't seem to receive any unicast messages. They do react to groupcast messages. They "come back" after power cycling them, but only for a brief period. I cannot even get the deCONZ GUI to read the Basic cluster. I would seem it's just this particular type of bulb that causes these issues. They don't show red, though.

They seem to react when pressing 0 in the deCONZ GUI:

Sep 25 22:33:04 pi2 deCONZ[6475]: 22:33:02:214 send NWK_Addr_req to node  0x00158D0000CA5F3B
Sep 25 22:33:04 pi2 deCONZ[6475]: 22:33:02:559 APS-DATA.indication srcAddr: 0x00158d0000ca5f3b, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8000, lqi: 255, rssi: -51
Sep 25 22:33:08 pi2 deCONZ[6475]: 22:33:07:256 APS-DATA.indication srcAddr: 0x00158d0000ca5f3b, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8000, lqi: 255, rssi: -51
Sep 25 22:33:15 pi2 deCONZ[6475]: 22:33:13:824 aps request id: 8 prf: 0x0000 cl: 0x0000 timeout (confirmed: 1) to 0x00158D0000CA5F3B (0xFFFD)

When power cycling the bulb:

Sep 25 22:39:50 pi2 deCONZ[6475]: 22:39:49:846 device announce 0x00158D0000CA5F3B (0x429A) mac capabilities 0x8E
Sep 25 22:39:57 pi2 deCONZ[6475]: 22:39:53:257 APS-DATA.request id: 240, addrmode: 0x03, addr: 0x00158d0000ca5f3b, profile: 0x0104, cluster: 0x0004, ep: 0x01 -> 0x01 queue: 1 len: 4 tx.options 0x00
Sep 25 22:39:58 pi2 deCONZ[6475]: 22:39:55:771 APS-DATA.request id: 252, addrmode: 0x03, addr: 0x00158d0000ca5f3b, profile: 0x0104, cluster: 0x0006, ep: 0x01 -> 0x01 queue: 1 len: 5 tx.options 0x00
Sep 25 22:39:58 pi2 deCONZ[6475]: 22:39:55:963 APS-DATA.request id: 253, addrmode: 0x03, addr: 0x00158d0000ca5f3b, profile: 0x0104, cluster: 0x0000, ep: 0x01 -> 0x01 queue: 1 len: 5 tx.options 0x00
Sep 25 22:39:58 pi2 deCONZ[6475]: 22:39:56:056 APS-DATA.request id: 2, addrmode: 0x03, addr: 0x00158d0000ca5f3b, profile: 0x0104, cluster: 0x0008, ep: 0x01 -> 0x01 queue: 2 len: 5 tx.options 0x00
Sep 25 22:40:15 pi2 deCONZ[6475]: 22:40:10:136 aps request id: 12 prf: 0x0000 cl: 0x0005 timeout (confirmed: 1) to 0x00158D0000CA5F3B (0x429A)
Sep 25 22:40:32 pi2 deCONZ[6475]: 22:40:28:832 aps request id: 136 prf: 0x0000 cl: 0x0005 timeout (confirmed: 1) to 0x00158D0000CA5F3B (0x429A)
Sep 25 22:40:57 pi2 deCONZ[6475]: 22:40:51:040 aps request id: 28 prf: 0x0000 cl: 0x0005 timeout (confirmed: 1) to 0x00158D0000CA5F3B (0x429A)
Sep 25 22:41:08 pi2 deCONZ[6475]: 22:41:07:856 APS-DATA.request id: 218, addrmode: 0x03, addr: 0x00158d0000ca5f3b, profile: 0x0104, cluster: 0x0006, ep: 0x01 -> 0x01 queue: 4 len: 5 tx.options 0x00
Sep 25 22:41:18 pi2 deCONZ[6475]: 22:41:08:056 APS-DATA.request id: 220, addrmode: 0x03, addr: 0x00158d0000ca5f3b, profile: 0x0104, cluster: 0x0000, ep: 0x01 -> 0x01 queue: 4 len: 5 tx.options 0x00
Sep 25 22:41:18 pi2 deCONZ[6475]: 22:41:09:824 aps request id: 157 prf: 0x0000 cl: 0x0005 timeout (confirmed: 1) to 0x00158D0000CA5F3B (0x429A)
Sep 25 22:41:29 pi2 deCONZ[6475]: 22:41:27:656 aps request id: 13 prf: 0x0000 cl: 0x0005 timeout (confirmed: 1) to 0x00158D0000CA5F3B (0x429A)
Sep 25 22:41:51 pi2 deCONZ[6475]: 22:41:47:856 aps request id: 130 prf: 0x0000 cl: 0x0005 timeout (confirmed: 1) to 0x00158D0000CA5F3B (0x429A)
Sep 25 22:41:59 pi2 deCONZ[6475]: 22:41:56:544 APS-DATA.indication srcAddr: 0x00158d0000ca5f3b, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8005, lqi: 239, rssi: -61
Sep 25 22:41:59 pi2 deCONZ[6475]: 22:41:57:090 APS-DATA.indication srcAddr: 0x00158d0000ca5f3b, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8000, lqi: 239, rssi: -61

When sending On command from the deCONZ GUI:

Sep 25 22:46:48 pi2 deCONZ[6475]: 22:46:47:205 APS-DATA.indication srcAddr: 0x00158d0000ca5f3b, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8000, lqi: 239, rssi: -61
Sep 25 22:50:33 pi2 deCONZ[6475]: 22:50:29:693 APS-DATA.indication srcAddr: 0x00158d0000ca5f3b, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8000, lqi: 191, rssi: -67
Sep 25 22:50:41 pi2 deCONZ[6475]: 22:50:34:465 APS-DATA.indication srcAddr: 0x00158d0000ca5f3b, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8000, lqi: 191, rssi: -67
Sep 25 22:53:07 pi2 deCONZ[6475]: 22:53:05:778 APS-DATA.indication srcAddr: 0x00158d0000ca5f3b, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8000, lqi: 191, rssi: -67
Sep 25 22:53:17 pi2 deCONZ[6475]: 22:53:10:588 APS-DATA.indication srcAddr: 0x00158d0000ca5f3b, srcEp: 0x00 dstAddrMode: 2, profile: 0x0000, cluster: 0x8000, lqi: 191, rssi: -67
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:10:178 APS-DATA.request id: 239, addrmode: 0x03, addr: 0x00158d0000ca5f3b, profile: 0x0104, cluster: 0x0006, ep: 0x01 -> 0x01 queue: 2 len: 3 tx.options 0x04
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:14:877 0x00158D0000CA5F3B error APSDE-DATA.confirm: 0xA7 on task
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:20:744 APS-DATA.request id: 28, addrmode: 0x03, addr: 0x00158d0000ca5f3b, profile: 0x0104, cluster: 0x0006, ep: 0x01 -> 0x01 queue: 2 len: 3 tx.options 0x04
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:21:610 delay sending request 37 dt 1 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:21:656 delay sending request 37 dt 1 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:21:757 delay sending request 37 dt 1 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:21:856 delay sending request 37 dt 1 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:21:956 delay sending request 37 dt 1 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:22:056 delay sending request 37 dt 1 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:22 pi2 deCONZ[6475]: 22:55:22:156 delay sending request 37 dt 1 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:22:257 delay sending request 37 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:22:356 delay sending request 37 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:22:464 delay sending request 37 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:22:556 delay sending request 37 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:22:614 delay sending request 45 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:22:656 delay sending request 45 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:22:759 delay sending request 45 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:22:856 delay sending request 45 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:22:956 delay sending request 45 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:056 delay sending request 45 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:156 delay sending request 45 dt 2 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:256 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:356 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:456 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:556 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:655 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:756 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:856 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:23:956 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:24:056 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:24:156 delay sending request 45 dt 3 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:24:260 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:24:355 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:24:456 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:24:556 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:24:656 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:24 pi2 deCONZ[6475]: 22:55:24:756 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:35 pi2 deCONZ[6475]: 22:55:24:855 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:35 pi2 deCONZ[6475]: 22:55:24:956 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:35 pi2 deCONZ[6475]: 22:55:25:086 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:35 pi2 deCONZ[6475]: 22:55:25:156 delay sending request 45 dt 4 ms to 0x00158D0000CA5F3B, cluster 0x0006
Sep 25 22:55:35 pi2 deCONZ[6475]: 22:55:25:256 APS-DATA.request id: 45, addrmode: 0x03, addr: 0x00158d0000ca5f3b, profile: 0x0104, cluster: 0x0006, ep: 0x01 -> 0x01 queue: 3 len: 3 tx.options 0x04
Sep 25 22:55:35 pi2 deCONZ[6475]: 22:55:25:295 0x00158D0000CA5F3B error APSDE-DATA.confirm: 0xA7 on task
Sep 25 22:55:35 pi2 deCONZ[6475]: 22:55:29:677 0x00158D0000CA5F3B error APSDE-DATA.confirm: 0xA7 on task
manup commented 5 years ago

Do you have the chance to sniff the traffic, it would be interesting what happens on air.

0x26240500 doesn't use the optimization of building routes from incoming commands (last hop) anymore, since this is broken in many cases. Instead it relies on proper route request and replies.

0xA7 suggests that unicast was send out but couldn't come through, 0xD0 might also occur which would mean no route.

Pressing 0 does send a NWK address request broadcast and the selected node can respond by unicast to that.

ebaauw commented 5 years ago

Sorry, I just restarted deCONZ (without resetting/power cycling the RaspBee, nor the bulbs). It took a couple of minutes, but the bulbs seem to have come back, as did the red node (an innr plug) that wouldn't revive on 0. Both bulbs are pretty close to the RaspBee, and during startup, they immediately showed as direct neighbours of the RaspBee. They have kept their NWK addresses.

I'll monitor the situation and try and do some sniffing should it occur again.

manup commented 5 years ago

Hmm something is strange in the above debug prints are no APS-DATA.confirm id: %u, status: 0x%02X %s\n messages. Which should appear shortly after 0x00158D0000CA5F3B error APSDE-DATA.confirm: 0xA7 on task.

ebaauw commented 5 years ago

Happened again, quickly enough. Attached the wireshark log, filtered on one of the bulb's NWK address: innr.pcapng.gz

Hmm something is strange in the above debug prints are no APS-DATA.confirm

My bad, I filtered (grep -e) on mac address or nwk address of the bulb. There were an APS-DATA.confirm messages. Here's the full log for a 10-min interval: log.gz

manup commented 5 years ago

:) ha totally forgot I can't read the packets since they are encrypted. Not sure if it is possible to safe only the decrypted packets with Wireshark.

From the log it looks like the packets are transmitted (confirm success 0x00) to the bulb but only ack on mac layer is requests (txOptions = 0x00). So a route request due a missing aps ack won't happen.

For testing purpose you may activate aps ack for the requests in the readAttributes() method.

ebaauw commented 5 years ago

Not sure if it is possible to safe only the decrypted packets with Wireshark.

Not sure how to save the decrypted packages as pcap (or if that's even possible). Export to text saves the unencrypted packages, but I'm not sure if that helps you analysing. innr.txt.gz

For testing purpose you may activate aps ack for the requests in the readAttributes() method.

You mean in the source code? Or in the deCONZ configuration? Do you need a new capture with that setting?

manup commented 5 years ago

Not sure how to save the decrypted packages as pcap (or if that's even possible). Export to text saves the unencrypted packages, but I'm not sure if that helps you analysing. innr.txt.gz

Thanks that works. The log looks ok, from the gateway side ZCL read attribute requests are send directly to the target (no routing). But for some reason it won't answer.

The only reply comes from the NWK address request broadcasts.

You mean in the source code? Or in the deCONZ configuration? Do you need a new capture with that setting?

In the plugin source code. The logs and captures should then show some route requests from the gateway, after failed confirms with 0xD0.

You may also try pressing key L on the keyboard when the lights is selected, this will send a ZDP Leave request to the node with rejoin enabled, should be like a power-cycle.

ebaauw commented 5 years ago

OK, uncommented https://github.com/dresden-elektronik/deconz-rest-plugin/blob/63bc05a883220f3a19fd415901f1510bdcda30ba/de_web_plugin.cpp#L6623

and installed the plugin. Looks like the bit is set allright:

Frame 1161: 126 bytes on wire (1008 bits), 126 bytes captured (1008 bits) on interface 0
Ethernet II, Src: Raspberr_3b:e8:df (b8:27:eb:3b:e8:df), Dst: Apple_ca:3b:53 (a4:5e:60:ca:3b:53)
Internet Protocol Version 4, Src: 192.168.76.81, Dst: 192.168.76.41
User Datagram Protocol, Src Port: 51497, Dst Port: 17754
ZigBee Encapsulation Protocol, Channel: 25, Length: 52
IEEE 802.15.4 Data, Dst: 0x429a, Src: 0x0000
ZigBee Network Layer Data, Dst: 0x429a, Src: 0x0000
    Frame Control Field: 0x0248, Frame Type: Data, Discover Route: Enable, Security Data
    Destination: 0x429a
    Source: 0x0000
    Radius: 10
    Sequence Number: 126
    [Extended Source: Dresden-_ff:ff:01:02:56 (00:21:2e:ff:ff:01:02:56)]
    [Origin: 23]
    ZigBee Security Header
ZigBee Application Support Layer Data, Dst Endpt: 1, Src Endpt: 1
    Frame Control Field: Data (0x40)
        .... ..00 = Frame Type: Data (0x0)
        .... 00.. = Delivery Mode: Unicast (0x0)
        ..0. .... = Security: False
        .1.. .... = Acknowledgement Request: True
        0... .... = Extended Header: False
    Destination Endpoint: 1
    Cluster: Basic (0x0000)
    Profile: Home Automation (0x0104)
    Source Endpoint: 1
    Counter: 139
ZigBee Cluster Library Frame, Command: Read Attributes, Seq: 48
    Frame Control Field: Profile-wide (0x10)
        .... ..00 = Frame Type: Profile-wide (0x0)
        .... .0.. = Manufacturer Specific: False
        .... 0... = Direction: Server
        ...1 .... = Disable Default Response: True
    Sequence Number: 48
    Command: Read Attributes (0x00)
    Attribute: Unknown (0x0032)
    Attribute: Unknown (0x0033)

But no answer, no route request, and as far as I can tell, no changes from without the ACK request set. I do wonder why the GUI reads the Hue-specific attributes? They're not shown in the Cluster info panel.

The (unicast) Leave request doesn't seem to reach the bulb.

Here's the dump: innr2.txt.gz

Check out frame 27112: all of a sudden the bulb reacts. Note that the reponses to the Read attribute commands take three hops. It also looks like I don't seem to capture all the ACK hops.

In frame 28927 the bulb forwards a route reply from the RaspBee to another node. Then it issues a route request for the RaspBee and receives (?) a response. Then it sends "no route" to the other node and goes incommunicado.

manup commented 5 years ago

But no answer, no route request, and as far as I can tell, no changes from without the ACK request set.

The route request from RaspBee should start only after a few attemps (albeit I'm not sure how it's handled in case it's a direct neighbor).

I do wonder why the GUI reads the Hue-specific attributes? They're not shown in the Cluster info panel.

Good catch that's a new bug, it should only read the manufacturer specific attributes when manufacturer matches, I'll fix that for the next version.

In frame 28927 the bulb forwards a route reply from the RaspBee to another node. Then it issues a route request for the RaspBee and receives (?) a response. Then it sends "no route" to the other node and goes incommunicado.

By comparing 29024 and 29079 it seems that the route request ID send from the light isn't incremented and stays 126, it should different for each new round.

The three-hop routing itself looks ok based on the path cost returned in the re-broadcasts. However I can't understand why it claims no route available.

Possibly the many-to-one route broadcast of the former firmware versions mitigated the issue.

manup commented 5 years ago

Possibly the many-to-one route broadcast of the former firmware versions mitigated the issue.

Correction Just checked my own sniffer, many-to-one broadcast is still active, so this shouldn't be the cause.

image

ebaauw commented 5 years ago

Possibly the many-to-one route broadcast of the former firmware versions mitigated the issue.

I've had these innr bulbs for a long time (before the firmware was first changed to handle the IKEA meshing) and I don't recall having these issues before.

Also, I do get red nodes for other lights (mostly the innr plugs and the ubisys dimmer). They seem to autocorrect (except, of course, for the ubisys, which had to be power cycled).

manup commented 5 years ago

Albeit might be a thing to try I've noticed that recent Philips Hue bridge firmware also supports many-to-one broadcast but has different flags for the Many-to-One-Discovery field.

manup commented 5 years ago

I've had these innr bulbs for a long time (before the firmware was first changed to handle the IKEA meshing) and I don't recall having these issues before.

Also, I do get red nodes for other lights (mostly the innr plugs and the ubisys dimmer). They seem to autocorrect (except, of course, for the ubisys, which had to be power cycled).

Hard to tell what's the cause, at least route requests and replies from the gateway look correct (only the not changing route ID is somewhat wrong).

Do you see any Many-to-One broadcasts from the gateway as shown above in your sniffer, they aren't visible in the logs.

ebaauw commented 5 years ago

Do you have enough info from me, or shall I re-assign one of the bulbs to my test network? I'm happy to share that network key, so you can read the pcap file. I could also pair the bulb with my Hue bridge if that's helpful.

manup commented 5 years ago

If the same issues happen in the test network yes that would be helpful.

manup commented 5 years ago

I'm getting some innr bulbs later today hopefully I can reproduce and investigate the issues here.

Gnaget2 commented 5 years ago

I have the same problem with 3 xiaomi devices after upgrading to 39 yesterday. 2 PIR and one door/windows magnet.

they seems to work but after a while they show as Not reachable in deconz, but if i trigger one of them they update status ans are shon as online for a while, then go back to Not reachable

ebaauw commented 5 years ago

Just to double-check, I reverted to 2.05.38 and 0x26230500. The innr bulbs work just fine with those versions.

I'm getting some innr bulbs later today hopefully I can reproduce and investigate the issues here.

It's this model that's causing the issues: screenshot 2018-09-27 at 18 32

ebaauw commented 5 years ago

If the same issues happen in the test network yes that would be helpful.

Unfortunately (?), the innr bulbs seem happy enough in my test network, with only the two bulbs and an innr smart plug.

manup commented 5 years ago

My network now contains

In total 45 nodes, hopefully enough to trigger the error.

Since your test network don't show the issues it might be related to network size and received broadcasts like route requests.

manup commented 5 years ago

So far no issues here, all 3 lights are normally reachable and can be controlled.

ebaauw commented 5 years ago

Moved the bulbs back to my production network and again upgraded that to v2.05.39 and 0x26240500. After starting deCONZ, the bulbs were fine for a brief period, and then again no longer seem to receive unicast messages from the RaspBee.

I paired one of my XBees to the production network. I am able to send a unicast message from the XBee to the bulb, and the bulb responds correctly:

Frame 10710: 138 bytes on wire (1104 bits), 138 bytes captured (1104 bits) on interface 0
Ethernet II, Src: Raspberr_3b:e8:df (b8:27:eb:3b:e8:df), Dst: Apple_ca:3b:53 (a4:5e:60:ca:3b:53)
Internet Protocol Version 4, Src: 192.168.76.81, Dst: 192.168.76.41
User Datagram Protocol, Src Port: 39025, Dst Port: 17754
ZigBee Encapsulation Protocol, Channel: 25, Length: 64
IEEE 802.15.4 Data, Dst: 0x2798, Src: 0xe597
    Frame Control Field: 0x8861, Frame Type: Data, Acknowledge Request, PAN ID Compression, Destination Addressing Mode: Short/16-bit, Frame Version: IEEE Std 802.15.4-2003, Source Addressing Mode: Short/16-bit
    Sequence Number: 239
    Destination PAN: 0xe8dd
    Destination: 0x2798
    Source: 0xe597
    [Extended Source: Jennic_00:02:17:87:ed (00:15:8d:00:02:17:87:ed)]
    [Origin: 99]
    Frame Check Sequence (TI CC24xx format): FCS OK
ZigBee Network Layer Data, Dst: 0x2798, Src: 0x3637
    Frame Control Field: 0x1a48, Frame Type: Data, Discover Route: Enable, Security, Destination, Extended Source Data
    Destination: 0x2798
    Source: 0x3637
    Radius: 29
    Sequence Number: 112
    Destination: Jennic_00:00:ca:5f:3b (00:15:8d:00:00:ca:5f:3b)
    Extended Source: Maxstrea_00:41:84:9d:a6 (00:13:a2:00:41:84:9d:a6)
    ZigBee Security Header
ZigBee Application Support Layer Data, Dst Endpt: 1, Src Endpt: 1
    Frame Control Field: Data (0x40)
    Destination Endpoint: 1
    Cluster: On/Off (0x0006)
    Profile: Home Automation (0x0104)
    Source Endpoint: 1
    Counter: 247
ZigBee Cluster Library Frame
    Frame Control Field: Cluster-specific (0x01)
    Sequence Number: 240
    Command: Toggle (0x02)
Frame 10716: 124 bytes on wire (992 bits), 124 bytes captured (992 bits) on interface 0
Ethernet II, Src: Raspberr_3b:e8:df (b8:27:eb:3b:e8:df), Dst: Apple_ca:3b:53 (a4:5e:60:ca:3b:53)
Internet Protocol Version 4, Src: 192.168.76.81, Dst: 192.168.76.41
User Datagram Protocol, Src Port: 39025, Dst Port: 17754
ZigBee Encapsulation Protocol, Channel: 25, Length: 50
IEEE 802.15.4 Data, Dst: 0x0f38, Src: 0x2798
    Frame Control Field: 0x8861, Frame Type: Data, Acknowledge Request, PAN ID Compression, Destination Addressing Mode: Short/16-bit, Frame Version: IEEE Std 802.15.4-2003, Source Addressing Mode: Short/16-bit
    Sequence Number: 175
    Destination PAN: 0xe8dd
    Destination: 0x0f38
    Source: 0x2798
    [Extended Source: Jennic_00:00:ca:5f:3b (00:15:8d:00:00:ca:5f:3b)]
    [Origin: 134]
    Frame Check Sequence (TI CC24xx format): FCS OK
ZigBee Network Layer Data, Dst: 0x3637, Src: 0x2798
    Frame Control Field: 0x0248, Frame Type: Data, Discover Route: Enable, Security Data
    Destination: 0x3637
    Source: 0x2798
    Radius: 30
    Sequence Number: 45
    [Extended Source: Jennic_00:00:ca:5f:3b (00:15:8d:00:00:ca:5f:3b)]
    [Origin: 134]
    ZigBee Security Header
ZigBee Application Support Layer Data, Dst Endpt: 1, Src Endpt: 1
    Frame Control Field: Data (0x00)
    Destination Endpoint: 1
    Cluster: On/Off (0x0006)
    Profile: Home Automation (0x0104)
    Source Endpoint: 1
    Counter: 226
ZigBee Cluster Library Frame, Command: Default Response, Seq: 240
    Frame Control Field: Profile-wide (0x18)
    Sequence Number: 240
    Command: Default Response (0x0b)
    Command: Toggle (0x02)
    Status: Success (0x00)
manup commented 5 years ago

After starting deCONZ, the bulbs were fine for a brief period, and then again no longer seem to receive unicast messages from the RaspBee.

Are the unicasts not send by RaspBee or are they ignored?

If they ignored, do you see any difference in the commands compared to the ones send by XBee?

ebaauw commented 5 years ago

The unicasts are sent by the RaspBee alright (I can see them in the sniffer), but they either don't reach the bulbs or are discarded by the bulbs (the light doesn't toggle):

Frame 879: 122 bytes on wire (976 bits), 122 bytes captured (976 bits) on interface 0
Ethernet II, Src: Raspberr_3b:e8:df (b8:27:eb:3b:e8:df), Dst: Apple_ca:3b:53 (a4:5e:60:ca:3b:53)
Internet Protocol Version 4, Src: 192.168.76.81, Dst: 192.168.76.41
User Datagram Protocol, Src Port: 58126, Dst Port: 17754
ZigBee Encapsulation Protocol, Channel: 25, Length: 48
IEEE 802.15.4 Data, Dst: 0x2798, Src: 0x0000
    Frame Control Field: 0x8861, Frame Type: Data, Acknowledge Request, PAN ID Compression, Destination Addressing Mode: Short/16-bit, Frame Version: IEEE Std 802.15.4-2003, Source Addressing Mode: Short/16-bit
    Sequence Number: 254
    Destination PAN: 0xe8dd
    Destination: 0x2798
    Source: 0x0000
    [Extended Source: Dresden-_ff:ff:01:02:56 (00:21:2e:ff:ff:01:02:56)]
    [Origin: 60]
    Frame Check Sequence (TI CC24xx format): FCS OK
ZigBee Network Layer Data, Dst: 0x2798, Src: 0x0000
    Frame Control Field: 0x0248, Frame Type: Data, Discover Route: Enable, Security Data
    Destination: 0x2798
    Source: 0x0000
    Radius: 10
    Sequence Number: 99
    [Extended Source: Dresden-_ff:ff:01:02:56 (00:21:2e:ff:ff:01:02:56)]
    [Origin: 60]
    ZigBee Security Header
ZigBee Application Support Layer Data, Dst Endpt: 1, Src Endpt: 1
    Frame Control Field: Data (0x40)
    Destination Endpoint: 1
    Cluster: On/Off (0x0006)
    Profile: Home Automation (0x0104)
    Source Endpoint: 1
    Counter: 22
ZigBee Cluster Library Frame
    Frame Control Field: Cluster-specific (0x01)
    Sequence Number: 67
    Command: Toggle (0x02)

As far as I can see the only difference (apart, obviously from the sequence numbers and addresses) is in the NWK header. I didn't specify this - the XBee API handles this. I did specify the 64-bit mac address.

From the XBee:

Frame Control Field: 0x1a48, Frame Type: Data, Discover Route: Enable, Security, Destination, Extended Source Data
    .... .... .... ..00 = Frame Type: Data (0x0)
    .... .... ..00 10.. = Protocol Version: 2
    .... .... 01.. .... = Discover Route: Enable (0x1)
    .... ...0 .... .... = Multicast: False
    .... ..1. .... .... = Security: True
    .... .0.. .... .... = Source Route: False
    .... 1... .... .... = Destination: True
    ...1 .... .... .... = Extended Source: True
    ..0. .... .... .... = End Device Initiator: False

From the RaspBee:

Frame Control Field: 0x0248, Frame Type: Data, Discover Route: Enable, Security Data
    .... .... .... ..00 = Frame Type: Data (0x0)
    .... .... ..00 10.. = Protocol Version: 2
    .... .... 01.. .... = Discover Route: Enable (0x1)
    .... ...0 .... .... = Multicast: False
    .... ..1. .... .... = Security: True
    .... .0.. .... .... = Source Route: False
    .... 0... .... .... = Destination: False
    ...0 .... .... .... = Extended Source: False
    ..0. .... .... .... = End Device Initiator: False
ebaauw commented 5 years ago

Hm, looks like the RaspBee uses a Radius of 10, where the XBee (and the bulb for the response) uses 30.

manup commented 5 years ago

The radius is fine since here the packet is only a single hop. My three innr lights are still alive and well, tomorrow I'll bring them in a larger network also with ubisys devices, maybe I can spot a difference there.

The extended source flag shouldn't be a problem but is quite interesting catch, I need to read more about it in the spec.

More to check:

ebaauw commented 5 years ago

Is the package from RaspBee ignored totally or does the innr bulb send at least the aps ack?

Totally ignored - the bulb doesn't send any reply at all, and it doesn't execute the command. I double-checked the security header in the request, but cannot find anything weird there either. I reverted to v2.05.38 and 0x26230500, and I don't see any difference in the requests sent from those versions.

I'm doubting if this has anything to do with the innr bulbs in particular, maybe it's just a generic breakdown of the network under v2.05.39/0x26240500. As I said earlier, I'm seeing red nodes, and during my latest tests, lost some Hue motion sensors, and some lights would no longer react to group commands, until I power cycled them.

manup commented 5 years ago

Since the requests look identical the problem might be more subtle. When a node doesn't answer not even with a aps ack it means the request is dropped on lower levels: mac (when destination address or panid don't match) or nwk which is more likely here.

One particular reason why this might happen is the nwk frame counter. This 32-bit counter is used to prevent that an attacker can replay a old command, so when a packet with lower counter as know is received the packet will be dropped. However the ZLL spec states that this counter should be ignored (ZLL profile). Which nearly all devices do.

I'm not 100% sure that 0x26230500 and 0x26240500 behave exactly the same, the nwk frame counter is stored in nvram. I'll schedule some tests.

image

Another reason wich may cause such an error are different nwk update id values.

ignidis commented 5 years ago

Just wanted to add a similar experience with the firmware 26240500, hope this helps. I was using the plugin version 2.05.35 (fw 26210500) on raspbian, hardware pi 3 B+, bluethooth and wifi disabled. My zigbee network contains GU Bulbs from OSRAM, RGBW Bulbs and sensors from Philips, IKEA sensors panels and Temperature Color Bulbs, mains switches from Busch-Jaeger ... A well tempered mix distributed across three floors, the Gateway placed in the middle floor ... This setup was working pretty smoothly, some issues with busch-jaeger and OSRAM but nothing too bad. Last week i changed to 2.05.39 and 262405 and the network went mad. Bulbs started getting unresponsive to phoscon, but responsive to the switch or the "all off" or the "old" web UI, other bulbs didn't react until after 10 or more seconds, some time i had to send the command three or more times, etc. I also had two situations where the GW lost the configuration and jumped to channel 0 network 0x1234. Looking at the network with a ConBee the mesh was green and quite a few nodes were directly attached to the GW. Yesterday i reverted to FW 26210500 and plugin 2.05.35 and everything is working smoothly again. I tried 2.05.35 with FW 26240500 but to no avail, i could not get a single light recognized, sensors and switches were recognized but not working (sensors didn't show updates in phoscon).

That said, your raspbee product is great :)

manup commented 5 years ago

I'm still aren't able to reproduce the problems, my network runs for some days now with all sorts of mixed devices without issues.

There is a new firmware which has the end-device improvements of 0x26240500 but otherwise uses the route optimization settings and the nwk security counter restore from 0x26230500.

I hope it shows improvements, but really hard to tell since I can't reproduce the issues here.

https://www.dresden-elektronik.de/rpi/deconz-firmware/deCONZ_Rpi_0x26250500.bin.GCF

ebaauw commented 5 years ago

Now running 2.05.39 with 0x26250500. Looking good so far. No issues during startup, the innr bulbs react to unicast commands. Also, my lumi.ctrl_neutral2 in-wall switches are responsive after restarting deCONZ (see #798).

Fingers crossed.

ebaauw commented 5 years ago

Also, my lumi.ctrl_neutral2 in-wall switches are responsive after restarting deCONZ.

Unfortunately that's not quite the case. I can control the switches from the deCONZ GUI, without first reading the OnOff cluster attributes, but the resource are still "not available" from the REST API, until I've done that.

manup commented 5 years ago

Now running 2.05.39 with 0x26250500. Looking good so far. No issues during startup, the innr bulbs react to unicast commands. Also, my lumi.ctrl_neutral2 in-wall switches are responsive after restarting deCONZ (see #798).

Fingers crossed.

Cool thanks for testing this, meanwhile do the innr bulbs still operate?

If the version works now maybe @holli73 can give it a shot #829

Unfortunately that's not quite the case. I can control the switches from the deCONZ GUI, without first reading the OnOff cluster attributes, but the resource are still "not available" from the REST API, until I've done that.

I think this is a different issue which should be fixed in deCONZ, the lumi.ctrl_neutral2 report wrong mac capabilities (RFD, FFD, receiverOnWhenIdle) when I recall correctly?

ebaauw commented 5 years ago

meanwhile do the innr bulbs still operate?

One does still work, the other one doesn't. The one that doesn't shows red in the GUI. Pressing 0 turns it back to yellow. Pressing L while it's red doesn't do anything; pressing L while it's yellow (shortly after pressing 0) causes the node to blink blue, but still silence on unicast messages. Power cycling the bulb remedies the situation.

When they work, they sometimes seem "lazy". Sometimes, they don't react, sometimes they react a second late or so. Sometimes they react immediately, but the GUI keeps showing Executing..., as if the ACK is delayed.

I think this is a different issue which should be fixed in deCONZ, the lumi.ctrl_neutral2 report wrong mac capabilities (RFD, FFD, receiverOnWhenIdle) when I recall correctly?

Yes, they're mains powered RFD devices, but report receiverOnWhenIdle as false. The REST API seems to be waiting for a message from the OnOff cluster before it makes the /lights resource available. Reading the Basic cluster attributes doesn't activate the resource, nor does the switch's polling of the time server.

ebaauw commented 5 years ago

The REST API seems to be waiting for a message from the OnOff cluster before it makes the /lights resource available.

Or maybe just from the corresponding endpoint. The Basic server and Time client clusters are on endpoint 0x01; the OnOff clusters on endpoints 0x02 and 0x03. I'll try reading the Groups clusters (also on 0x02 and 0x03) after the next restart.

Also the manufacturername and modelid still don't get copied to the second /lights resource.

manup commented 5 years ago

One does still work, the other one doesn't. The one that doesn't shows red in the GUI. Pressing 0 turns it back to yellow. Pressing L while it's red doesn't do anything; pressing L while it's yellow (shortly after pressing 0) causes the node to blink blue, but still silence on unicast messages.

Power cycling the bulb remedies the situation.

Hmm silly question: is the node descriptor of the bulb queried and visible in Node Info Panel? I assume yes, but if not it would look very similar.

When they work, they sometimes seem "lazy". Sometimes, they don't react, sometimes they react a second late or so. Sometimes they react immediately, but the GUI keeps showing Executing..., as if the ACK is delayed.

:/ very odd right now I don't really have an idea what's going on, the former logs didn't indicate wrong or missing commands from the gateway or problematic differences from former firmware.

Here is the next version, which just has adjusted table sizes so the stack ram usage is 2K less (mcu has 32K total), that reflects ram usage of older firmware like 0x261f0500, but I don't think that's the problem here.

https://www.dresden-elektronik.de/rpi/deconz-firmware/deCONZ_Rpi_0x26260500.bin.GCF

manup commented 5 years ago

Or maybe just from the corresponding endpoint. The Basic server and Time client clusters are on endpoint 0x01; the OnOff clusters on endpoints 0x02 and 0x03. I'll try reading the Groups clusters (also on 0x02 and 0x03) after the next restart.

Also the manufacturername and modelid still don't get copied to the second /lights resource.

There are now multiple issues related to lumi.ctrl_neutral2 I think it's best to continue the device discussion in issue https://github.com/dresden-elektronik/deconz-rest-plugin/issues/798

ebaauw commented 5 years ago

Hmm silly question: is the node descriptor of the bulb queried and visible in Node Info Panel?

As far as I can tell: yes.

Now running 0x26260500. Both lights work right after starting deCONZ...

MattL0 commented 5 years ago

What is new in 0x26260500 vs 0x26240500?

ebaauw commented 5 years ago

Now running 0x26260500. Both lights work right after starting deCONZ...

And this morning, one of them is no longer reacting to unicast messages. Reverting back to 0x26230500 for now.

ignidis commented 5 years ago

@manup thank you for your quick reaction and responsiveness, appreciate I updated the firmware and the software to the newest versions and initially everything was working smoothly. This was shortly after you published the 26th version. Next day i noticed some IKEA Temperature Color Bulbs not able to recognize or set the scene values; both the web app and Phoscon showing then not willing to change the temperature. The web app was showing them as COLOR Bulbs and not as TEMPERATURE. After many attempts to reset the lights, i deleted them from the DB and rejoined... No effect. I tried to reset the lamps via touchlink but to no avail, even worse, the lights became unresponsive ... As a note the raspbee was showing 4 IKEA devices in a 50 cm area, NONE was in the neighborhood (minimum 2 meters), the nearest device -busch-jaeger switch- was not displayed (1m away). The next step was to reset the unresponsive lights via IKEA Gateway, that worked well. Next i powered off the IKEA Gateway.

What happened next is an extrange thing, the raspbee lost all connections to the network. I rebootted, deleted the config and restored a backup, no way to get the raspbee communicating with the network. The red and green LEDs where almost permanently on. As it happened before the raspbee changed to an unknown network, a backup fixed that, but no connectivity.

In the end only going back to the 21st firmware, uninstalling and reinstalling the deconz software and restoring a backup restored connectivity.

It is working now, i exchanged most IKEA Bulbs with PHILIPS and the only estrange thing is the OSRAM GU's going to on on a random pattern (?) ... Guess there is a lot of crap in the DB.

My conclusion so far , the new IKEA Bulbs seem to be troublesome, the panels and old bulbs seem more stable, sensors are estrange, some use the batteries in two days other work like a charm. PHILIPS is expensive but makes no trouble ...

manup commented 5 years ago

Thanks for the detailed report, there is definitely some strange things going on. Currently it's not clear what the real issue is nor if it is a firmware issue on lights or gateway side.

Meanwhile I have changed the Philips lights in my home to just innr and Ikea, hopefully this will trigger some faults to investigate deeper what's going on.

ebaauw commented 5 years ago

Closing this one. Discussion sort of moved to #849.

manup commented 5 years ago

Bingo, finally one of my three innr lights shows the same behavior! deCONZ firmware is very stubborn and doesn't try to find a new route since it sees the innr light.

Link status with incoming cost 4 and outgoing cost 7 I wonder why the outgoing cost isn't considered bad enough to cause a route establishment. Need to check the related code ..