Koenkk / zigbee-herdsman

A Node.js Zigbee library
MIT License
456 stars 277 forks source link

Packet decoding bug in deconz driver? #576

Closed lbschenkel closed 1 year ago

lbschenkel commented 1 year ago

Related to https://github.com/Koenkk/zigbee2mqtt/issues/12611#issuecomment-1240692790.

None of my IKEA remotes work in Z2M (they can pair just fine, but button presses are not recognized). I'm using Z2M with a ConBee (original, not model II). The exact same hardware combination works with deCONZ.

I dug a bit deeper and found out these in the logs:

2022-09-08T11:35:30.125Z zigbee-herdsman:deconz:frameParser DATA_INDICATION RESPONSE - seqNr. 59 srcAddr: 0xeaf8 destAddr: 0x0 profile id: 0x104 cluster id: 0x5 lqi: 255
2022-09-08T11:35:30.133Z zigbee-herdsman:controller:log Received 'zcl' data '{"frame":{"Header":{"frameControl":{"frameType":1,"manufacturerSpecific":true,"direction":0,"disableDefaultResponse":false,"reservedBits":0},"transactionSequenceNumber":31,"manufacturerCode":4476,"commandIdentifier":7},"Payload":{"value":257,"value2":13},"Command":{"ID":7,"parameters":[{"name":"value","type":33},{"name":"value2","type":33}],"name":"tradfriArrowSingle"}},"address":60152,"endpoint":1,"linkquality":255,"groupID":0,"wasBroadcast":true,"destinationEndpoint":53}'
2022-09-08T11:35:30.133Z zigbee-herdsman:controller:log 'zcl' data is from unknown device with address '60152', skipping...

So Z2M is seeing the zcl data but cannot find the source device, and skips it. But note that the source address is supposed to be the NWK, and what is being logged is definitely not the NWK. I re-paired the remote forcing a change of NWK but the address being logged is always the same. The address is 60152, or 0xEAF8. which will be significant later.

I then used Zshark to sniff into the network and confirmed that the address being logged is not the address in the packets: image

Note that the source address is always fixed as 0xEAF8. It matches specific bytes of the IEEE address. I am assuming that there's a bug in the decoding logic here: https://github.com/Koenkk/zigbee-herdsman/blob/b4cdf8ba6935ca20aa6967e1a0afed32f320e64b/src/adapter/deconz/driver/frameParser.ts#L241 and it is extracting the wrong bytes from the packet.

Koenkk commented 1 year ago

@ChrisHae can you look into this?

lbschenkel commented 1 year ago

Follow-up: I purchased a SONOFF ZBDongle-P and the problem went away immediately, which is further confirmation that the problem is with the driver. It is not the ConBee dongle itself, as it works perfectly well with deCONZ.

lbschenkel commented 1 year ago

OK, I have a guess.

Looking at the documentation for the ConBee serial protocol at https://deconz.dresden-elektronik.de/raspbian/deCONZ-Serial-Protocol-en_1.21.pdf, I came out with the following table of frame addressing scenarios:

scenario 0-7 8 9-10 11 12 13-14 15-16 17 18 19-20 21 22-26 27-28 29-
1->dst16+src16 ... dstMode=1,2 dst16 [u8] srcMode=2 src16 ... ... ... ... ... ... ... ...
2->dst16+src64 ... dstMode=1,2 dst16 [u8] srcMode=3 src64[0-1] src64[2-3] src64[4] src64[5] src64[6-7] ... ... ... ...
3->dst16+src16+src64 ... dstMode=1,2 dst16 [u8] srcMode=4 src16 src64[0-2] src64[3] src64[4] src64[5-6] src64[7] ... ... ...
4->dst64+src16 ... dstMode=3 dst64[0..1] dst64[2] dst64[3] dst64[4-5] dst64[6-7] [u8] srcMode=2 src16 ... ... ... ...
5->dst64+src64 ... dstMode=3 dst64[0..1] dst64[2] dst64[3] dst64[4-5] dst64[6-7] [u8] srcMode=3 src64[0-1] src64[2] src64[3-7] ... ...
6->dst64+src16+src64 ... dstMode=3 dst64[0..1] dst64[2] dst64[3] dst64[4-5] dst64[6-7] [u8] srcMode=3 src16 src64[0] src64[1-5] src64[6-7] ...

The driver believes it is scenario 4 or 6 (dstMode=3,srcMode=2 or 3) and therefore it is trying to parse the NWK at offsets 19-20. However, the actual packet is from scenario 3 (dstMode=1 or 2, srcMode=4) and at that location you'll find the bytes 5-6 of the source IEEE address (0xEAF8) when the actual NWK is at offset 13 (0x9FA9).

This matches exactly what I am seeing.

lbschenkel commented 1 year ago

The plot thickens: I have put the ConBee on my PC and I joined the same device to it (naturally NWK will change), added a breakpoint and pressed the middle button of the remote. This is the data passed to parseReadReceivedDataResponse:

{
  "0": 23,
  "1": 213,
  "2": 0,
  "3": 33,
  "4": 0,
  "5": 26,
  "6": 0,
  "7": 34,
  "8": 1, // destAddrMode
  "9": 133,
  "10": 3,
  "11": 53,
  "12": 2, // srcAddrMode
  "13": 248, // srcAddr16 = 0xEAF8 (should have been 0xC909)
  "14": 234, // srcAddr16
  "15": 1,
  "16": 4,
  "17": 1,
  "18": 6,
  "19": 0,
  "20": 3,
  "21": 0,
  "22": 1,
  "23": 17,
  "24": 2,
  "25": 0,
  "26": 175,
  "27": 255,
  "28": 166,
  "29": 0,
  "30": 1,
  "31": 4,
  "32": 228,
  "33": 181,
  "34": 248
}

The code in this method is behaving correctly and according to spec. The two bytes which are supposed to be the NWK (0xC909) have bytes 5-6 of the IEEE address instead. The rest of the IEEE address is nowhere to be found. Neither the bytes for the NWK.

I will need to dig more to see if this got corrupted somehow further up in the call stack or this data was received as-is from the serial port. But if this is what comes out from the serial port would not make any sense, as deCONZ works with the same hardware. Also when doing packet captures, I can see the correct NWK included in the packet (you can also see in the screenshot up above), so the ConBee must be passing the correct data.

Unless the serial mode used by Z2M the same is a different mode than the mode used by deCONZ?

lbschenkel commented 1 year ago

As an experiment, this ugly hack made Z2M work successfully with ConBee:

diff --git a/src/controller/model/device.ts b/src/controller/model/device.ts
--- a/src/controller/model/device.ts    (revision b4cdf8ba6935ca20aa6967e1a0afed32f320e64b)
+++ b/src/controller/model/device.ts    (date 1662819031967)
@@ -376,7 +376,10 @@
     }

     public static byNetworkAddress(networkAddress: number): Device {
-        return Device.all().find(d => d.networkAddress === networkAddress);
+        return Device.all().find(d =>
+            d.networkAddress === networkAddress
+            || d.ieeeAddr.substring(12, 16) === networkAddress.toString(16) // FIXME: ugly hack for deCONZ+IKEA
+        );
     }

     public static byType(type: DeviceType): Device[] {

Then packets stopped being discarded and IKEA remotes started to work. (IKEA motion sensors as well, they were also broken.)

Not saying this is appropriate, of course -- but further confirmation that a part of the IEEE address is showing up in the place of NWK, somehow.

lbschenkel commented 1 year ago

Hi @Koenkk and/or @ChrisHae, did anyone had a chance to replicate my findings? Is there something that is not clear? Is there anything I can do to help?

ahd71 commented 1 year ago

Great analysis! Let me know if you need a tester as I also have this issue - strangely enough only with most of my ikea controllers - but not all of them. Despite both working and non working are on the same firmware release and in the same zigbee network which I find hard to explain.

Koenkk commented 1 year ago

@lbschenkel I'm wondering if this is a bug in the deconz z2m driver of the firmware itself. Just to make sure; can you check that you are using the latest firmware and if the issue also occurs with Deconz?

lbschenkel commented 1 year ago

@lbschenkel I'm wondering if this is a bug in the deconz z2m driver of the firmware itself. Just to make sure; can you check that you are using the latest firmware and if the issue also occurs with Deconz?

Yes, latest firmware (0x26400500). The issue does not occur with deCONZ -- I have these devices for years and they stopped working (remotes, motion sensors) when I migrated to Z2M. This does not occur as well with the sniffer firmware.

ahd71 commented 1 year ago

And in z2m it did work flawless until a few months ago.

Koenkk commented 1 year ago

And in z2m id did work flawless until a few months ago.

Interesting, what was the last known working z2m version? If we know this maybe we can find out which change caused this.

ahd71 commented 1 year ago

I don't know exactly but given that below issue was raised in may it can be around around that time for me too even if I think it was slightly later I noticed it, but I could also have upgraded later. https://github.com/Koenkk/zigbee2mqtt/issues/12611

What for me is really strange is that I have at least six such controllers, all with same firmware release. Two if them works, rest doesn't. All used to work in z2m. I have a zzh stick.

ahd71 commented 1 year ago

Just checked another thing. Two of my six devices have another software build date than the others, but all has the same release number. But from what I see it doesn't correlate to the devices that works vs doesn't. Strange.

Koenkk commented 1 year ago

@ahd71 if you don't have a Conbee stick your issue is not related to this.

ahd71 commented 1 year ago

Ok, I'm not using deconz or conbee so makes sense. But the symtoms is the same as described in #12611 so either there are two different problems that started somewhat at roughly the same time or the problem isnt in this driver. I wished I could help out somehow.

neonal commented 1 year ago

@Koenkk is there any chance this is prioritized? None of my IKEA remotes work with zigbee2mqtt.

Koenkk commented 1 year ago

@ChrisHae is this something you could check?

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days

lbschenkel commented 1 year ago

Please don't close this. Any update, @Koenkk or @ChrisHae ?

celindho commented 1 year ago

I'm also experiencing this same issue with most of my Ikea remotes (RaspBee, not the II model), where I get the "'zcl' data is from unknown device with address" error message in my logs. Any news on how to solve this issue?

I don't have a sensible setup for debugging the error myself. Any pointers on how to get started?

lbschenkel commented 1 year ago

Honestly: buy a Sonoff dongle. It is cheap and Z2M works much better (flawlessly so far for me) with it. This is what I ended up doing.

It is clear that the ConBee dongle implementation is not a priority and has unaddressed issues. This is not meant as an attack: most projects like this are volunteer based, have limited resources, and have to have priorities.

reneklootwijk commented 1 year ago

This does not happen with the Conbee II stick, I thought I just mention because I do not know how much of the code is shared for Conbee and Conbee II.

ahd71 commented 1 year ago

I have the same symtom and are using a zzh stick (not conbee). Strangely enough about half of the ikea remote works of the same model and firmware. Reporting battery and link quality but not keypresses.

matt-mad commented 1 year ago

Honestly: buy a Sonoff dongle. It is cheap and Z2M works much better (flawlessly so far for me) with it. This is what I ended up doing.

It is clear that the ConBee dongle implementation is not a priority and has unaddressed issues. This is not meant as an attack: most projects like this are volunteer based, have limited resources, and have to have priorities.

Thank you for this suggestion. I am experiencing this bug with a raspbee (I) hat as well and was waiting for a fix, but now I will consider buying another stick or migrating back to deconz.

matt-mad commented 1 year ago

Maybe the flaws could be documented on the "supported adapters" page for raspbee and conbee, as long as there is no fix?

lbschenkel commented 1 year ago

Honestly: buy a Sonoff dongle. It is cheap and Z2M works much better (flawlessly so far for me) with it. This is what I ended up doing.

Thank you for this suggestion. I am experiencing this bug with a raspbee (I) hat as well and was waiting for a fix, but now I will consider buying another stick or migrating back to deconz.

I strongly suggest buying another stick, if you can spare the money. I used ConBee+deCONZ for many years, and it was fine (not particularly unhappy or anything), but I wanted to give Z2M a shot and today I'm much happier with it and wouldn't go back. Three main reasons: (1) it is usually faster in supporting new devices, (2) pairing seems to be quicker, (3) firmware upgrades work consistently (it took many tries with deCONZ).

I say this as a person that even wrote and contributed code to deCONZ multiple times.

ahd71 commented 1 year ago

I agree, z2m scores 9.97/10 for me. Wouldn't go back to deconz. I'm sure we can figure out the cause and fix it together though. Where do we start?

lbschenkel commented 1 year ago

I did quite some through investigation in my opinion, it's all above. As a developer myself I think what I have included is enough to reproduce it and to get to the bottom of it. So no need to do it all over again IMHO.

It's just a matter of someone picking up and doing the work, which didn't happen yet. (Again, not pointing fingers: this is a volunteer project.) Unfortunately I'm not able to allocate the effort to go further myself, given current personal constraints (even though I personally contributed similar work for deCONZ in the past). That's why I decided to simply buy a new dongle and solve the issue in that way.

ahd71 commented 1 year ago

Yes I've seen your detailed analysis. Very valuable in the hands of someone that knows the code and can pick it up. Thanks! I'll hope time will eventually come for that. Wish I had the competence myself.

celindho commented 1 year ago

Thanks for your good input! I'll consider going with the Sonoff dongle instead.

konstantint commented 1 year ago

Thanks for your good input! I'll consider going with the Sonoff dongle instead.

Just FYI, but I have a Sonoff ZBDongle and two different IKEA buttons stopped working for me in z2m recently just as mysteriously.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 7 days