fairecasoimeme / ZiGate

Zigate is an Universal Zigbee Gateway
http://zigate.fr
171 stars 59 forks source link

FW 3.1e -> 3.21OPDM: unexpected 804E answer from zigate #370

Open tcharp38 opened 3 years ago

tcharp38 commented 3 years ago

During LQI collection, starting by zigate interrogation, I got unexpected answer.

Request to zigate: {"topic":"CmdAbeille1\/0000\/Management_LQI_request","payload":"address=0000&StartIndex=00"} First answer tells there are only 3 entries in the table

[2021-06-19 00:00:06] Abeille1, Type=804E/Management LQI response, SQN=CB, Status=00, NTableEntries=03, NTableListCount=02, StartIndex=00, SrcAddr=0000 [2021-06-19 00:00:06] NAddr=5AC7, NExtPANId=8FDBA9CA044F2AFA, NExtAddr=00158D00047D3A86, NDepth=01, NLQI=B5, NBitMap=1A => RxOFFWhenIdle/Child/?/EndDevice [2021-06-19 00:00:06] NAddr=1AEA, NExtPANId=8FDBA9CA044F2AFA, NExtAddr=5C0272FFFEA0E396, NDepth=01, NLQI=AA, NBitMap=1A => RxOFFWhenIdle/Child/?/EndDevice

then a request for startindex 02: {"topic":"CmdAbeille1\/0000\/Management_LQI_request","payload":"address=0000&StartIndex=02"}

but the next answer is now telling there are 0B entries in the table but does not return anything.

[2021-06-19 00:00:06] Abeille1, Type=804E/Management LQI response, SQN=10, Status=00, NTableEntries=0B, NTableListCount=00, StartIndex=03, SrcAddr=0000

then again zigate tells there are only 3 entries and returns proper startindex 02.

[2021-06-19 00:00:06] Abeille1, Type=804E/Management LQI response, SQN=CC, Status=00, NTableEntries=03, NTableListCount=01, StartIndex=02, SrcAddr=0000 [2021-06-19 00:00:06] NAddr=E93B, NExtPANId=8FDBA9CA044F2AFA, NExtAddr=00158D000465AF13, NDepth=01, NLQI=A8, NBitMap=1A => RxOFFWhenIdle/Child/?/EndDevice

Any idea ? I've not seen this before 3.1e.

fairecasoimeme commented 3 years ago

Hi there is a difference between NTablesEntries et NTableListCount. this is not the same thing.

NTablesEntries is the total number of Neighbour table entries on the remote node NTableListCount is the number of Neighbour table entries reported in this response (through pNetworkTableList)

With the firmware code this is the NTableListCount which is used :

here is the code :

case ZPS_ZDP_MGMT_LQI_RSP_CLUSTER_ID:
                    {
                        uint8    u8Values;
                        uint8    u8Bytes;
                        ZNC_BUF_U8_UPD   ( &au8LinkTxBuffer [u16Length] , sApsZdpEvent.uZdpData.sMgmtLqiRsp.u8Status,                    u16Length );
                        ZNC_BUF_U8_UPD   ( &au8LinkTxBuffer [u16Length] , sApsZdpEvent.uZdpData.sMgmtLqiRsp.u8NeighborTableEntries,      u16Length );
                        ZNC_BUF_U8_UPD   ( &au8LinkTxBuffer [u16Length] , sApsZdpEvent.uZdpData.sMgmtLqiRsp.u8NeighborTableListCount,    u16Length );
                        ZNC_BUF_U8_UPD   ( &au8LinkTxBuffer [u16Length] , sApsZdpEvent.uZdpData.sMgmtLqiRsp.u8StartIndex,                u16Length );
                        if( sApsZdpEvent.uZdpData.sMgmtLqiRsp.u8Status == ZPS_E_SUCCESS )
                        {
                            for ( u8Values = 0; u8Values < sApsZdpEvent.uZdpData.sMgmtLqiRsp.u8NeighborTableListCount; u8Values++ )
                            {
                                ZNC_BUF_U16_UPD  ( &au8LinkTxBuffer [u16Length] , sApsZdpEvent.uLists.asNtList[u8Values].u16NwkAddr,            u16Length );
                                ZNC_BUF_U64_UPD  ( &au8LinkTxBuffer [u16Length] , sApsZdpEvent.uLists.asNtList[u8Values].u64ExtPanId,           u16Length );
                                ZNC_BUF_U64_UPD  ( &au8LinkTxBuffer [u16Length] , sApsZdpEvent.uLists.asNtList[u8Values].u64ExtendedAddress,    u16Length );
                                ZNC_BUF_U8_UPD   ( &au8LinkTxBuffer [u16Length] , sApsZdpEvent.uLists.asNtList[u8Values].u8Depth,               u16Length );
                                ZNC_BUF_U8_UPD   ( &au8LinkTxBuffer [u16Length] , sApsZdpEvent.uLists.asNtList[u8Values].u8LinkQuality,         u16Length );

                                u8Bytes  =  sApsZdpEvent.uLists.asNtList[u8Values].uAncAttrs.u2DeviceType;
                                u8Bytes |=  ( sApsZdpEvent.uLists.asNtList[u8Values].uAncAttrs.u2PermitJoining << 2 );
                                u8Bytes |=  ( sApsZdpEvent.uLists.asNtList[u8Values].uAncAttrs.u2Relationship << 4 );
                                u8Bytes |=  ( sApsZdpEvent.uLists.asNtList[u8Values].uAncAttrs.u2RxOnWhenIdle << 6 );
                                ZNC_BUF_U8_UPD   ( &au8LinkTxBuffer [u16Length] , u8Bytes,    u16Length );
                            }
                        }
                        ZNC_BUF_U16_UPD  ( &au8LinkTxBuffer [u16Length],  psStackEvent->uEvent.sApsDataIndEvent.uSrcAddress.u16Addr,    u16Length );
                        vSL_WriteMessage ( E_SL_MSG_MANAGEMENT_LQI_RESPONSE,
                                           u16Length,
                                           au8LinkTxBuffer,
                                           u8LinkQuality );
                    }
                    break;
tcharp38 commented 3 years ago

Hi We are in line.

NTableEntries is supposed to be the total of entries in the table. So no reason to get 03 then 0B then 03. This is the point.

I will try to do same test with 31d, just in case.

tcharp38 commented 3 years ago

I can see this issue from other users as well: zigate is sometimes responding badly:

Request to addr 0000, startIndex=18 [2021-08-24 11:36:43] Abeille1, Type=804E/Management LQI response, SQN=00, Status=00, NTableEntries=23, NTableListCount=02, StartIndex=18, SrcAddr=0000 [2021-08-24 11:36:43] NAddr=DB96, NExtPANId=A77D1F6C993597B4, NExtAddr=00158D0003F46CBE, NDepth=00, NLQI=7E, NBitMap=69 => RxONWhenIdle/Sibling/?/Router [2021-08-24 11:36:43] NAddr=9052, NExtPANId=A77D1F6C993597B4, NExtAddr=00158D00044B1C3A, NDepth=00, NLQI=48, NBitMap=69 => RxONWhenIdle/Sibling/?/Router Request to addr 0000, startIndex=1A [2021-08-24 11:36:43] Abeille1, Type=804E/Management LQI response, SQN=18, Status=00, NTableEntries=0B, NTableListCount=00, StartIndex=23, SrcAddr=0000 [2021-08-24 11:36:44] Abeille1, Type=8000/Status, Status=00/Success, SQN=01, PacketType=004E [2021-08-24 11:36:44] Abeille1, Type=804E/Management LQI response, SQN=01, Status=00, NTableEntries=23, NTableListCount=02, StartIndex=1A, SrcAddr=0000 [2021-08-24 11:36:44] NAddr=17F8, NExtPANId=A77D1F6C993597B4, NExtAddr=00158D00041477B6, NDepth=00, NLQI=80, NBitMap=69 => RxONWhenIdle/Sibling/?/Router [2021-08-24 11:36:44] NAddr=9212, NExtPANId=A77D1F6C993597B4, NExtAddr=00158D000403FF07, NDepth=00, NLQI=30, NBitMap=69 => RxONWhenIdle/Sibling/?/Router Request to addr 0000, startIndex=1C

Worse than this, this looks like following LQI management answers from Zigate are just... crap.

[2021-08-24 11:36:44] Abeille1, Type=804E/Management LQI response, SQN=1C, Status=00, NTableEntries=23, NTableListCount=0B, StartIndex=20, SrcAddr=0000 [2021-08-24 11:36:44] NAddr=0002, NExtPANId=A77D1F6C993597B4, NExtAddr=00158D00040017E0, NDepth=6C, NLQI=1F, NBitMap=75 => RxONWhenIdle/?/PermitJoinOFF/Router [2021-08-24 11:36:44] Eq addr 0002 is unknown. [2021-08-24 11:36:44] NAddr=A77D, NExtPANId=040016D4993597B4, NExtAddr=00158D00023ABF00, NDepth=5C, NLQI=04, NBitMap=0B => RxOFFWhenIdle/Parent/?/? [2021-08-24 11:36:44] Eq addr A77D is unknown. [2021-08-24 11:36:44] NAddr=8D00, NExtPANId=0400015004007F60, NExtAddr=A77D1F6C0000D400, NDepth=4D, NLQI=79, NBitMap=51 => RxONWhenIdle/Child/PermitJoinON/Router [2021-08-24 11:36:44] Eq addr 8D00 is unknown. [2021-08-24 11:36:44] NAddr=1C00, NExtPANId=0AB0980000000000, NExtAddr=80311C000400036C, NDepth=0B, NLQI=23, NBitMap=20 => RxOFFWhenIdle/Sibling/PermitJoinON/Coordinator [2021-08-24 11:36:44] Eq addr 1C00 is unknown. [2021-08-24 11:36:44] NAddr=1F75, NExtPANId=7D1F6C993597B400, NExtAddr=158D00040017E06C, NDepth=7D, NLQI=A7, NBitMap=40 => RxONWhenIdle/Parent/PermitJoinON/Coordinator [2021-08-24 11:36:44] Eq addr 1F75 is unknown. [2021-08-24 11:36:44] NAddr=0004, NExtPANId=993597B400158D00, NExtAddr=023ABF005C040B8D, NDepth=01, NLQI=00, NBitMap=50 => RxONWhenIdle/Child/PermitJoinON/Coordinator [2021-08-24 11:36:44] Eq addr 0004 is unknown. [2021-08-24 11:36:44] NAddr=9800, NExtPANId=60A77D1F6C0000D4, NExtAddr=004D79511C000AB0, NDepth=00, NLQI=00, NBitMap=00 => RxOFFWhenIdle/Parent/PermitJoinON/Coordinator [2021-08-24 11:36:44] Eq addr 9800 is unknown. [2021-08-24 11:36:44] NAddr=97B4, NExtPANId=1C000400036C0B23, NExtAddr=201F757D1F6C9935, NDepth=15, NLQI=00, NBitMap=C1 => ?/Parent/PermitJoinON/Router [2021-08-24 11:36:44] Eq addr 97B4 is unknown. [2021-08-24 11:36:44] NAddr=023A, NExtPANId=17E06C7DA7400004, NExtAddr=993597B400158D00, NDepth=00, NLQI=BF, NBitMap=D0 => ?/Child/PermitJoinON/Coordinator [2021-08-24 11:36:44] Eq addr 023A is unknown. [2021-08-24 11:36:44] NAddr=511C, NExtPANId=010050980060A77D, NExtAddr=1F6C0000D4004D79, NDepth=0A, NLQI=00, NBitMap=30 => RxOFFWhenIdle/?/PermitJoinON/Coordinator [2021-08-24 11:36:44] Eq addr 511C is unknown. [2021-08-24 11:36:44] NAddr=9935, NExtPANId=97B41C000400036C, NExtAddr=0B23201F757D1F6C, NDepth=00, NLQI=15, NBitMap=49 => RxONWhenIdle/Parent/?/Router [2021-08-24 11:36:44] Eq addr 9935 is unknown.

Nobody in zigate team faced such behavior ?

pipiche38 commented 3 years ago

I'm not sure to understand what is your problem . on my end i never notice any issues on decoding the Neighbourg table from ZiGate.

tcharp38 commented 3 years ago

The problem is that

I don't see why it would be a pb of "decoding". But ok so far I can't prove it is zigate. I will try to reproduce something on my side and sniff the network in //.

pipiche38 commented 3 years ago

The only comment I can make, is I don’t think that I’m making an extensive use of the 0x004E command as you do on Abeille. On our Domoticz Plugin it is only use when the user is requesting a Topology Report

On 25 Aug 2021, at 10:36, Tcharp38 @.***> wrote:

The problem is that

zigate may return stupid message during that interrogation with 3.1e (never saw that before). Saying the table is 23 then 0B then 23 again is... no sense isn't it ? worse that this at some point, returned message is just bulshit as you can see all these unknown devices I don't see why it would be a pb of "decoding". But ok so far I can't prove it is zigate. I will try to reproduce something on my side and sniff the network in //.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/fairecasoimeme/ZiGate/issues/370#issuecomment-905300173, or unsubscribe https://github.com/notifications/unsubscribe-auth/AB7IKWVQ4UPRUG5LEDHRDE3T6STRZANCNFSM47ABD7NA.

tcharp38 commented 3 years ago

Ok clear. Let me try to capture a real falling case.

pipiche38 commented 3 years ago

Now, how can you sniff as you mentioned the issues is when getting the Neighbourg table of the ZiGate ?

tcharp38 commented 3 years ago

You are perfectly right. I missed this point but I can now reproduce on my simple test bench. As a summary, 804E answer might be bulshit when interrogating zigate (not transmitted over the air)

See attached analysis file analysis.txt

tcharp38 commented 2 years ago

Just to refresh that I still see such unexepected behavior with 3.21-OPDM, PiZigate v1

I get the first answer from Zigate telling 6 entries total but 2 only listed here

[2022-01-19 13:02:14] Abeille1, Type=804E/Management LQI response, SQN=68, Status=00, NTableEntries=06, NTableListCount=02, StartIndex=00, SrcAddr=0000 [2022-01-19 13:02:14] NAddr=8052, NExtPANId=C99F5D26E97D77A6, NExtAddr=00124B002242C5C5, NDepth=01, NLQI=D8, NBitMap=1A => RxOFFWhenIdle/Child/?/EndDevice [2022-01-19 13:02:14] NAddr=EE9B, NExtPANId=C99F5D26E97D77A6, NExtAddr=00158D000478E64E, NDepth=01, NLQI=CE, NBitMap=1A => RxOFFWhenIdle/Child/?/EndDevice

then a non sense answer for without new request with "0B" as total and none reported

[2022-01-19 13:02:14] Abeille1, Type=804E/Management LQI response, SQN=10, Status=00, NTableEntries=0B, NTableListCount=00, StartIndex=06, SrcAddr=0000

then a good answer for next request (startIndex=02)

[2022-01-19 13:02:14] Abeille1, Type=804E/Management LQI response, SQN=69, Status=00, NTableEntries=06, NTableListCount=02, StartIndex=02, SrcAddr=0000 [2022-01-19 13:02:14] NAddr=8217, NExtPANId=C99F5D26E97D77A6, NExtAddr=00158D000464C9C2, NDepth=01, NLQI=F8, NBitMap=1A => RxOFFWhenIdle/Child/?/EndDevice [2022-01-19 13:02:14] NAddr=0098, NExtPANId=C99F5D26E97D77A6, NExtAddr=04CF8CDF3C77164B, NDepth=01, NLQI=F6, NBitMap=1A => RxOFFWhenIdle/Child/?/EndDevice