darkxst / silabs-firmware-builder

Silicon Labs firmware builder
https://darkxst.github.io/silabs-firmware-builder/

LQI and route problem with multipan firmware #12

Closed Drealine closed 11 months ago

Drealine commented 11 months ago

Hi @darkxst, I hope you're doing well. I'm facing some strange issues with the multipan protocol. When using the multipan firmware with Z2M, I get an LQI of 255 for every device. Also, passive devices sometimes don't send actions correctly (for example, a button press doesn't update the state and I need to press 2, 3 or 4 times for it to work), and they use strange routes.

Using the Zigbee firmware alone works great. I know the problem could be investigated in Z2M directly, but before that I would like to know whether you've faced the same issues. Regards

darkxst commented 11 months ago

I am aware of the LQI issue where it reports 255. I've also heard reports that it affects the SkyConnect dongle, so it is probably a bug somewhere in the Silicon Labs code, either the multipan firmware or Zigbeed (it doesn't seem specific to ZHA or Z2M). It might be worth reporting this one on the Silabs forums (which I haven't got around to yet).

That said, the LQI issue is purely cosmetic (unless it is hiding underlying signal quality issues). I don't expect it is related to your button issue. I have a Sonoff button here that works perfectly with the multipan firmware. What button device are you using? Is there anything in the logs indicating errors? Is it working OK with the NCP (normal Zigbee) firmware?

darkxst commented 11 months ago

Looking at the source code, LQI reporting doesn't appear to be implemented in the multipan/Zigbeed stack yet. Not sure if this is a technical limitation or they just haven't implemented it yet!

Drealine commented 11 months ago

I am aware of the LQI issue where it reports 255. I've also heard reports that it affects the SkyConnect dongle, so it is probably a bug somewhere in the Silicon Labs code, either the multipan firmware or Zigbeed (it doesn't seem specific to ZHA or Z2M). It might be worth reporting this one on the Silabs forums (which I haven't got around to yet).

That said, the LQI issue is purely cosmetic (unless it is hiding underlying signal quality issues). I don't expect it is related to your button issue. I have a Sonoff button here that works perfectly with the multipan firmware. What button device are you using? Is there anything in the logs indicating errors? Is it working OK with the NCP (normal Zigbee) firmware?

Hi @darkxst, thanks again for your answer. OK, so it's not a bug. I think it's just not implemented; otherwise it would be strange.

I'm using a Philips Hue smart button, but the problem also appears with my Aqara P1 sensors: sometimes the occupancy state is not updated. Looking at the network in Z2M, I see that the end devices are not connected to the coordinator but to a router (like my Philips Hue or Aqara H1). With the NCP firmware, there is no problem with state or LQI. With NCP, the end devices are connected directly to the coordinator (because the LQI seems the same whether a device is near the first router or near the coordinator).

So I think the end-device state reporting problem is related to end devices that are not connected directly to the coordinator. But so far there are no logs with timeouts or the like in Z2M or in the Multi PAN add-on in HA.

darkxst commented 11 months ago

I see that the end devices are not connected to the coordinator but to a router (like my Philips Hue or Aqara H1).

Is the LQI acceptable via the router? Generally, going through routers won't cause issues, but Aqara devices are not entirely compliant with the Zigbee spec, so they can be troublesome at times, particularly with incompatible routers.

So I think the end-device state reporting problem is related to end devices

I am not sure that the end devices are seeing the 255 LQI values. If they did, everything would choose the coordinator to pair with. I did some testing with one router and a multipan coordinator a while back (albeit in ZHA), and end devices seemed to pair correctly to either the router or the coordinator based on real (estimated) LQIs. One thing to keep in mind is that the Zigbee mesh is not overly dynamic: if you pair a device with a specific router, it will tend to prefer that router even when better link options are available. Outside of pairing directly, it can take a long time for a device to migrate to a new router.

Also, what NCP firmware version are you using? When running the multipan stack, Zigbeed is running the 7.3 branch of EmberZNet. This may well have bugs that don't exist in the 6.10.3 or 7.1 firmwares if you're using older NCP builds.

Drealine commented 11 months ago

Is the LQI acceptable via the router? Generally, going through routers won't cause issues, but Aqara devices are not entirely compliant with the Zigbee spec, so they can be troublesome at times, particularly with incompatible routers.

Yes, I agree with you.

I am not sure that the end devices are seeing the 255 LQI values. If they did, everything would choose the coordinator to pair with. I did some testing with one router and a multipan coordinator a while back (albeit in ZHA), and end devices seemed to pair correctly to either the router or the coordinator based on real (estimated) LQIs. One thing to keep in mind is that the Zigbee mesh is not overly dynamic: if you pair a device with a specific router, it will tend to prefer that router even when better link options are available. Outside of pairing directly, it can take a long time for a device to migrate to a new router.

Also, what NCP firmware version are you using? When running the multipan stack, Zigbeed is running the 7.3 branch of EmberZNet. This may well have bugs that don't exist in the 6.10.3 or 7.1 firmwares if you're using older NCP builds.

Oh, I thought Zigbee used this value to pick the best route. The NCP firmware used is 7.3.0.0 build 131, and multipan is using the same version, as you say. Do you think it is a routing issue with end devices? I see that router devices like lights don't have a problem reporting state. It's very strange that the problem does not appear with the NCP firmware directly.

darkxst commented 11 months ago

Oh, I thought Zigbee used this value to pick the best route.

Yes, LQI is used to determine link cost and thus routing, but I don't think the end devices are seeing this fictitious 255 value. If they did, every device would pair directly to the coordinator and ignore any routers in your mesh.
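
To illustrate the point about LQI driving link cost, here is a minimal Python sketch. It assumes the usual Zigbee convention that each link gets a cost from 1 (best) to 7 (worst) and that a path's cost is the sum of its link costs. The thresholds and the names `link_cost_from_lqi` and `path_cost` are made up for illustration and are not taken from EmberZNet or Zigbeed.

```python
# Illustrative only: the thresholds below are assumptions, not the values
# used by any real Zigbee stack.
LQI_THRESHOLDS = [(200, 1), (160, 2), (120, 3), (80, 4), (50, 5), (20, 6)]


def link_cost_from_lqi(lqi: int) -> int:
    """Map an 8-bit LQI (0-255) to a link cost in 1..7, where 1 is best."""
    if not 0 <= lqi <= 255:
        raise ValueError(f"LQI must be in 0..255, got {lqi}")
    for minimum_lqi, cost in LQI_THRESHOLDS:
        if lqi >= minimum_lqi:
            return cost
    return 7  # weakest links get the highest cost


def path_cost(hop_lqis: list[int]) -> int:
    """Total cost of a route, given the LQI of each hop along it."""
    return sum(link_cost_from_lqi(lqi) for lqi in hop_lqis)


# A device that really saw LQI 255 on the direct link would always find
# it cheaper than a two-hop route through a router.
direct = path_cost([255])
via_router = path_cost([180, 180])
print(direct, via_router)
```

Under these assumed thresholds the direct link always wins, which is why a mesh where end devices still attach to routers suggests the 255 value is only what gets reported to the host, not what the devices themselves measure.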

How did you migrate devices between firmwares? Did you re-pair everything?

Drealine commented 11 months ago

Yes, if I run a scan in Z2M, the LQI is correct in the schema but not in the list section. And yes, I re-paired everything. I tested again with the multipan firmware and routers also have the problem: the state of a wall switch is not updated. Sometimes it's OK, sometimes I need to tap 5-6 times for it to work. So I definitely think it is a firmware problem. With multipan enabled, Z2M shows EZSP v12; with the NCP firmware, Z2M shows EZSP v13. Is that normal?

darkxst commented 11 months ago

EZSP protocol v12 is the current version, it was only recently released with 7.3.0

Drealine commented 11 months ago

EZSP protocol v12 is the current version, it was only recently released with 7.3.0

OK, I think I was imagining things, haha. I'll check tonight. But to summarize for now: with the Zigbee firmware it's OK; with the multipan firmware, I have issues with device state. I looked at the log of the multipan add-on and I have some errors regarding mDNS.

darkxst commented 11 months ago

I looked at the log of the multipan add-on and I have some errors regarding mDNS.

Those will be from OpenThread and not Zigbee, so probably harmless.

Drealine commented 11 months ago

Those will be from OpenThread and not Zigbee, so probably harmless.

Thanks! I've got 7.3.0.0 build 0 for multipan and 7.3.0.0 build 131 for the NCP firmware. Maybe that's related to the problem, no?

Drealine commented 11 months ago

It's very hard to debug the problem because I don't have any logs in Z2M or in the multipan add-on.

Drealine commented 11 months ago

OK, I think I found the problem. I disabled the OpenThread protocol in the multipan add-on in HA and, after that, device state updates are fast. Could it be interference between the two protocols?

darkxst commented 11 months ago

You need to ensure that both Zigbee and Thread are using the same channel; if they are different, that would be the cause.

You can check the Thread channel with Thread integration -> Configure -> then click the info icon. I think it will use 15 by default. As you are using Z2M, there is no way for it to automatically pick up the Zigbee channel.

Either change the Zigbee channel to match Thread or change the Thread channel manually; see: https://github.com/home-assistant/addons/issues/3124#issuecomment-1630897233
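
As a rough illustration of why the channels have to match, here is a small Python sketch. Both Zigbee and Thread use the IEEE 802.15.4 2.4 GHz channels 11 to 26, whose centre frequencies are 5 MHz apart starting at 2405 MHz, and a single radio can only be tuned to one of them at a time. The function names are hypothetical; the channel numbers are the ones mentioned in this thread (15 as the presumed Thread default, 25 as the Z2M channel).

```python
def channel_to_mhz(channel: int) -> int:
    """Centre frequency in MHz of an IEEE 802.15.4 2.4 GHz channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError(f"channel must be in 11..26, got {channel}")
    return 2405 + 5 * (channel - 11)


def channels_match(zigbee_channel: int, thread_channel: int) -> bool:
    """True when both stacks can share one radio without retuning."""
    return zigbee_channel == thread_channel


zigbee_channel = 25  # channel configured in Z2M in this thread
thread_channel = 15  # presumed Thread default mentioned above

if not channels_match(zigbee_channel, thread_channel):
    print(
        f"Mismatch: Zigbee on {channel_to_mhz(zigbee_channel)} MHz, "
        f"Thread on {channel_to_mhz(thread_channel)} MHz"
    )
```

With a mismatch like this, the shared radio is presumably listening on the wrong frequency part of the time, which would fit the dropped button presses described earlier.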

Drealine commented 11 months ago

Oh, thanks again! I didn't suspect a channel problem; I thought Zigbee and Thread could be on separate channels, but no!

I need to check that tonight.

darkxst commented 11 months ago

Not when both are sharing the same radio!

When using ZHA you get log messages indicating a channel mismatch, but with Z2M, well, HA doesn't know much about it!

Drealine commented 11 months ago

OK, so changing the Thread channel is not convenient. I need to stop Z2M, start ZHA and keep the radio configuration, then reset the border router and Thread to tell HA to change the channel to 25 (I use this channel in Z2M). After that, I can delete the ZHA integration and restart Z2M.

darkxst commented 11 months ago

No, I don't think that will work. It used to work like that, but currently you can't change the Zigbee channel from ZHA anymore when using multiprotocol.

You can change the Thread channel directly via a manual command to the API. See the instructions below, which I added to the wiki page. Log in to HA via SSH and run the command given there.

https://github.com/darkxst/silabs-firmware-builder/wiki/RCP-Multipan-Channel-Mismatch#change-thread-channel

Drealine commented 11 months ago

OK, thanks @darkxst. Would you like to keep the issue open until https://github.com/Koenkk/zigbee-herdsman/issues/738 and https://github.com/home-assistant/addons/issues/3158 are resolved?

darkxst commented 11 months ago

No need to keep this open; I will keep an eye on the other issues.

However, I suspect this is an upstream issue within the Silicon Labs Zigbeed component, and not specifically Z2M or ZHA at fault.