Koenkk / Z-Stack-firmware

Compilation instructions and hex files for Z-Stack firmwares

20221102 zstack 3.x.0 coordinator build throws NWK_TABLE_FULL after ~7 days #402

Closed sjorge closed 1 year ago

sjorge commented 1 year ago

It has happened 3 times now that after about a week some devices drop off the network (and sometimes even vanish from the database if not caught quickly).

It seems to be because of NWK_TABLE_FULL errors. I usually toggle my lights in blocks at the breaker to get them back. When they join again I sometimes (but not for every bulb) get:

error 2022-12-05 20:23:04: Error: ConfigureReporting 0x680ae2fffe11ed6f/1 lightingColorCtrl([{"attribute":"currentY","minimumReportInterval":3,"maximumReportInterval":3600,"reportableChange":1}], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SREQ '--> ZDO - extRouteDisc - {"dstAddr":20051,"options":0,"radius":30}' failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)'))
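To check whether other failures in the same log share this status, something like the following could be used to scan log lines. This is only a rough sketch: the regular expressions are inferred from the single log line quoted above, and the function name `find_status_failures` is made up for illustration, not part of zigbee2mqtt or zigbee-herdsman.

```python
import re

# Patterns inferred from the one log line quoted in this issue (an assumption,
# not a documented log format).
STATUS_RE = re.compile(r"failed with status '\(0x([0-9a-fA-F]{2}): ([A-Z_]+)\)'")
TIMESTAMP_RE = re.compile(r"^\w+\s+(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):")
DEVICE_RE = re.compile(r"\b(0x[0-9a-fA-F]{16})/(\d+)\b")


def find_status_failures(lines, status_name="NWK_TABLE_FULL"):
    """Return one dict per log line whose SREQ failed with the given status name."""
    hits = []
    for line in lines:
        status = STATUS_RE.search(line)
        if status is None or status.group(2) != status_name:
            continue
        timestamp = TIMESTAMP_RE.match(line)
        device = DEVICE_RE.search(line)
        hits.append({
            "timestamp": timestamp.group(1) if timestamp else None,
            "device": device.group(1) if device else None,
            "code": int(status.group(1), 16),
        })
    return hits


# Shortened copy of the error above (the attribute payload is elided with "...").
sample_line = (
    "error 2022-12-05 20:23:04: Error: ConfigureReporting 0x680ae2fffe11ed6f/1 "
    "lightingColorCtrl(...) failed (SREQ '--> ZDO - extRouteDisc - "
    "{\"dstAddr\":20051,\"options\":0,\"radius\":30}' failed with status "
    "'(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)'))"
)

hits = find_status_failures([sample_line, "info 2022-12-05 20:24:00: all good"])
for hit in hits:
    print(hit["timestamp"], hit["device"], hex(hit["code"]))
```

Running it over a whole log file (for example `find_status_failures(open(path))`) would show whether the errors cluster around the time the devices dropped.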

When the devices go offline by themselves it's just an availability ping failure, with no other error, and online devices still work fine for a bit.

Unplugging the coordinator stick for a few minutes and plugging it back in once the network is mostly back seems to help.

I'm using a zzhp and my mesh is rather dense with a lot of routers, but at 85 devices I am still below 100. (I had over 100 before without issues on the same stick.)

While the devices are offline, buttons controlling the unavailable devices usually still work fine. So I think it's just the coordinator running out of space and not having a route to the device.

image

I did notice that on this firmware the coordinator has way more lines attached than before. Before, it would have 8-ish lines to routers and then they'd mesh.

Sadly the frontend can't show the source routing map and it's too big to draw manually with Graphviz :(

sjorge commented 1 year ago

Attached is the graphviz data for the map, but I never managed to get it to draw without it OOM'ing after it exhausts 32G of memory.

map.dot.txt

Edit: I did manage to render it with 64G memory https://drive.google.com/file/d/1H53a1fjodf3NlWzhUE9D4t4Xt4af2_ck/view

Edit 2: this is a fresh map with the 20220219 firmware. It seems to have fewer connections from the coordinator; I wonder if that is also why I was able to render this one 🤔
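For maps that are too large to render, the number of links per node can be counted straight from the DOT text instead. A minimal sketch, assuming the edges are written as `"node" -> "node"` with quoted node IDs; the node IDs in the sample are hypothetical placeholders and the real map.dot.txt may be laid out differently.

```python
import re
from collections import Counter

# Assumed edge syntax: "source" -> "target" [attributes];
EDGE_RE = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def count_links(dot_text):
    """Count how many edges touch each node (as source or as target)."""
    degree = Counter()
    for source, target in EDGE_RE.findall(dot_text):
        degree[source] += 1
        degree[target] += 1
    return degree


# Tiny hypothetical graph; "0xCOORD" stands in for the coordinator's address.
sample_dot = '''digraph G {
  "0xCOORD" [label="Coordinator"];
  "0xA" -> "0xCOORD" [label="120"];
  "0xB" -> "0xCOORD" [label="87"];
  "0xC" -> "0xCOORD" [label="64"];
  "0xB" -> "0xA" [label="200"];
}'''

links = count_links(sample_dot)
for node, count in links.most_common():
    print(node, count)
```

Comparing the coordinator's count between a map taken on 20221102 and one taken on 20220219 would put a number on the "way more lines" impression without needing 64G of memory.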

sjorge commented 1 year ago

Going to revert to the previous firmware, CC1352P2_CC2652P_other_coordinator_20220219. The issue seems similar enough to https://github.com/Koenkk/Z-Stack-firmware/issues/383, which mentioned 20220219 as the last known good, and that was also the one I was running before the issue started.

Koenkk commented 1 year ago

Did the issue also occur with 20220219?

sjorge commented 1 year ago

I don't remember it happening before upgrading, and that was the newest firmware I had in my downloads folder before 20221102, so I flashed that one again yesterday.

I guess if the mesh stays up for more than a week we’ll know.

ellnic commented 1 year ago

I've just lost a Hue motion sensor after 5 days on 20221102. Haven't had any issues on 20220219 since removing Ikea battery powered devices. I'll stay on 20221102 for the time being to see if it replicates.

Edit: I've just checked the logs and I see no mention of NWK_TABLE_FULL so whatever threw my motion sensor isn't the same as above.

sjorge commented 1 year ago

So far it seems stable at 4 days (with the old firmware). I had to reboot the node, so it has not made it to ~7 yet.

sjorge commented 1 year ago

Everything is still good on 20220219, guess I'll be sticking to this one for a bit longer.

Koenkk commented 1 year ago

I'll try to publish a new fw this week, see https://github.com/Koenkk/Z-Stack-firmware/issues/383#issuecomment-1345547988

Jabe commented 1 year ago

Same issue with 20221102. NWK_TABLE_FULL after a good amount of days. Lost 3 devices. One I was able to rejoin no problem but two only after downgrading to 20220219. I have 50 routers and 33 end devices, so I will try the new version when it's out!

Koenkk commented 1 year ago

Try with 20221214: https://github.com/Koenkk/Z-Stack-firmware/issues/383#issuecomment-1351696706

I'll close this thread, let's continue in https://github.com/Koenkk/Z-Stack-firmware/issues/383

Koenkk commented 1 year ago

Update: let's continue in https://github.com/Koenkk/Z-Stack-firmware/issues/439