Koenkk / zigbee2mqtt

Zigbee 🐝 to MQTT bridge 🌉, get rid of your proprietary Zigbee bridges 🔨
https://www.zigbee2mqtt.io
GNU General Public License v3.0
11.77k stars 1.64k forks source link

NWK_TABLE_FULL errors #4964

Closed nanobra1n closed 3 years ago

nanobra1n commented 3 years ago

After running my network for 6 days, several devices stopped responding. Error log showed dozens of errors related to setting devices and related to OTAUpdates such as:

error 2020-11-11 23:44:59: Publish 'set' 'brightness' to 'TV Light' failed: 'Error: Command 0x00124b001ef02f05/11 genLevelCtrl.moveToLevelWithOnOff({"level":127,"transtime":1980}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null}) failed (SREQ '--> ZDO - extRouteDisc - {"dstAddr":26082,"options":0,"radius":30}' failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)'))'

and

error 2020-11-12 00:50:35: Failed to call 'OTAUpdate' 'onZigbeeEvent' (Error: CommandResponse 0xe20db9fffe02953e/10 genOta.queryNextImageResponse({"status":152}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":1,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null}) failed (SREQ '--> ZDO - extRouteDisc - {"dstAddr":12023,"options":0,"radius":30}' failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)'))
    at Znp.<anonymous> (/zigbee2mqtt-1.16.1/node_modules/zigbee-herdsman/dist/adapter/z-stack/znp/znp.js:291:27)
    at Generator.next (<anonymous>)
    at fulfilled (/zigbee2mqtt-1.16.1/node_modules/zigbee-herdsman/dist/adapter/z-stack/znp/znp.js:24:58))

What happened

Restarting Zigbee2MQTT did not resolve the errors, unplugging the coordinator did

What did you expect to happen

How to reproduce it (minimal and precise)

Unsure, running the network for a week?

Debug info

Zigbee2MQTT version: 1.16.1 Adapter hardware: CC1352P-2 Adapter firmware version: zStack3x0 20201026

qosmio commented 3 years ago

Confirmed I'm get this too.

Koenkk commented 3 years ago

Before diving further, can you try first with the latest dev firmware: https://github.com/Koenkk/Z-Stack-firmware/tree/develop/coordinator/Z-Stack_3.x.0/bin ?

It contains some performance related to network parameters and increases the transmit power from 0dbm to 5dbm.

nanobra1n commented 3 years ago

I flashed Rev 20201113 and will let you know if that issue arises again.

nanobra1n commented 3 years ago

Is the red LED at DIO6 (at least that's what I think it is) supposed to be on now? Otherwise everything works

Koenkk commented 3 years ago

@nanobra1n I didn't change anything to leds, lets see if it keeps working.

nanobra1n commented 3 years ago

Running over a week now and haven't observed this again. Continuing to be on the lookout.

nanobra1n commented 3 years ago

The problem never materialized for me again after using 20201113 firmware for 5 weeks, so I am closing this.

sant1ago commented 3 years ago

Not sure if anyone else is still running into this issue - upgrading to 20201113 firmware has not worked for me. I still need to reset my coordinator every 24-48 hours . I'm running:

Z2M version: 1.16.2 Coordinator firmware: 20201113 (zStack3x0)

nanobra1n commented 3 years ago

Reopening it , not for me but for @sant1ago

sant1ago commented 3 years ago

Thanks @nanobra1n

Here's an example entry that I see come up in the logs:

error 2020-12-22 08:24:01: Publish 'set' 'position' to 'Shades - Studio' failed: 'Error: Command 0x002446000015637a/1 genLevelCtrl.moveToLevelWithOnOff({"level":"229","transtime":0}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null}) failed (SREQ '--> ZDO - extRouteDisc - {"dstAddr":57812,"options":0,"radius":30}' failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)'))'

Koenkk commented 3 years ago

@sant1ago what adapter are you using? (zzh, cc1352p launchpad,..)

sant1ago commented 3 years ago

@Koenkk I'm using ZZH.

Koenkk commented 3 years ago

Can you try with the following fw: znp_CC26X2R1_LAUNCHXL_tirtos_ccs_20201223.zip

sant1ago commented 3 years ago

Thanks @Koenkk - just loaded it into my ZZH . I'll provide an update after 24 hours

Koenkk commented 3 years ago

Good, if that doesn't help add the following to your configuration.yaml:

advanced:
  adapter_concurrent: 1
sant1ago commented 3 years ago

I've tried that previously with the 20201113 fw but I do not think it helped :(

sant1ago commented 3 years ago

Hi @Koenkk - while it seems the update appeared to provide better stability over past few days I'm still seeing the "NWK_TABLE_FULL" issue come up. Some additional observations:

  1. It only appears to affect 2 of my devices. Both happen to be AXIS Gear.
  2. Both appear to throw the error on the same message, here is a sample from my logs:

    log.txt:error 2020-12-26 14:12:13: Failed to call 'OTAUpdate' 'onZigbeeEvent' (Error: CommandResponse 0x0024460000154193/1 genOta.queryNextImageResponse({"status":152}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":1,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null}) failed (SREQ '--> ZDO - extRouteDisc - {"dstAddr":22534,"options":0,"radius":30}' failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)')) log.txt:error 2020-12-26 14:17:05: Failed to call 'OTAUpdate' 'onZigbeeEvent' (Error: CommandResponse 0x002446000015637a/1 genOta.queryNextImageResponse({"status":152}, {"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":1,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null}) failed (SREQ '--> ZDO - extRouteDisc - {"dstAddr":57812,"options":0,"radius":30}' failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)'))

  3. Both devices are still connected and are are populating "last_seen" value.
  4. All other devices do not appear to be affected by "NWK_TABLE_FULL" error.
  5. Both affected devices do not accept any commands submitted via Z2M when I start seeing "NWK_TABLE_FULL"
  6. I actually have 3 AXIS gear devices - the 3rd done does not appear to be affected (yet)

Hope this helps - let me know if you need more information.

Koenkk commented 3 years ago

@sant1ago could you provide the full herdsman debug log from starting zigbee2mqtt till these errors appear?

To enable herdsman debug logging, see https://www.zigbee2mqtt.io/information/debug.html#zigbee-herdsman-debug-logging

sant1ago commented 3 years ago

@Koenkk I've just enabled it and restarted Z2M - once I hit the error again I'll upload here

Koenkk commented 3 years ago

@sjorge since you also got this (https://github.com/Koenkk/zigbee2mqtt/issues/5431) can you provide the logging as indicated here: https://github.com/Koenkk/zigbee2mqtt/issues/4964#issuecomment-751463379

sjorge commented 3 years ago

Of course with the debug wheels on... it doesn't happen.

sjorge commented 3 years ago

Interesting, I had a few minute while waiting on something and I toggled the power of the bedroom a few times.

First time I had the NWK_TABLE_FULL, but I wasn't running with debug logging. Did it again with debug logging and out of the 3 times I did I got an error 2/3... but it is a different error!

https://gist.github.com/sjorge/97d8d1ac620fa2ad682bab4156ef027a

I did it one more time without debug logging and got NWK_TABLE_FULL again. I'm was not going to post, but maybe they are related as both indicate something is not draining, with debug logging on I get a BUFFER_FULL instead.

Koenkk commented 3 years ago

Still looks like it has something to do with the buffers. I've quadrupled them in this firmware: zzhplite_increased_buffs_20201229.hex.zip (note fw only works on the zzhp lite, which is not available for sale yet :))

sjorge commented 3 years ago

Still looks like it has something to do with the buffers. I've quadrupled them in this firmware: zzhplite_increased_buffs_20201229.hex.zip (note fw only works on the zzhp lite, which is not available for sale yet :))

Will flash tonight.

My reasoning for still posting was that... NWK_TABLE_FULL could be because a command to remove/add/rediscover the route got dropped due to a BUFFER_FULL somewhere.

sant1ago commented 3 years ago

@Koenkk - my logs attached. Shall I also try the fw w/ the increased buffers ? logs.zip

sjorge commented 3 years ago

Shall I also try the fw w/ the increased buffers ?

(note fw only works on the zzhp lite, which is not available for sale yet :))

Unless you are running on prototype zzhp-lite hardware like me, no.

sjorge commented 3 years ago

debug.log

Flashed the new firmware and toggled the power to all my lights, hit a few of them. Did not hit a BUFFER_FULL though.

Koenkk commented 3 years ago

@sjorge a bit off topic but it may fix the issue. While refactoring the Ikea reconfigure reporting stuff I found out that a rebind is not required, only a configure reporting. This is done now (https://github.com/Koenkk/zigbee-herdsman-converters/commit/5f99e26eda41baad0d692eec397816168808ad5d / 3558635797372cb80ef28361757ab158d2aece18). So instead of 2, 1 request is done now. Can you check if you still get the issue now?

EDIT: did another improvement. The NWK_TABLE_FULL likely happens due too many route discoveries. This are currently executed after every device announce. These route discoveries have been added to fix the Hue motion disconnect issue. Therefore these will only be executed for end devices now.

Koenkk commented 3 years ago

@Koenkk - my logs attached. Shall I also try the fw w/ the increased buffers ? logs.zip

Thanks for the logs but it does not contain the herdsman debug logging. Note that this is only logged to the STDOUT, not to the log files

sant1ago commented 3 years ago

@Koenkk - oops sorry missed that part. I'll get STDOUT for you

sjorge commented 3 years ago

@sjorge a bit off topic but it may fix the issue. While refactoring the Ikea reconfigure reporting stuff I found out that a rebind is not required, only a configure reporting. This is done now (Koenkk/zigbee-herdsman-converters@5f99e26 / 3558635). So instead of 2, 1 request is done now. Can you check if you still get the issue now?

Doing less actions/traffic when not needed is certainly good 👍 Will upgrade to dev in a bit.

EDIT: did another improvement. The NWK_TABLE_FULL likely happens due too many route discoveries. This are currently executed after every device announce. These route discoveries have been added to fix the Hue motion disconnect issue. Therefore these will only be executed for end devices now.

Isn't the table on the CC2652P suppose to be rather big? I have ~65 devices so I think that would allow a route for each. Or does it keep multiple? (Speaking of which, lots of disconnected nodes in the network map on the frontend, more so than with the CC2531 source routing firmware).

Although my own graphviz map shows everything connected, so maybe it's an issue with the frontend. Although it seems my SNZB-04 do randomly get disconnected more now, 2 open/close actions pops them back in, so it probably explains why my battery info is not updating on those.

sjorge commented 3 years ago

@Koenkk the stick crashed overnight it seems and somehow some devices also got kicked out. But sort of not? They are still in db, still work but bind/reporting seems to be gone, as are the group memberships (although those still show up in the frontend)

Koenkk commented 3 years ago

@sjorge can you provide the herdsman debug logs of this?

sjorge commented 3 years ago

Sadly no, I was not running with debug logging, even the regular logs were already rotated. On the up side, I was running with debug logging when repairing everything and did not have a single BUFFLER_FULL and NWK_TABLE_FULL error!

Koenkk commented 3 years ago

@sjorge good, can you try reverting to the normal develop firmware? Since less requests are done it may also work with that now (and maybe this new firmware is the cause of your coordinator crash)

sjorge commented 3 years ago

Will flash after lunch before I go back to work, pretty easy now with cc*-bsl :)

sant1ago commented 3 years ago

@Koenkk - here are the logs from STDOUT: Z2M.log.zip

Koenkk commented 3 years ago

@sant1ago does the issue still appear if you disable the availability feature? (remove availability_timeout from your configuration.yaml)

sant1ago commented 3 years ago

@Koenkk - I just set it, will let you know if it resolves the issue.

sant1ago commented 3 years ago

Morning @Koenkk - unfortunately that did not resolve the issue. Anything else I can try ? output.log.zip

Koenkk commented 3 years ago

@sant1ago I notice that there are a lot of 225 (=MAC_CHANNEL_ACCESS_FAILURE error) in your log (search for regex Data confirm error .*,225). This happens when there is a lot of interference (and may be the root cause of your errors).

sant1ago commented 3 years ago

Thanks for checking the logs - my responses below:

  • Is the adapter connected through a USB extension cable and positioned away from your computer/wifi router? (20cm +)

Yes to extension cable and > 20 cm from Pi (the computer running Z2M) . I also have a ZWave interface in the same room but also > 20cm away from my ZZH and on another separate Pi - would that matter ?

Sure thing - let me try this and see if that reduces the MAC_CHANNEL_ACCESS_FAILURE error.

sant1ago commented 3 years ago

@Koenkk update - I've had Z2M up for ~ 3 days and the error has only come up 3 times vs 1305 times previously. It appears the channel change has helped immensely !

If I may sugggest adding this to the docs in case someone else runs into this issue they know what to do (NWK_TABLE_FULL errors after updating FW) ?

Koenkk commented 3 years ago

I've added a "common error codes" section to the faq with this included. (https://www.zigbee2mqtt.io/information/FAQ.html)

Can this issue be closed?

sant1ago commented 3 years ago

Perfect - thanks @Koenkk !

@nanobra1n do you mind closing the ticket?

salopette commented 3 years ago

Error Publish 'set' 'led_indication' to 'BWM Bad' failed: 'Error: Write 0x0017880106f48c37/2 genBasic({"51":{"value":false,"type":16}}, {"sendWhenActive":false,"timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":4107,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SREQ '--> ZDO - extRouteDisc - {"dstAddr":49127,"options":0,"radius":30}' failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)'))'

Marc-David commented 2 years ago

I struggled with this error after upgrading my CC2531 device to a CC2652 device. I tried the following:

The availability checks gave me the clear hint that most of my routers were not connected. After resetting them everything started working again.

Hope this helps someone.

Orochica commented 1 year ago

I get this error also: ing 'SonoffRouter2' (attempt 2/2, Read 0x00124b00258d5691/8 genBasic(["zclVersion"], {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SREQ '--> ZDO - extRouteDisc - {"dstAddr":61025,"options":0,"radius":30}' failed with status '(0xc7: NWK_TABLE_FULL)' (expected '(0x00: SUCCESS)')))