Koenkk / Z-Stack-firmware

Compilation instructions and hex files for Z-Stack firmwares
MIT License

Feedback development firmware 2022/07 #383

Closed dumpfheimer closed 1 year ago

dumpfheimer commented 2 years ago

After seeing in the changelog that the routing table sizes have increased I wanted to test the latest DEVELOPMENT firmware.

I am having issues which I believe are caused by the firmware update.

It seems to me that the firmware crashes after a few hours or after a certain number of requests. Unfortunately I cannot provide detailed feedback, but I am glad to try with some guidance.

The first time it got stuck I did not pay a lot of attention and simply restarted everything. The second time I un- and replugged the coordinator and things recovered without any issues worth mentioning. The logs were full of messages as shown below (1). Later it changed to other error messages (2).

On the positive side: I do feel like the larger routing table might have had a positive effect on my environment. I have ~120 Zigbee devices, of which probably 2/3 are routers. Especially when toggling a bunch of lights at the same time, I feel like it has fewer "hiccups".

My environment: I am using a CC1352P2 launchpad with zigpy/zha/home assistant. The firmware in use was https://github.com/Koenkk/Z-Stack-firmware/blob/develop/coordinator/Z-Stack_3.x.0/bin/CC1352P2_CC2652P_launchpad_coordinator_20220724.zip

Error message 1:

2022-07-26 01:06:59 ERROR (MainThread) [homeassistant.helpers.entity] Update for sensor.server_electricity_power fails
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/helpers/entity.py", line 514, in async_update_ha_state
    await self.async_device_update()
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/helpers/entity.py", line 709, in async_device_update
    raise exc
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/components/zha/sensor.py", line 297, in async_update
    await super().async_update()
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/components/zha/entity.py", line 250, in async_update
    await asyncio.gather(*tasks)
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/components/zha/core/channels/homeautomation.py", line 100, in async_update
    result = await self.get_attributes(attrs, from_cache=False, only_cache=False)
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/components/zha/core/channels/base.py", line 460, in _get_attributes
    read, _ = await self.cluster.read_attributes(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/zcl/__init__.py", line 441, in read_attributes
    result = await self.read_attributes_raw(to_read, manufacturer=manufacturer)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/quirks/__init__.py", line 233, in read_attributes_raw
    results = await super().read_attributes_raw(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/device.py", line 291, in request
    radio_result, msg = await self._application.request(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 302, in request
    return await self._send_request(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 1161, in _send_request
    response = await self._send_request_raw(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 1047, in _send_request_raw
    self._znp.request_callback_rsp(
AttributeError: 'NoneType' object has no attribute 'request_callback_rsp'

Error message 2:


2022-07-26 01:10:04 ERROR (MainThread) [zigpy_znp.zigbee.application] Failed to reconnect
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 652, in _skip_bootloader
    result = await responses.get()
  File "/usr/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 886, in _reconnect
    await self.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 111, in connect
    await znp.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 694, in connect
    self.capabilities = (await self._skip_bootloader()).Capabilities
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 651, in _skip_bootloader
    async with async_timeout.timeout(CONNECT_PROBE_TIMEOUT):
  File "/srv/homeassistant/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/srv/homeassistant/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
2022-07-26 01:10:19 ERROR (MainThread) [zigpy_znp.zigbee.application] Failed to reconnect
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 652, in _skip_bootloader
    result = await responses.get()
  File "/usr/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError
Koenkk commented 2 years ago

Let's discuss the flashing issues in https://github.com/Koenkk/Z-Stack-firmware/issues/397

pannal commented 2 years ago

I'll have to check logs later, but it seems like my Zigbee stick crashed completely after a couple of days (ZHA).

Same here. Crashed with CC2652R_coordinator_20221102.hex on zzh! after a few days. No error messages in log, only that zigbee2mqtt can't connect to the stick after a restart.

Reverting back to 20220928.hex and will retest.

Edit: These are the last lines in the log after/around the failure time:

zigbee           | Zigbee2MQTT:debug 2022-11-09 11:16:19: Publishing 'set' 'state' to 'LichtBuero1'
zigbee           | Zigbee2MQTT:error 2022-11-09 11:17:41: Publish 'set' 'state' to 'LichtBuero1' failed: 'Error: Command 0x5c0272fffe2b6b3f/1 genOnOff.off({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms)'
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:17:41: Error: Command 0x5c0272fffe2b6b3f/1 genOnOff.off({}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":false,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms)
zigbee           |     at Timeout._onTimeout (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
zigbee           |     at listOnTimeout (node:internal/timers:559:17)
zigbee           |     at processTimers (node:internal/timers:502:7)
zigbee           | Zigbee2MQTT:info  2022-11-09 11:17:41: MQTT publish: topic 'zigbee2mqtt/bridge/log', payload '{"message":"Publish 'set' 'state' to 'LichtBuero1' failed: 'Error: Command 0x5c0272fffe2b6b3f/1 genOnOff.off({}, {\"sendWhen\":\"immediate\",\"timeout\":10000,\"disableResponse\":false,\"disableRecovery\":false,\"disableDefaultResponse\":false,\"direction\":0,\"srcEndpoint\":null,\"reservedBits\":0,\"manufacturerCode\":null,\"transactionSequenceNumber\":null,\"writeUndiv\":false}) failed (SRSP - AF - dataRequest after 6000ms)'","meta":{"friendly_name":"LichtBuero1"},"type":"zigbee_publish_error"}'
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:19:20: Saving state to file /app/data/state.json
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:24:20: Saving state to file /app/data/state.json
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:29:20: Saving state to file /app/data/state.json
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:34:20: Saving state to file /app/data/state.json
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:39:20: Saving state to file /app/data/state.json
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:44:20: Saving state to file /app/data/state.json
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:49:20: Saving state to file /app/data/state.json
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:54:20: Saving state to file /app/data/state.json
zigbee           | Zigbee2MQTT:debug 2022-11-09 11:59:20: Saving state to file /app/data/state.json
zigbee           | Zigbee2MQTT:debug 2022-11-09 12:04:20: Saving state to file /app/data/state.json

Trying to restart z2m:

zigbee           | Zigbee2MQTT:debug 2022-11-09 14:22:38: Using zigbee-herdsman with settings: '{"adapter":{"concurrent":null,"delay":null,"disableLED":true},"backupPath":"/app/data/coordinator_backup.json","databaseBackupPath":"/app/data/database.db.backup","databasePath":"/app/data/database.db","network":{"channelList":[25],"extendedPanID":[221,221,221,221,221,221,221,221],"networkKey":"HIDDEN","panID":6754},"serialPort":{"adapter":"auto","path":"/dev/ttyUSB0","rtscts":false}}'
zigbee           | Zigbee2MQTT:error 2022-11-09 14:22:59: Error while starting zigbee-herdsman
zigbee           | Zigbee2MQTT:error 2022-11-09 14:22:59: Failed to start zigbee
zigbee           | Zigbee2MQTT:error 2022-11-09 14:22:59: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
zigbee           | Zigbee2MQTT:error 2022-11-09 14:22:59: Exiting...
zigbee           | Zigbee2MQTT:error 2022-11-09 14:22:59: Error: Failed to connect to the adapter (Error: SRSP - SYS - ping after 6000ms)
zigbee           |     at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:103:27)
zigbee           |     at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:132:29)
zigbee           |     at Zigbee.start (/app/lib/zigbee.ts:58:27)
zigbee           |     at Controller.start (/app/lib/controller.ts:101:27)
zigbee           |     at start (/app/index.js:109:5)
zigbee           | Using '/app/data' as data directory

It happens that single devices time out at times, which can always be fixed by rejoining them. This time none were available anymore.
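For anyone combing through a similar log, here is a minimal Python sketch for finding the first SRSP timeout. It assumes log lines in the format quoted above (`Zigbee2MQTT:<level> <date> <time>: <message>`); the function name `first_srsp_timeout` is made up for illustration and is not part of Zigbee2MQTT.

```python
import re
from typing import Iterable, Optional, Tuple

# Pattern is an assumption based on the log lines quoted in this thread, e.g.
# "Zigbee2MQTT:error 2022-11-09 11:17:41: ... (SRSP - AF - dataRequest after 6000ms)"
_TIMEOUT = re.compile(
    r"Zigbee2MQTT:\w+\s+(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):"
    r".*SRSP - (?P<subsystem>\w+) - (?P<command>\w+) after (?P<ms>\d+)ms"
)


def first_srsp_timeout(lines: Iterable[str]) -> Optional[Tuple[str, str, str]]:
    """Return (timestamp, subsystem, command) of the first SRSP timeout, or None."""
    for line in lines:
        match = _TIMEOUT.search(line)
        if match:
            return match["ts"], match["subsystem"], match["command"]
    return None


if __name__ == "__main__":
    import sys

    # Usage: python find_timeout.py < zigbee2mqtt.log
    print(first_srsp_timeout(sys.stdin))
```

The timestamp of the first timeout is usually more useful for a report than the later "Failed to connect to the adapter" error, since it marks when the adapter stopped answering.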

Koenkk commented 2 years ago

@pannal

Reverting back to 20220928.hex and will retest.

Let me know if you also get a crash with this fw. Previously we only got crashes with ZHA and not with Z2M.

alexruffell commented 2 years ago

@Koenkk While researching why my Smartthings Buttons (Zigbee) output 3 identical commands instead of 1, I saw these other errors that may be related to this firmware:

Logger: homeassistant.components.zha.core.gateway
Source: components/zha/core/gateway.py:172
Integration: Zigbee Home Automation ([documentation](https://www.home-assistant.io/integrations/zha), [issues](https://github.com/home-assistant/home-assistant/issues?q=is%3Aissue+is%3Aopen+label%3A%22integration%3A+zha%22))
First occurred: 09:29:51 (1 occurrences)
Last logged: 09:29:51

Couldn't start ZNP = Texas Instruments Z-Stack ZNP protocol: CC253x, CC26x2, CC13x2 coordinator (attempt 1 of 3)
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 998, in request
    response = await response_future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/gateway.py", line 172, in async_initialize
    self.application_controller = await app_controller_cls.new(
  File "/usr/local/lib/python3.10/site-packages/zigpy/application.py", line 144, in new
    await app.startup(auto_form=auto_form)
  File "/usr/local/lib/python3.10/site-packages/zigpy/application.py", line 124, in startup
    await self.connect()
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 106, in connect
    await znp.connect()
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 706, in connect
    self.capabilities = (await self._skip_bootloader()).Capabilities
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 686, in _skip_bootloader
    return await self.request(c.SYS.Ping.Req())
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 994, in request
    async with async_timeout.timeout(
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError

and

Logger: zigpy.application
Source: /usr/local/lib/python3.10/site-packages/zigpy/application.py:127
First occurred: 09:29:51 (1 occurrences)
Last logged: 09:29:51

Couldn't start application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 998, in request
    response = await response_future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/zigpy/application.py", line 124, in startup
    await self.connect()
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 106, in connect
    await znp.connect()
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 706, in connect
    self.capabilities = (await self._skip_bootloader()).Capabilities
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 686, in _skip_bootloader
    return await self.request(c.SYS.Ping.Req())
  File "/usr/local/lib/python3.10/site-packages/zigpy_znp/api.py", line 994, in request
    async with async_timeout.timeout(
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/usr/local/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError

The complete logs with ZHA set to debug: home-assistant.log

The device sending 3 copies of the same command is 0x6573 however all the Smartthings Buttons I have started doing it at the same time. I opened a ticket on ZHA thinking it is a quirk issue, but then found the errors above so I am reporting them here just in case they are firmware related.

casakampa commented 2 years ago

I've flashed my ZZH! stick to the CC2652R_coordinator_20221102.hex firmware last Sunday and haven't had any problems at all. Tried other beta firmwares, but the last version that was stable before this one was CC2652R_coordinator_20220219.hex.

The stick is attached to an HassOS VM with USB passthrough in ESXi 7, and these are the versions of Home Assistant:

pannal commented 2 years ago

@pannal

Reverting back to 20220928.hex and will retest.

Let me know if you also get a crash with this fw. Previously we only got crashes with ZHA and not with Z2M.

zzh! died 4 days after reverting back to 20220928. Going back to CC2652R_coordinator_20220219.hex.

Edit: @Koenkk not sure if you've seen this, as I haven't quoted you directly.

Koenkk commented 2 years ago

@pannal does 20220219 work correctly?

pannal commented 2 years ago

@pannal does 20220219 work correctly?

Up til now yes, issue-free. With the latest two firmwares it took about four days to crash, I'm at day eight now so I'd say this is stable.

Koenkk commented 2 years ago

@pannal can you provide the herdsman debug log from starting z2m until it fails with the 20221102 firmware?

See https://www.zigbee2mqtt.io/guide/usage/debug.html on how to enable the herdsman debug logging. Note that this is only logged to STDOUT and not to log files.

pannal commented 2 years ago

@pannal can you provide the herdsman debug log from starting z2m until it fails with the 20221102 firmware?

See https://www.zigbee2mqtt.io/guide/usage/debug.html on how to enable the herdsman debug logging. Note that this is only logged to STDOUT and not to log files.

I can't destabilize the network for the next two weeks. If still necessary afterwards, I'll do that.

mikhail-nikolaenko commented 2 years ago

I think they have fixed this in UniFlash. Flash Programmer 2 has not been fixed.

https://www.ti.com/tool/download/UNIFLASH/8.1.1 ->

What's new: Fixed error when loading programs in Intel Hex format

ellnic commented 2 years ago

I've only just seen the 20221102 firmware in one of the recent changelogs but I have flashed my CC2652R Launchpad this evening. I will update if I experience issues. My network is, as a general rule, now quite stable since I have removed all Ikea E1743/E1524, but I will be interested to see if this improves the occasional bit of latency I see, and maybe some of the motion sensor red LEDs (yes, I still get them but I haven't had a single one fall off since ditching Ikea battery powered devices). I only have 67 devices, not the 80+ as per release notes, but I am interested to see if anything changes. :)

pannal commented 1 year ago

@pannal does 20220219 work correctly?

Up til now yes, issue-free. With the latest two firmwares it took about four days to crash, I'm at day eight now so I'd say this is stable.

Update: 16 days without issue. I think 20220219 is stable. I'll try to get one of the newer firmwares to crash and supply a log.

mzanetti commented 1 year ago

FWIW, I'm also experiencing crashes of the dongle with 20221102 on a CC2652RB (slae.sh) running nymea (not z2m) if the network has ~45 devices. I did also have those crashes with the previous firmware the stick came with (I believe it was a 202107...) with this amount of devices. It is very reproducible here, happens about every other day with the current setup of connected devices. At some point the stick just stops sending any data, i.e. reading the serial port just returns empty data until it is closed and re-initialized. I can't create z2m logs, but I can create logs that show all the data traffic going to/from the stick if that helps.
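The "stick stops sending data" symptom described above can be detected in host software with a simple probe. The following is only a sketch under assumptions: `read_chunk` is a hypothetical callable supplied by the caller (with pyserial it could be `lambda: port.read(64)` on a port opened with a read timeout), and the threshold of five empty reads is arbitrary. A quiet network can also produce empty reads, so this is a hint rather than proof of a crash.

```python
from typing import Callable


def stalled(read_chunk: Callable[[], bytes], max_empty_reads: int = 5) -> bool:
    """Return True if `max_empty_reads` consecutive reads all come back empty.

    `read_chunk` is a caller-supplied function (hypothetical name) that
    returns whatever bytes were available, or b"" on timeout.
    """
    empty = 0
    while empty < max_empty_reads:
        if read_chunk():
            return False  # data arrived, so the adapter is still talking
        empty += 1
    return True
```

If the probe returns True, the workaround described above is to close the serial port and re-initialize the adapter.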

mrjackson commented 1 year ago

FWIW, I'm also experiencing crashes of the dongle with 20221102 on a CC2652RB (slae.sh) running nymea (not z2m) if the network has ~45 devices. I did also have those crashes with the previous firmware the stick came with (I believe it was a 202107...) with this amount of devices. It is very reproducible here, happens about every other day with the current setup of connected devices. At some point the stick just stops sending any data, i.e. reading the serial port just returns empty data until it is closed and re-initialized. I can't create z2m logs, but I can create logs that show all the data traffic going to/from the stick if that helps.

I'm running 20220928 on my slae.sh adapter, and it was crashing every other day or so until I completely powered off the Pi and adapter for 30 minutes. It's been much better since, but I'm still dropping devices here and there.

pannal commented 1 year ago

@pannal can you provide the herdsman debug log from starting z2m until it fails with the 20221102 firmware?

See https://www.zigbee2mqtt.io/guide/usage/debug.html on how to enable the herdsman debug logging. Note that this is only logged to STDOUT and not to log files.

Running in DEBUG mode now on CC2652R_coordinator_20221102.hex (79 devices).

casakampa commented 1 year ago

I've flashed my ZZH! stick to the CC2652R_coordinator_20221102.hex firmware last Sunday and haven't had any problems at all. Tried other beta firmwares, but the last version that was stable before this one was CC2652R_coordinator_20220219.hex.

The stick is attached to an HassOS VM with USB passthrough in ESXi 7, and these are the versions of Home Assistant:

* Home Assistant 2022.11.2

* Supervisor 2022.10.2

* Operating System 9.3

* ZHA integration

* 37 devices: 23 routers (Hue lights and plugs) and 14 battery powered devices

To quote myself: the latest beta firmware still works very well for me. No crashes, and it works flawlessly with the latest version of Home Assistant. Everything else has remained the same.

Koenkk commented 1 year ago

For those currently experiencing crashes, can you try if the 20220507 works fine? Link: https://github.com/Koenkk/Z-Stack-firmware/tree/0cea4d898afaa26ec1fe8550fd6cd6469b332ee4/coordinator/Z-Stack_3.x.0/bin

I plan on releasing a new fw version based on an older SDK 6_10_00_29 (= 20220507) or 5.40.00.40 (= 20220219) with the increased routing tables.

TheJulianJES commented 1 year ago

I plan on releasing a new fw version based on an older SDK 6_10_00_29 (= 20220507) or 5.40.00.40 (= 20220219) with the increased routing tables.

That would be awesome. At least for me, that's the firmware I keep reverting back to, as it seems to be stable. (ZHA, newer firmwares crash after a couple of days)

Koenkk commented 1 year ago

@TheJulianJES 20220507 or 20220219? If 20220219, can you test with 20220507?

TheJulianJES commented 1 year ago

@Koenkk Yeah, I meant the version based on 6.10 (20220507). At least for me, that version is the latest stable version (6.20 and later breaks for me)

artist67 commented 1 year ago

I see crashes once every week on 20221102 on a Sonoff dongle. Restarting z2m resolves the problem. Therefore, I would be happy with an upgrade version with an older SDK.

20220507 was rock stable but suffered from the small routing table for my ~150 devices.

waihsing commented 1 year ago

So far 20221102 has been stable for my SONOFF Zigbee 3.0 USB Dongle Plus ZBDongle-P. I have 13 Sonoff ZBMini that were having timeout issues with 20220219 but work flawlessly with 20221102 now. I have 53 end devices and 34 routers.

Koenkk commented 1 year ago

@TheJulianJES @artist67 can you try the 20221214 firmware: https://github.com/Koenkk/Z-Stack-firmware/tree/6.10.01.01/coordinator/Z-Stack_3.x.0/bin ? The routing tables are not as big as with 20221102 (due to a bug in this SDK) but it has all the other improvements.

dkwireless commented 1 year ago

Already flashed and running. Let's hope for the best.

artist67 commented 1 year ago

Up and running:

info 2022-12-14 18:02:10: Starting Zigbee2MQTT version 1.28.4-dev (commit #21a30fc5)
info 2022-12-14 18:02:10: Starting zigbee-herdsman (0.14.81)
info 2022-12-14 18:02:52: zigbee-herdsman started (restored)
info 2022-12-14 18:02:52: Coordinator firmware version: '{"meta":{"maintrel":1,"majorrel":2,"minorrel":7,"product":1,"revision":20221214,"transportrev":2},"type":"zStack3x0"}'
info 2022-12-14 18:02:52: Currently 133 devices are joined:
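When reporting which firmware is actually running, the revision can be pulled out of the "Coordinator firmware version" log line rather than read by eye. A minimal Python sketch, assuming the log format shown above; `coordinator_revision` is a made-up helper name, not something provided by Zigbee2MQTT:

```python
import json
import re
from typing import Optional

# Assumes the single-quoted JSON payload that follows
# "Coordinator firmware version:" in the startup log.
_VERSION = re.compile(r"Coordinator firmware version: '(\{.*?\})'")


def coordinator_revision(log_line: str) -> Optional[int]:
    """Return the firmware revision (e.g. 20221214) from a log line, or None."""
    match = _VERSION.search(log_line)
    if match is None:
        return None
    info = json.loads(match.group(1))
    return info.get("meta", {}).get("revision")
```

The revision field corresponds to the date in the firmware file name, which makes it easy to confirm that a flash actually took effect.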

sjorge commented 1 year ago

Also upgraded 🤞

w3host commented 1 year ago

I also have crashes using a zzh! adapter with the 20221102 firmware.

2022-12-08 07:05:45: Zigbee2MQTT started!
2022-12-09 23:03:08: crash happened.

Zigbee2MQTT version: 1.28.4 commit: 52e545f

Router: 95 End devices: 32

20220219 was OK but lost devices randomly (availability offline).

I have upgraded to 20221214. Let's see... Thanks a lot!

Nik71git commented 1 year ago

Mine:
Sonoff 3.0 dongle model P, coordinator fw version 20220219
Sonoff 3.0 dongle model P, router fw version 20221102
Gateway: ZHA

I had a problem with the TI flasher but got it done via a Python script. The Zigbee network has been stable for days without problems. Thanks @Koenkk for your work (even though I switched from zigbee2mqtt to ZHA while waiting for zigbee2mqtt to be fully integrated in HAOS...)

pannal commented 1 year ago

@pannal can you provide the herdsman debug log from starting z2m until it fails with the 20221102 firmware? See https://www.zigbee2mqtt.io/guide/usage/debug.html on how to enable the herdsman debug logging. Note that this is only logged to STDOUT and not to log files.

Running in DEBUG mode now on CC2652R_coordinator_20221102.hex (79 devices).

Hmm, I wasn't able to crash this again. Not sure why, maybe a coincidence, but the host system was upgraded to Ubuntu 22.04 LTS in the meantime. It has run for a week now without a crash; I've experienced a boatload of "hangs" (a specific router not replying for a couple of seconds); I'll be going back to CC2652R_coordinator_20220219 for the holidays. Sorry.

dumpfheimer commented 1 year ago

For all ZHA users, this is a bit of a wild guess but you could try the following setting in HA config:

zha:
  zigpy_config:
    source_routing: True

This might take some load from the controller and keep it from committing software suicide.

As a side effect this HUGELY improved performance on my large (>100) ZigBee network. I am currently working on route discovery within zigpy based on scanned topology and link quality for anyone interested in testing.

artist67 commented 1 year ago

20221214 again causes instabilities (NWK_TABLE_FULL, no network route and timeouts). These were not present in 20221102. Sonoff Stick.

I would be happy to have an improved 20221102 without the weekly crashes ;)

dkwireless commented 1 year ago

I get no network route only on those devices that previously had the same problem. Other than that firmware is stable.

sjorge commented 1 year ago

I lost 2 devices this morning, again with the NWK_TABLE_FULL error. 87 total devices.

Koenkk commented 1 year ago

Good to hear the firmware doesn't crash.

I found out that the ticlang version (20221214 = ccs version) allows for bigger routing tables (like the newer firmwares I provided). Hopefully this combines the stability of 20220219 with the performance of 20221102.

Please test it: 20221220

Char-r commented 1 year ago

after flashing to 20221220 one of my Danfoss Thermostat which had disconnected came back online, let's see how long it lasts for this time (usually after a week or so I start seeing devices dropping)

sjorge commented 1 year ago

Just flashed 20221220, lets see how this one holds up.

artist67 commented 1 year ago

Also flashed 20221220. I stay optimistic!

bgreet commented 1 year ago

Flashed latest here as well, fingers crossed. Some devices that would not connect to the network are reconnecting again, so that's a plus.

Update: 12 hours in, and multiple devices that previously would not connect have now reconnected to the network. Most are ZBMinis and a few Hue lightbulbs. Using TubesZB CC2652 with a large network of 108 routers and 14 end devices. No crashes yet like I had experienced with the prior developer fork.

waihsing commented 1 year ago

So far 20221102 has been stable for my SONOFF Zigbee 3.0 USB Dongle Plus ZBDongle-P. I have 13 Sonoff ZBMini that were having timeout issues with 20220219 but work flawlessly with 20221102 now. I have 53 end devices and 34 routers.

Flashed with 20221220 and immediately got timeout errors with ZBMinis and Ikea lights. Flashed back to 20221102.

Wireheadbe commented 1 year ago

Last weekend I added two LED1924G9 to an already big Zigbee network and suddenly started experiencing crashes. 83 devices in total, 81 before the weekend. Flashed 20221220 and already had a crash after a couple of hours. Let's see if I keep getting some. Using a LAUNCHXL-CC26X2R1.

w3host commented 1 year ago

I had crashes with 20221214 as well:

Zigbee2MQTT version 1.28.4 (commit #52e545f) Firmware: 20221214

2022-12-15T22:00:43.070991+01:00 18087-hass-ssd npm[1830]: Error: Write 0x60a423fffeecab8e/1 genLevelCtrl({"onLevel":255}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms)
2022-12-15T22:00:43.072133+01:00 18087-hass-ssd npm[1830]: at Timeout._onTimeout (/srv/zigbee2mqtt/node_modules/zigbee-herdsman/src/utils/waitress.ts:64:35)
2022-12-15T22:00:43.072931+01:00 18087-hass-ssd npm[1830]: at listOnTimeout (node:internal/timers:564:17)
2022-12-15T22:00:43.075025+01:00 18087-hass-ssd npm[1830]: at processTimers (node:internal/timers:507:7)
2022-12-15T22:00:43.492374+01:00 18087-hass-ssd systemd[1]: zigbee2mqtt.service: Main process exited, code=exited, status=1/FAILURE
2022-12-15T22:00:43.495791+01:00 18087-hass-ssd systemd[1]: zigbee2mqtt.service: Failed with result 'exit-code'.
2022-12-15T22:00:43.502045+01:00 18087-hass-ssd systemd[1]: zigbee2mqtt.service: Consumed 1h 15min 7.068s CPU time.

I have changed the usb autosuspend parameter to "-1" based on this FAQ (https://www.zigbee2mqtt.io/guide/faq/#zigbee2mqtt-crashes-after-some-time) and changed back the firmware to 20221102.
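For anyone wanting to verify the autosuspend setting mentioned above, here is a small Python sketch. It assumes a Linux host where the global default is exposed at `/sys/module/usbcore/parameters/autosuspend` and that a negative value means autosuspend is disabled; the function name is made up for illustration. Note that this is the global default only, and individual devices can carry their own power settings.

```python
from pathlib import Path

# Global usbcore default (Linux). Assumption: a negative value disables autosuspend.
AUTOSUSPEND_PARAM = Path("/sys/module/usbcore/parameters/autosuspend")


def autosuspend_disabled(param_file: Path = AUTOSUSPEND_PARAM) -> bool:
    """Return True if the autosuspend delay stored in `param_file` is negative."""
    return int(param_file.read_text().strip()) < 0


if __name__ == "__main__":
    state = "disabled" if autosuspend_disabled() else "enabled"
    print(f"USB autosuspend is {state}")
```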

Since December 17 it has been working without any crash, and I hope it remains stable...

sjorge commented 1 year ago

Hit with another wave of NWK_TABLE_FULL, no hard crash or dropped devices so far. Might go back to 20220219 over the weekend, given I will be away from home for a bit and that one was pretty stable.

Koenkk commented 1 year ago

@bgreet what was the latest working version?

@waihsing directly after flashing it might take some time to build up the new routes, give it some more time.

@sjorge maybe it's due to high route expiry timeout, what coordinator are you using? I can provide you one with lower timeouts.

sjorge commented 1 year ago

@sjorge maybe it's due to high route expiry timeout, what coordinator are you using? I can provide you one with lower timeouts.

It’s a zzhp-lite, and I was wondering the same. Once we hit the error, maybe we could dynamically lower the timeout in steps down to a set minimum, … although that's more complex than just lowering the fixed value 😅

Koenkk commented 1 year ago

@sjorge can you contact me on telegram or discord? (@koenkk)

sjorge commented 1 year ago

I don’t have Telegram, but I think I am on the Discord server. Let me check.

Another datapoint, my network is router heavy which might not be the norm.

By device type: Router: 50, End devices: 37

Wireheadbe commented 1 year ago

Last weekend I added two LED1924G9 to an already big Zigbee network and suddenly started experiencing crashes. 83 devices in total, 81 before the weekend. Flashed 20221220 and already had a crash after a couple of hours. Let's see if I keep getting some. Using a LAUNCHXL-CC26X2R1.

I had a removed device still in a group definition in config. Removed it and all seems fine at the moment. Keeping an eye on it.

bgreet commented 1 year ago

@bgreet what was the latest working version?

@waihsing directly after flashing it might take some time to build up the new routes, give it some more time.

@sjorge maybe it's due to high route expiry timeout, what coordinator are you using? I can provide you one with lower timeouts.

Last firmware was 1102. Most recent 1220 has been working great so far! No drop offs and things have remained stable. Thanks for all the hard work!

Wireheadbe commented 1 year ago

Still stable on zStack3x0 - 20221220 - issue was really related to a removed device that was still in Group config (hardcoded in config)

Device by type: Router: 65 End devices: 20