Koenkk / Z-Stack-firmware

Compilation instructions and hex files for Z-Stack firmwares
MIT License

Feedback development firmware 2022/07 #383

Closed dumpfheimer closed 1 year ago

dumpfheimer commented 2 years ago

After seeing in the changelog that the routing table sizes have increased I wanted to test the latest DEVELOPMENT firmware.

I am having issues which I believe are caused by the firmware update.

It seems to me that the firmware crashes after a few hours or after a certain number of requests. Unfortunately I cannot provide detailed feedback, but I am glad to try with some guidance.

The first time it got stuck I did not pay a lot of attention and simply restarted everything. The second time I un- and replugged the coordinator and things recovered without any issues worth mentioning. The logs were full of messages as shown below (1). Later it changed to other error messages (2).

On the positive side: I do feel like the larger routing table might have had a positive effect on my environment. I have ~120 zigbee devices of which probably 2/3 are routers. Especially when toggling a bunch of lights at the same time I feel like it has fewer "hiccups".

My environment: I am using a CC1352P2 launchpad with zigpy/zha/home assistant. The firmware in use was https://github.com/Koenkk/Z-Stack-firmware/blob/develop/coordinator/Z-Stack_3.x.0/bin/CC1352P2_CC2652P_launchpad_coordinator_20220724.zip

Error message 1:

2022-07-26 01:06:59 ERROR (MainThread) [homeassistant.helpers.entity] Update for sensor.server_electricity_power fails
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/helpers/entity.py", line 514, in async_update_ha_state
    await self.async_device_update()
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/helpers/entity.py", line 709, in async_device_update
    raise exc
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/components/zha/sensor.py", line 297, in async_update
    await super().async_update()
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/components/zha/entity.py", line 250, in async_update
    await asyncio.gather(*tasks)
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/components/zha/core/channels/homeautomation.py", line 100, in async_update
    result = await self.get_attributes(attrs, from_cache=False, only_cache=False)
  File "/srv/homeassistant/lib/python3.10/site-packages/homeassistant/components/zha/core/channels/base.py", line 460, in _get_attributes
    read, _ = await self.cluster.read_attributes(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/zcl/__init__.py", line 441, in read_attributes
    result = await self.read_attributes_raw(to_read, manufacturer=manufacturer)
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/quirks/__init__.py", line 233, in read_attributes_raw
    results = await super().read_attributes_raw(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy/device.py", line 291, in request
    radio_result, msg = await self._application.request(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 302, in request
    return await self._send_request(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 1161, in _send_request
    response = await self._send_request_raw(
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 1047, in _send_request_raw
    self._znp.request_callback_rsp(
AttributeError: 'NoneType' object has no attribute 'request_callback_rsp'
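
The last frame of this traceback shows that `self._znp` was `None` when the request was sent. A minimal sketch of how that kind of error can arise, and how a guard turns it into a clearer one, follows. The classes `FakeZnp` and `FakeApplication` are hypothetical stand-ins, not zigpy-znp's actual code, and the idea that the handle is cleared on connection loss is an assumption.

```python
import asyncio


class FakeZnp:
    """Hypothetical stand-in for the radio API object."""

    async def request_callback_rsp(self, request):
        return f"response to {request}"


class FakeApplication:
    """Hypothetical application that holds an optional radio handle."""

    def __init__(self):
        self._znp = FakeZnp()

    def connection_lost(self):
        # Assumption: the handle is cleared when the serial link drops.
        self._znp = None

    async def send(self, request):
        if self._znp is None:
            # Raise an explicit error instead of letting attribute access
            # on None fail with AttributeError.
            raise ConnectionError("coordinator is disconnected")
        return await self._znp.request_callback_rsp(request)


async def demo():
    app = FakeApplication()
    first = await app.send("read_attributes")
    app.connection_lost()
    try:
        await app.send("read_attributes")
    except ConnectionError as exc:
        second = str(exc)
    return first, second
```

Without the `None` check in `send`, the second call would fail with the same `AttributeError` as in the log above.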

Error message 2:


2022-07-26 01:10:04 ERROR (MainThread) [zigpy_znp.zigbee.application] Failed to reconnect
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 652, in _skip_bootloader
    result = await responses.get()
  File "/usr/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 886, in _reconnect
    await self.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/zigbee/application.py", line 111, in connect
    await znp.connect()
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 694, in connect
    self.capabilities = (await self._skip_bootloader()).Capabilities
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 651, in _skip_bootloader
    async with async_timeout.timeout(CONNECT_PROBE_TIMEOUT):
  File "/srv/homeassistant/lib/python3.10/site-packages/async_timeout/__init__.py", line 129, in __aexit__
    self._do_exit(exc_type)
  File "/srv/homeassistant/lib/python3.10/site-packages/async_timeout/__init__.py", line 212, in _do_exit
    raise asyncio.TimeoutError
asyncio.exceptions.TimeoutError
2022-07-26 01:10:19 ERROR (MainThread) [zigpy_znp.zigbee.application] Failed to reconnect
Traceback (most recent call last):
  File "/srv/homeassistant/lib/python3.10/site-packages/zigpy_znp/api.py", line 652, in _skip_bootloader
    result = await responses.get()
  File "/usr/lib/python3.10/asyncio/queues.py", line 159, in get
    await getter
asyncio.exceptions.CancelledError
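
The second traceback shows a pending `responses.get()` being cancelled and the surrounding timeout turning that into a `TimeoutError`, which is the pattern you would expect when the coordinator never answers the connection probe. A rough sketch of the same pattern, using `asyncio.wait_for` instead of the `async_timeout` package seen in the log. The names `probe` and `reconnect` and the timeout values are illustrative assumptions.

```python
import asyncio

PROBE_TIMEOUT = 0.05  # hypothetical value in seconds, not the real constant


async def probe(responses, timeout=PROBE_TIMEOUT):
    """Wait for one reply; a silent device surfaces as a timeout error."""
    # wait_for cancels the pending get() and raises asyncio.TimeoutError.
    return await asyncio.wait_for(responses.get(), timeout)


async def reconnect(responses, attempts=3, delay=0.01):
    """Retry the probe a few times; return None if nothing ever answers."""
    for _ in range(attempts):
        try:
            return await probe(responses)
        except asyncio.TimeoutError:
            await asyncio.sleep(delay)
    return None


async def demo():
    silent = asyncio.Queue()      # simulates a hung coordinator
    answering = asyncio.Queue()   # simulates a healthy coordinator
    answering.put_nowait("pong")
    return await reconnect(silent), await reconnect(answering)
```

In this sketch a hung device yields `None` after the retries, while a responsive one returns its first reply.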
davidcoulson commented 2 years ago

I'm having this issue updating my tube-zb-gw-cc2652p2-v2. I can flash it back to 20220219 just fine. Will try later via USB serial instead of Ethernet.

david@djc-ubuntu1:~/cc2538-bsl$ ./cc2538-bsl.py -p socket://10.2.3.70:6638 -evw ../CC1352P2_CC2652P_launchpad_coordinator_20220928.hex
Opening port socket://10.2.3.70:6638, baud 500000
Reading data from ../CC1352P2_CC2652P_launchpad_coordinator_20220928.hex
Your firmware looks like an Intel Hex file
Connecting to target...
CC1350 PG2.0 (7x7mm): 352KB Flash, 20KB SRAM, CCFG.BL_CONFIG at 0x00057FD8
Primary IEEE Address: 00:12:4B:00:25:8F:28:85
Performing mass erase
Erasing all main bank flash sectors
Erase done
Writing 360448 bytes starting at address 0x00000000
ERROR: Timeout waiting for ACK/NACK after 'Get Status (0x23)'

timdonovanuk commented 2 years ago

Updated to CC2652R_coordinator_20220928. Last week several of my devices just started vanishing offline, most worryingly a sonoff-basiczbr3, which is usually rock solid. Will report back!

patatman commented 2 years ago

I upgraded briefly this evening, but it introduced a huge delay in the network. Some lights would take 1 - 2 seconds before turning on. I am also getting errors in the logs like 'No network route' (205).

Z2M version: 1.28.0
Running on a Sonoff Zigbee 3.0 P version
Mainly Tradfri GU10 devices
Total: 76 devices
Routers: 38

I've reverted back to the stable firmware (CC1352P2_CC2652P_other_coordinator_20220219). But if there is something you'd like to know or want me to test let me know. Happy to test and debug.

alexruffell commented 2 years ago

I am using a Sonoff Zigbee 3.0 USB dongle with the TI chip and have 90 devices on Home Assistant / ZHA. I updated it to the latest dev firmware the day it was released and everything appears to be working quite well. I only have a Jasco outlet that loves to drop off and come back but it has always been troublesome. Before this version I had updated to the prior dev release which allowed my controller to establish a direct link to way more devices than the prior 23 or so. This version maintained that. Having more direct connections seems to be helping with stability.

cadavre commented 2 years ago

I noticed that my coordinator firmware was outdated (202101..), so I decided to go with this bleeding-edge version. Unfortunately all my devices disappear after a few minutes.

My network isn't particularly big:

End devices: 20
Router: 9

There's not a single line of log in the Z2M.

I'll go with 202205.. now.

Wireheadbe commented 2 years ago

zStack3x0 / 20220928 / 81 devices (Ikea/Sonoff/Lidl/Hue) / CC2652R_ / zigbee2mqtt as separate container

.. no issues 👍🏻

KHV8 commented 2 years ago

Version 20220928 is in the dev area and not master. Will it become the master, or will there ever be a new master version?

I do not have any issues with 20220219, however solving not existing problems :-)

Wireheadbe commented 2 years ago

I do not have any issues with 20220219, however solving not existing problems :-)

But tinkering is half the fun (as long as your partner understands 😆 )

KHV8 commented 2 years ago

I did upgrade from 20220219 to 20220928 on a Sonoff 3.0 dongle. The upgrade worked with no problems at all, and initially everything was fine.

My setup: A RPi4 running HAOS with Z2M as an add-on and add-on MQTT. Everything on latest version.

My experience: When I turn on a HA light scene consisting of 6 lights (I do not have any bigger) then Z2M stops working, and within a minute HA also stops working and finally reboots. After the reboot everything is working again. I can change single lights and other stuff with no problem, even a light group consisting of 3 HA lights. Next time I activate the big group of 6 HA lights it fails again.

I do not have anything in the logs. Maybe if I increase the log level, however have not done that. If needed I will try next weekend.

I have downgraded to 20220219 and it works again.

alexruffell commented 2 years ago

@KHV8 I am using 20220928 on the same coordinator but running on ZHA. I don't have Zigbee lights (everything is behind zwave dimmers or LIFX) but I do have a scene that turns on / off 7 zigbee 3.0 plug in sockets. While they often pop corn on (very fast but I can tell), it has never introduced instability. My setup runs in a Proxmox VM on a dedicated i7 based Lenovo tinyPC. CPU horsepower and zigbee software appear to be the biggest difference between our setups.

KHV8 commented 2 years ago

@alexruffell, not sure which version you refer to: are you on the master version from Feb, or on the new dev from 28 Sep?

I believe my experience is somewhat aligned with a few of the experiences above, related to strange behaviour and people downgrading again. I do not know for sure, but the expanded routing table might be the reason. I could see on the map that many more end devices were moving to the coordinator as their router. This might have led to an overload of the coordinator? Not sure.

alexruffell commented 2 years ago

@KHV8 I am on the 28 Sep dev version. Sorry... I edited my previous post to correct the version.

FirstRulez commented 2 years ago

I use the SONOFF Zigbee 3.0 USB Dongle Plus ZBDongle-P, connected to an RPi4 (HAOS) running zigbee2mqtt locally. I updated to the beta firmware 20220928 and have since noticed an improvement in my zigbee mesh. Previously, simultaneously controlling multiple devices (not grouped) could result in 1s or 2s delays with some devices; now it is much more reliable/prompt. I have ~60 devices of which 24 are battery powered; the rest are all mains powered and are all routers. I had some real problems getting the stick into bootloader mode to perform the update, but once done the update went through easily enough. It would be great to see an update mechanism built into the software stack, as installing python, working out the module dependencies, registering for the TI flash software will be way beyond many users.

jerrm commented 2 years ago

working out the module dependencies, registering for the TI flash software will be way beyond many users

Agree native FW upgrade support would be great.

But...

Easiest way to update under HAOS is to use cc2538-bsl inside the HA container. Pre-reqs are already installed. No registration needed. No need to manually place a Sonoff stick into bl mode.

FirstRulez commented 2 years ago

cc2538-bsl inside the HA container. Pre-reqs are already installed. No registration needed. No need to manually place a Sonoff stick into bl mode.

Are there instructions for that method somewhere? I have not found/seen them

jerrm commented 2 years ago

I posted a script here: https://community.home-assistant.io/t/sonoffs-zigbee-3-0-usb-dongle-plus-firmware/420558/5

Not something I would normally release to the wild, intended more as private documentation than a "real" script. FW urls need to be updated manually.

kovaga commented 2 years ago

Have CC2652R board flashed with the new firmware and zigbee2mqtt v1.28

Currently 64 devices are joined, of which 21 are routers and 43 are EndDevices

Running stable for about two weeks already. However, the problem of changing settings on end devices like "Honeywell smoke detector" and "Aqara vibration sensor" resulting in timeout and "Data request failed with error: 'No network route' (205)" still persists.

FirstRulez commented 2 years ago

However, the problem of changing settings on end devices like "Honeywell smoke detector" and "Aqara vibration sensor" resulting in timeout and "Data request failed with error: 'No network route' (205)" still persists.

I've found that this problem has always been due to interference - check your WiFi channels are not overlapping with your Zigbee channel, and that your Zigbee controller is far enough away from the HASS device (normally only a problem with RPi on the USB 3 ports, but I also had a problem when I placed it too close to another PC which had USB3 devices).

kovaga commented 2 years ago

I have seen several mentioning of this issue, i.e. https://github.com/Koenkk/zigbee2mqtt/issues/4285

So tried to forcefully remove the device from the network, then re-pair, and it works for several hours, but eventually arrives to the "no route" state.

my zigbee network channel is 26, the AP that is close to the coordinator is on channel 1, while the 2nd AP that could interfere, as it is on channel 11, is on the 2nd floor, pretty far away from the ground floor where the coordinator resides.

the coordinator is connected via serial cable to the ethernet2serial bridge and powered via POE.
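
The channel numbers above can be checked against the nominal 2.4 GHz channel plans. The sketch below uses the standard centre frequencies (Zigbee channels 11 to 26 at 2405 + 5·(k − 11) MHz, Wi-Fi channels 1 to 13 at 2407 + 5·n MHz). The channel widths of 2 MHz and 22 MHz are assumptions for a simple overlap test and ignore side lobes, wider Wi-Fi modes, and distance.

```python
def zigbee_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Zigbee (IEEE 802.15.4) channel."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11..26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Wi-Fi channel (1..13)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1..13")
    return 2407 + 5 * channel


def overlaps(zigbee_channel, wifi_channel, zigbee_width=2.0, wifi_width=22.0):
    """True if the two nominal channel bands intersect.

    The default widths are assumptions (about 2 MHz for Zigbee and
    22 MHz for a 20 MHz-class Wi-Fi channel).
    """
    gap = abs(zigbee_center_mhz(zigbee_channel) - wifi_center_mhz(wifi_channel))
    return gap < (zigbee_width + wifi_width) / 2


if __name__ == "__main__":
    for wifi in (1, 11):
        print(f"Zigbee 26 vs Wi-Fi {wifi}: overlap={overlaps(26, wifi)}")
```

Under these assumptions Zigbee channel 26 (2480 MHz) sits outside the nominal band of both Wi-Fi channel 1 and channel 11, so a plain channel overlap would not explain the errors in this setup.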

dkwireless commented 2 years ago

I have seen several mentioning of this issue, i.e. Koenkk/zigbee2mqtt#4285

So tried to forcefully remove the device from the network, then re-pair, and it works for several hours, but eventually arrives to the "no route" state.

my zigbee network channel is 26, the AP that is close to the coordinator is on channel 1, while the 2nd AP that could interfere, as it is on channel 11, is on the 2nd floor, pretty far away from the ground floor where the coordinator resides.

the coordinator is connected via serial cable to the ethernet2serial bridge and powered via POE.

Just try adding problematic devices into groups (single device to one group) and try sending command to group. It should work even if you are getting "no route" when you call device directly. I have been struggling with this problem for a while. This is only workaround that works. There is some underlying problem that I reported in separate thread. Not linked to this particular FW version.
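
For zigbee2mqtt users, the workaround could be scripted roughly as below. This is only a sketch that builds the MQTT topic and payload pairs. The request topic is assumed to mirror the `zigbee2mqtt/bridge/response/group/members/add` response topic that appears in zigbee2mqtt logs, and the `<group>/set` topic and the example payload are likewise assumptions, so check them against the zigbee2mqtt documentation before use. The helper names are hypothetical.

```python
import json

BASE_TOPIC = "zigbee2mqtt"  # default base topic; adjust if yours differs


def group_add_request(group, device):
    """Build the (topic, payload) pair that asks the bridge to add a member.

    Assumption: the request topic mirrors the bridge response topic.
    """
    topic = f"{BASE_TOPIC}/bridge/request/group/members/add"
    return topic, json.dumps({"group": group, "device": device})


def group_command(group, payload):
    """Build the (topic, payload) pair that sends a command to the group."""
    return f"{BASE_TOPIC}/{group}/set", json.dumps(payload)


if __name__ == "__main__":
    # Hypothetical names: one problematic device in its own group.
    print(group_add_request("group_lamp1", "lamp1"))
    print(group_command("group_lamp1", {"state": "ON"}))
```

The pairs can be published with any MQTT client; nothing here talks to a broker.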

KrzysztofHajdamowicz commented 2 years ago

Just try adding problematic devices into groups (single device to one group) and try sending command to group. It should work even if you are getting "no route" when you call device directly.

This is a workaround, because issuing a command to the group causes the coordinator to emit a broadcast message to the whole network, whereas issuing a command to a single device results in a message targeted at that specific device using a specific route.

kovaga commented 2 years ago

Thanks for the advice, but I can't seem to add those devices to the group. If I add it via GUI, I get the timeout response

Adding 'sens_smoke2' to 'gsens_smoke02' MQTT publish: topic 'zigbee2mqtt/bridge/response/group/members/add', payload '{"data":{"device":"0x00158d0002f37664/1","group":"gsens_smoke02"},"error":"Failed to add from group (Command 0x00158d0002f37664/1 genGroups.add({\"groupid\":1,\"groupname\":\"\"}, {\"sendWhen\":\"immediate\",\"timeout\":10000,\"disableResponse\":false,\"disableRecovery\":false,\"disableDefaultResponse\":true,\"direction\":0,\"srcEndpoint\":null,\"reservedBits\":0,\"manufacturerCode\":null,\"transactionSequenceNumber\":null,\"writeUndiv\":false}) failed (Timeout - 55803 - 1 - 9 - 4 - 0 after 10000ms))","status":"error","transaction":"kdq4w-1"}'

If I manually specify it in a configuration.yaml, I get the failed response.

Adding 'sens_smoke2' to group 'gsens_smoke02' Failed to add 'sens_smoke2' from 'gsens_smoke02'

Moreover, the smoke detector lies right next to the coordinator and responds to commands (i.e. sensitivity and selftest) after I have re-paired it. Very strange, and I see the same behaviour with the vibration sensors.

dkwireless commented 2 years ago

Try resetting device before you add it to group. You can't add it to group if there is no valid route.

kovaga commented 2 years ago

what do you mean by resetting the device? What I have done is

there seems to be a valid route, as I can change the sensitivity and execute a self test on the device, just not add it to the group.

dkwireless commented 2 years ago

That is strange. You did everything right. I don't know why you can't add device into group. This is new empty group?

kovaga commented 2 years ago

Just tried it again, with the same results :(

sent selftest, to which the device responded
created a new empty group
added the device to the group

info 2022-10-21 10:19:31: MQTT publish: topic 'zigbee2mqtt/sens_smoke2', payload '{"ac_status":false,"battery":100,"battery_low":false,"device_temperature":33,"last_seen":1666340371963,"linkquality":255,"power_outage_count":37,"restore_reports":false,"sensitivity":"high","smoke":false,"smoke_density":0,"supervision_reports":false,"tamper":false,"test":false,"trouble":false,"voltage":3055}'

info 2022-10-21 10:19:46: MQTT publish: topic 'zigbee2mqtt/bridge/response/group/add', payload '{"data":{"friendly_name":"testgroup","id":1},"status":"ok","transaction":"kdq4w-4"}'

info 2022-10-21 10:19:52: Adding 'sens_smoke2' to 'testgroup'

info 2022-10-21 10:20:27: MQTT publish: topic 'zigbee2mqtt/bridge/response/group/members/add', payload '{"data":{"device":"0x00158d0002f37664/1","group":"testgroup"},"error":"Failed to add from group (Command 0x00158d0002f37664/1 genGroups.add({\"groupid\":1,\"groupname\":\"\"}, {\"sendWhen\":\"immediate\",\"timeout\":10000,\"disableResponse\":false,\"disableRecovery\":false,\"disableDefaultResponse\":true,\"direction\":0,\"srcEndpoint\":null,\"reservedBits\":0,\"manufacturerCode\":null,\"transactionSequenceNumber\":null,\"writeUndiv\":false}) failed (Timeout - 55803 - 1 - 49 - 4 - 0 after 10000ms))","status":"error","transaction":"kdq4w-5"}'

error 2022-10-21 10:20:27: Failed to add from group (Command 0x00158d0002f37664/1 genGroups.add({"groupid":1,"groupname":""}, {"sendWhen":"immediate","timeout":10000,"disableResponse":false,"disableRecovery":false,"disableDefaultResponse":true,"direction":0,"srcEndpoint":null,"reservedBits":0,"manufacturerCode":null,"transactionSequenceNumber":null,"writeUndiv":false}) failed (Timeout - 55803 - 1 - 49 - 4 - 0 after 10000ms))

kovaga commented 2 years ago

Furthermore, I have tried adding other devices to the group and it seems that I can instantly add the routers (7 different types of devices), but none of the end devices (9 different types of devices). I think it is worth opening a separate ticket for this issue.

pannal commented 2 years ago

CC2652R_coordinator_20220928 on a zzh! seems even better than CC2652R_coordinator_20220219. I've had the occasional hiccup with the older firmware. I'll report back if that stays the same or if it improves. For now, none of my 60+ devices answers late (or doesn't answer at all for a couple of seconds).

TheJulianJES commented 2 years ago

FYI, SimpleLink SDK 6.30 is out. Perhaps that includes some fixes (although the changelog isn't that promising).

(A memory-saving change was also made in zigpy-znp sometime back)

Changelog: https://software-dl.ti.com/simplelink/esd/simplelink_cc13xx_cc26xx_sdk/6.30.00.84/exports/changelog.html

dumpfheimer commented 2 years ago

I have been playing around quite a bit with building firmware myself and to be honest I never experienced crashes the way I initially reported here. Additionally, I switched to a CC1352P7, which has considerably more RAM, some time in between.

Will try the new SDK as soon as I have some spare time.

TheJulianJES commented 2 years ago

FYI, SimpleLink SDK 6.30 is out. Perhaps that includes some fixes (although the changelog isn't that promising). Changelog: https://software-dl.ti.com/simplelink/esd/simplelink_cc13xx_cc26xx_sdk/6.30.00.84/exports/changelog.html

cc @Koenkk if you haven't seen this

Koenkk commented 2 years ago

@TheJulianJES yes I will update the SDK soon; it switches to TI-RTOS7 so maybe we have some luck.

Koenkk commented 2 years ago

Update is available now (20221102); please let me know if it fixes the ZHA uptime problems.

https://github.com/Koenkk/Z-Stack-firmware/tree/develop/coordinator/Z-Stack_3.x.0/bin

In this release the SDK switched to TI-RTOS7, which seems to have an impact on the memory management (I don't have to specify a HEAPMGR_SIZE anymore). 🤞

alexruffell commented 2 years ago

@Koenkk

I can't get it to work even though I believe I am doing the same exact thing I did previously. I press the boot button and plug it in.

The previous files were 405kB while this latest rev is 494kB - is that normal? I was able to re-flash the same older version so it would appear to be a file issue.

Wireheadbe commented 2 years ago

Flashed via Uniflash - CC2652R_coordinator_20221102.hex - let's see how this goes 👍🏻

Koenkk commented 2 years ago

@alexruffell I also noticed the files are bigger but I guess this is due to TI-RTOS7. Can you try with UNIFLASH?

alexruffell commented 2 years ago

@Koenkk That flasher is too complicated. I guessed my way through and it did not work. I picked bootloader, CC2652R1F (there was no CC2652P option) and then selected the new hex file as the 1st application. I edited the COM port to COM6 as that is where my dongle is seen, but it fails. If I search for CC2652P1F I find a programming option that requires some sort of hardware flasher.

dkwireless commented 2 years ago

Same here. It won't automatically detect the device in UniFlash.

Koenkk commented 2 years ago

I flashed it with cc2538-bsl

MattL0 commented 2 years ago

Same, works perfectly with cc2538-bsl

alexruffell commented 2 years ago

Thanks to this video https://www.youtube.com/watch?v=iCE5Z43EKpk I was able to flash the firmware using cc2538-bsl on my Sonoff Dongle. All seems working fine.

dumpfheimer commented 2 years ago

@Koenkk Thanks for the update!

I flashed it on my P2 to test if it hangs at some point. Did you by any chance have an issue with the serial connection when upgrading to 6.30? My P7 device does not respond to anything after an update.

dkwireless commented 2 years ago

Just to summarize it for people running windows:

  1. Install Python - https://www.python.org/ftp/python/3.11.0/python-3.11.0-amd64.exe
  2. Unpack the firmware and the script (https://github.com/JelmerT/cc2538-bsl) into one folder
  3. Open PowerShell as Administrator
  4. Install prerequisites; run: pip install pyserial intelhex
  5. Plug in your stick and look in device manager what COM port it is using
  6. Flash using command: python cc2538-bsl.py -p COM5 -evw --bootloader-sonoff-usb .\CC1352P2_CC2652P_launchpad_coordinator_20221102.hex (change COM port in command to reflect your stick)
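
The command in step 6 can also be assembled from Python, which avoids retyping the port and file name. This is a minimal sketch that only builds the argument list from the flags shown in the steps above. The helper name `build_flash_command` and its parameters are hypothetical, and actually running the command is left to the caller.

```python
import sys


def build_flash_command(port, firmware, script="cc2538-bsl.py", sonoff=True):
    """Return the argument list for flashing with cc2538-bsl.

    Only flags that appear in the steps above are used: -p, -evw and
    --bootloader-sonoff-usb (the latter only for the Sonoff USB dongle).
    """
    cmd = [sys.executable, script, "-p", port, "-evw"]
    if sonoff:
        cmd.append("--bootloader-sonoff-usb")
    cmd.append(firmware)
    return cmd


if __name__ == "__main__":
    cmd = build_flash_command(
        "COM5", "CC1352P2_CC2652P_launchpad_coordinator_20221102.hex"
    )
    print(" ".join(cmd))
    # To actually flash, pass the list to subprocess.run(cmd, check=True)
    # from the folder that contains the script and the firmware file.
```

If pyserial is installed (step 4), `serial.tools.list_ports.comports()` can list the available ports instead of looking them up in Device Manager.
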
digiblur commented 2 years ago

@Koenkk That flasher is too complicated. I guessed my way through and it did not work. I picked bootloader, CC2652R1F (there was no CC2652P option) and then selected the new hex file as the 1st application. I edited the COM port to COM6 as that is where my dongle is seen, but it fails. If I search for CC2652P1F I find a programming option that requires some sort of hardware flasher.

I went through both methods here https://youtu.be/4S_c_m6z-RY

The python one is so nice though!

jerrm commented 2 years ago

The python one is so nice though!

Yeah, cc2538-bsl is the best option by far for anyone without CLI-phobia.

I just run it from my HA container, the pre-reqs are already installed there.

Koenkk commented 2 years ago

@dumpfheimer

Did you by any chance have an issue with the serial connection when upgrading to 6.30?

I didn't have any problems with it.

Wireheadbe commented 2 years ago

Flashed via Uniflash - CC2652R_coordinator_20221102.hex - let's see how this goes 👍🏻

Seems to be quite responsive, somewhat better than the previous version.

cpuks commented 2 years ago

I'm using the zigstar multitool: https://zig-star.com/radio-docs/zigstar-multi-tool/ It works with any zigbee device, not only zigstar products.

BastiaanNaber commented 2 years ago

@Koenkk I get an error on the CC2652R_coordinator_20221102.hex firmware with uniflash 8.1.0

Error on line 11206 : Record has unexpected length

The previous firmware works fine. Any ideas what this could be?

Resolved by downgrading to uniflash 8.0.0

TheJulianJES commented 2 years ago

I'll have to check logs later, but it seems like my Zigbee stick crashed completely after a couple of days (ZHA).