dresden-elektronik / deconz-ota-plugin

Server side implementation of the standard Zigbee OTA protocol.
BSD 3-Clause "New" or "Revised" License
47 stars 13 forks source link

OTAU for Trådfri repeaters retries indefinitely #37

Open jzee opened 2 years ago

jzee commented 2 years ago

I'm using the deconz add-on v6.11.1 in home assistant, which in turn uses deCONZ 2.13.4

The network uses a Conbee II to build my zigbee home network, the conbee firmware version is 0x26660700 My network heterogenously uses IKEA (bulbs, motion sensor, encoders, blinds), Aqara (magnet sensors, motion sensor, switches, vibration sensors), Hue (light, motion sensor) with a grand total of 20 nodes.

For a few days now I've observed that the OTAU won't stop updating my two Trådfri repeaters that I got with my Fyrtur blinds. Like some others I've had repeated trouble with losing connection to the blinds and I've been a close follower of https://github.com/dresden-elektronik/deconz-rest-plugin/issues/4535 but that's only peripherally related. As I'm testing this, the fyrtur blinds have been removed from my network.

The behavior I'm observing is that for both repeaters, independently from one another (even if I unplug one or the other) the OTAU is repeated in a cycle and never completes.

Here is the relevant debug log section where the first update ends and the next update restarts:

16:04:06:228 mg block req fwVersion:0x23086631, offset: 0x000302FC, maxsize: 63
16:04:06:228 otau send mg block rsp offset: 0x000302FC dataSize 28 0x680AE2FFFED41BDC
16:04:06:305 mg block reg fwVersion:0x23086631, offset: Ox00030318, maxsize: 63
16:04:06:306 otau send mg block rsp offset: 0x00030318 dataSize 28 0x680AE2FFFED41BDC
16:04:06:381 mg block req fwVersion:0x23086631, offset: 0x00030334, maxsize: 63
16:04:06:382 otau send img block sp offset: 0x00030334 dataSize 28 0x680AE2FFFED41BDC
16:04:06:442 mg block req fwVersion:0x23086631, offset: 0x00030350, maxsize: 63
16:04:06:442 otau send img block sp offset: 0x00030350 dataSize 28 0x680AE2FFFED41BDC
16:04:06:525 mg block req fwVersion:0x23086631, offset: 0x0003036C, maxsize: 63
16:04:06:526 otau send mg block rsp offset: 0x0003036C dataSize 28 0x680AE2FFFED41BDC
16:04:06:586 mg block req fwVersion:0x23086631, offset: Ox00030388, maxsize: 63
16:04:06:586 otau send img block sp offset: 0x00030388 dataSize 28 0x680AE2FFFED41BDC
16:04:06:650 mg block req fwVersion:0x23086631, offset: 0x000303A4, maxsize: 63
16:04:06:650 otau send img block sp offset: 0x000303A4 dataSize 24 0x680AE2FFFED41BDC
16:04:06:726 otau upgrade end req: status: 0x00, fwVersion:0x23086631, imgType: 0x1102
16:04:13:421 DeviceAnnce of LightNode: Ox680ae2fffed41bdc Permit Join: 0

Note that the FW image seems to be transmit in its entirety (0x303a4 + 24d = 197564byte which equals the reported image size, see below at timestamp 16:06:25:575).

Then follows some (probably) unrelated activity between updates:

16:04:14:381 ZCL attribute report 0x680AE2FFFED41BDC for cluster: 0x0000, ep: 0x01, frame control:
0x08, mfcode: 0x0000**
16:04:30:023 emit event/tick: Oxec1bbdfffeb6fe16
16:04:30:039 delay sending request 160 dt 0 ms to OxEC1BBDFFFEB6FE16, ep: 0x01 cluster: 0x0006
onAir: 1
16:04:30:040 delay sending request 162 dt O ms to OxEC1BBDFFFEB6FE16, ep: 0x01 cluster: 0x0008
onAir: 1
16:04:30:086
OXEC1BBDFFFEB6FE16 force poll (2)
16:04:30:108 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0006, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:04:30:138 delay sending request 162 dt 0 ms to OxEC1BBDFFFEB6FE16, ep: 0x01 cluster: 0x0008
onAir: 1
16:04:30:159
OxEC1BBDFFFEB6FE16 force poll (2)
16:04:30:198
OxEC1BBDFFFEB6FE16 force poll (2)
16:04:30:610ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:04:35:610 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:04:39:275 [INFO] - Button 1002 - TRADFRI motion sensor, broadcast to: 0x0000, endpoint: 0x01,
cluster: ONOFF (0x0006), action: On with timed off, payload: 0008070000, zclSeq: 57
16:04:39:326 emit event/tick: 0x804b50fffe489c30
16:04:39:338 delay sending request 214 dt 0 ms to 0x804B50FFFE489C30, ep: 0x01 cluster: 0x0008
onAir: 1
16:04:39:376
0x804B50FFFE489C30 force poll (2)
16:04:39:379 ZCL attribute report 0x804B50FFFE489C30 for cluster: 0x0006, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:04:39:409
0x804B50FFFE489C30 force poll (2)
16:04:39:523 ZCL attribute report 0x804B50FFFE489C30 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:04:39:739 Set sensor check interval to 100 milliseconds
16:04:40:445 emit event/tick: Oxec1bbdfffeb6fe16
16:04:40:483
OxEC1BBDFFFEB6FE16 force poll (2)
16:04:40:612 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:04:43:545 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0006, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:04:44:561 ZCL attribute report 0x804B50FFFE489C30 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:04:45:664 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:04:47:532 ZCL attribute report 0x0017880106F4B794 for cluster: 0x0400, ep: 0x02, frame control:
0x08, mfcode: 0x0000
16:04:55:610ZCL attribute report 0x0017880106F4B794 for cluster: 0x0406, ep: 0x02, frame control:
0x08, mfcode: 0x0000
16:04:55:746 Device TTL 1860 s flags: 0x7
16:05:01:516 emit event/tick: Oxec1bbdfffeb6fe16
16:05:01:538 delay sending request 97 dt 0 ms to OxEC1BBDFFFEB6FE16, ep: 0x01 cluster: 0x0006
onAir: 1
16:05:01:539 delay sending request 99 dt 0 ms to OxEC1BBDFFFEB6FE16, ep: 0x01 cluster: 0x0008
onAir: 1
16:05:01:564
OxECIBBDFFFEB6FE16 force poll (2)
16:05:01:586ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0006, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:05:01:606
OxEC1BBDFFFEB6FE16 force poll (2)
16:05:01:641
OxEC1BBDFFFEB6FE16 force poll (2)
16:05:02:051ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: Ox0000
16:05:07:065ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:05:07:787 emit event/tick: Oxec1bbdfffeb6fe16
16:05:07:826
OxEC1BBDFFFEB6FE16 force poll (2)
16:05:10:830 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0006, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:05:12:054ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:05:39:339 sensor 49 (TRADFRI motion sensor): disable presence
16:05:39:739 Set sensor check interval to 1000 milliseconds
16:05:45:867 emit event/tick: Oxeclbbdfffeb6fe16
16:05:45:919
OxEC1BBDFFFEB6FE16 force poll (2)
16:05:45:938 delay sending request 104 dt 0 ms to OxEC1BBDFFFEB6FE16, ep: 0x01 cluster: 0x0008
onAir: 1
16:05:45:946 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0006, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:05:45:966
OxEC1BBDFFFEB6FE16 force poll (2)
16:05:46:022
OxEC1BBDFFFEB6FE16 force poll (2)
16:05:46:433 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:05:51:441 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:05:55:745 Device TTL 1801 s flags: 0x7
16:05:56:437ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:06:01:441 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:06:06:444 ZCL attribute report OxEC1BBDFFFEB6FE16 for cluster: 0x0008, ep: 0x01, frame control:
0x38, mfcode: 0x0000
16:06:10:739 GW firmware version is up to date: 0x26660700

next update attempt

16:06:25:535 otau invalide update end request for node 0x680ae2fffed41bdc
16:06:25:536 otau query next img req: Ox680ae2fffed4lbdc mfCode: 0x117C, img type: 0x1102, sw
version: 0x23070631
16:06:25:540 Match otau version 0x23086631 image type 0x1102
16:06:25:541 OTAU:/data/otau/117C-1102-23086631.zigbee:198020bytes
16:06:25:542 OTAU: offset
456: ota header (56 bytes)
16:06:25:570 TAU: offset
512: tag 0x0000, length Ox0002D590 (185750 bytes)
16:06:25:570 OTAU: offset 186262: tag OxFFBF, length 0x00000040 (70 bytes)
16:06:25:572 OTAU: offset 186332: tag OxFFBE, length 0x00002BA2 (11176 bytes)
16:06:25:573 TAU: offset 197508: tag 0x7035, length OxA2DB5322 (512 bytes)
16:06:25:573 Found update file /data/otau/117C-1102-23086631.zigbee
16:06:25:574 Otau image path does not exist: /data/.local/share/dresden-elektronik/deCONZ/otau
16:06:25:575 OTAU:/data/otau/117C-1102-23086631.zigbee:197564bytes
16:06:25:576 OTAU: ota header (56 bytes)
16:06:25:576 TAU: tag 0x0000, length 0x0002D590 (185750 bytes)
16:06:25:591 OTAU: tag 0xFFBF, length 0x00000040 (70 bytes)
16:06:25:591 OTAU: tag OxFFBE, length 0x00002BA2 (11176 bytes)
16:06:25:592 OTAU: tag 0x7035, length OxA2DB5322 (512 bytes)
16:06:25:593 OTAU: packed -225252088 bytes

before and after the snippets above there's mostly a long list of update packets being sent, nothing that would in any way indicate an error. The update attempt finishes in reasonable time, 10:06 and there is unrestricted line of sight between the Conbee II and the repeater in question, distance approx. 3m

Both repeaters have firmware version 0x23070631, both are attempted an update to 0x23086631. The md5 of the firmware file served from /data/otau is ea365842bcff641a0a9a11296b0f25cf which matches the md5 of the respective file that IKEA offers for download

Restarting deCONZ, restarting the add-on, restarting home assistant, hard power-cycling the machine, changing USB port of deCONZ all have no effect on the situation.

I can consider that there's a problem with the repeaters themselves being unable to accept the firmware but I would find it staggering that both repeaters would be independently affected and not report any error whatsoever. Removing from and re-adding the repeaters to the network has no effect either.

The repeaters seem to otherwise work fine, minus the fact that the permanent FW update causes way too much radio traffic.

How can I stop OTAU from permanently updating the repeaters, ideally after succeeding with an update to 0x23086631?

jzee commented 2 years ago

clarification after I read issue #32 which seems to be somewhat similar: in the test scenario above there was only the first repeater powered, the other was unplugged. Switching repeaters, the behavior is identical with the second repeater, minus a different MAC address of course

Edit: Also, Source Routing is disabled.

jzee commented 2 years ago

Just updated my installation to the newly released 2.14.1 and the issue persists, I just reproduced it. A bit of further googling shows that others have encountered the same issue:

https://forum.phoscon.de/t/ikea-ota-firmware-update/795/21 https://forum.phoscon.de/t/ikea-update-not-working/1538

wovka88 commented 2 years ago

Have the same problem. Cant update the Ikea Repeater to the new firmware.

korkmazk commented 2 years ago

So it seems I'm not the only one, I have exactly the same issue. It seems that the update finishes till 100% and somehow it starts again. I have three repeaters and all are trying to from version 0x23070631 to 0x23086631. Debug log shows: image

Not sure when it happens but it starts the update after a while. Some other weird issue that might be a result of the this behavior is that the there is no link anymore with the repeater and same with the devices who are routing through the repeater. Only a powercycle of the repeater helps.

jzee commented 2 years ago

Digging through IKEAs release notes, I found a note that the upgrades we're trying to upgrade from (2.3.070) introduced "secure OTA upgrades" for both blinds and repeaters (Ctrl+F repeater).

Could this be an explanation for the failed upgrades?

thomas-svrts commented 2 years ago

I had the same for 1 tradfri repeater and my 3 fyrtur blinds. Upgrade via deconz kept on repeating. I connected them to the Ikea hub to update firmware.

janschroeter commented 2 years ago

Same problem here. Update of 0x23070631 to 0x23086631 looping indefinitely. deCONZ version v2.15.0-beta.

aihngel commented 2 years ago

Same issue for me. Ikea bulbs stuck in loop. I've disabled the restart check box and just let them cook overnight. Some of them cleared. Some of them I was able to get to work by power cycling them manually. I've had the same thing happen to my hue bulbs. The say done, but then the start upgrading again the next time they're polled. Help?

Wim-N commented 2 years ago

'Solved' it for now by deleting the OTAU firmware file. Let's hope that those files will be updated with the next version of deCONZ.

HenningGithub commented 2 years ago

I have the same issue with 4 fyrtur blinds - new firmware gets transmitted successfully afaict, device reboots, still reports running the old firmware. No matter how often I try. This thread pointing out that the issue apparently affects the repeaters as well made me remember I had a bunch of these repeaters unused in some corner, with old-ish firmware. Upgrading a repeater from 0x22005631 to 0x23086631 worked absolutely flawless and on the first shot. I presume the problem isn't upgrading to 2.3.08, it is upgrading from 2.3.07 - and as pointed out before, the release notes for 2.3.07 at least for the fyrturs said "more secure OTA", which may be related. I have 3 more unupgraded repeaters fwiw. I also have a bunch of other ikea devices, the only OTA-problem ones I have are the fyrturs.

ebaauw commented 2 years ago

device reboots, still reports running the old firmware

Make sure to read manually SW Build ID in the GUI, as this might not be updated automatically after upgrade/reboot. Double-check the checksum on the firmware file.

Try downloading the firmware files again (delete them from ~/otau, re-run ikea-ota-download.py, and restart deCONZ. I don't think IKEA provides a checksum on the files to check for download errors and devices would reject corrupted firmware image.

I also have a bunch of other ikea devices, the only OTA-problem ones I have are the fyrturs.

I don't think I've seen any issues with my Fyrtur or Trådfri repeaters.

'Solved' it for now by deleting the OTAU firmware file. Let's hope that those files will be updated with the next version of deCONZ.

No they won't, as the firmware files aren't included in the deCONZ distribution. Only the script to download them from the IKEA server.

I've had the same thing happen to my hue bulbs. The say done, but then the start upgrading again the next time they're polled.

I've seen the same, but second time to a different firmware version. Some of the Hue lights seem to have changed Image Type, requiring two firmware upgrades in a row: the first to change the Image Type (in my case from 0x0117 to 0x011F), the second to bring the new Image Type to the latest version. I've even seen the Hue bridge upgrading the firmware in up to four steps, with a different intermediate version each time.

HenningGithub commented 2 years ago

Make sure to read manually SW Build ID in the GUI, as this might not be updated automatically after upgrade/reboot. Double-check the checksum on the firmware file.

Done all that - this is persisting for months now, over battery changes in the blinds and reboots and power cycling the home assistant/deconz host.

Try downloading the firmware files again

also long tried

I don't think I've seen any issues with my Fyrtur or Trådfri repeaters.

You managed to get a Fyrtur to OTA-Upgrade from 2.3.07 to 2.3.08?

It's my 4 fyrturs that won't upgrade, while everything else - must be 30-40 ikea, 80 total (not all have fw upgrades tho) - does just fine.

ebaauw commented 2 years ago

You managed to get a Fyrtur to OTA-Upgrade from 2.3.07 to 2.3.08?

Dunno, sorry. Mine is on 2.3.088, but I don't know what version I upgraded it from.

HenningGithub commented 2 years ago

oh one to short, sorry, I have 2.3.073 and the update is the 2.3.088 you mention

Mask3007 commented 1 year ago

Since latest update my ikea tradfri bulbs wanted to update with recent deconz version. Deconz wanted to use file:

117C-2201-23087631.zigbee

I have no idea where this comes from. But the bulbs stuck in update loop. Its says done... restarts and after a few moments it tries again. There was some log line "invalid update end request for node". I also deleted and redownloaded all affected files by ikea-ota-download.py restartet deconz. No effect stuck on update loop. Very annoying.

manup commented 1 year ago

Did some more digging to find out why new Ledvance OTA files don't load. The problem was that the contents of the files where preprocessed and converted back – which is only needed for the editor. With PR https://github.com/dresden-elektronik/deconz-ota-plugin/pull/42 I could finally update my Ikea repeater.

niklerus commented 1 year ago

Have same issue but with all my Trådfri bulbs.

wovka88 commented 1 year ago

Since deConZ 2.19.0 the OTAU Update for all Ikea devices working again.

EarlOfBurton commented 1 year ago

Not for me, IKEA TRADFRI bulb E27 CWS opal 600lm update from v 2.3.086 to 2.3.093 hangs during update process with deCONZ 2.19,1 as Home Assistant deCONZ 6.17. Before I had the "loop update" problem with this OTA update. I need to hit query ~ 10 times to keep the update process alive before it completely gives up/breaks. Additionally Xiaomi lumi.plug.mmeu01 power readings are wrong (all read 0W which is not true) and I needed to re-pair every outlet to get the sensor readings again with deCONZ 2.19.1. So I will need to downgrade everything and tell myself once more "never update". And then disable OTA updates. I thought I could fix a known problem with this update but it is not true this time.