Open CrazyCoder opened 1 month ago
Likely related to #23704.
In case you are running z2m in a Docker container, does the following fix it?
docker exec -it CONTAINER_NAME sh
apk add nano
nano /app/node_modules/zigbee-herdsman-converters/lib/ota/common.js
# Change 'const IMAGE_BLOCK_RESPONSE_DELAY = 250;' to 'const IMAGE_BLOCK_RESPONSE_DELAY = 1000;'
# save and exit nano
exit
docker restart CONTAINER_NAME
(CC: @LaurentvdBos)
@Koenkk I made this change, but it didn't help:
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: Timer expired waiting for ACK for frmTx=3, ackRx=2
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: ---> [FRAME type=DATA_RETX frmReTx=2 frmRx=7](ackRx=2 frmTx=3)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:26] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: Timer expired waiting for ACK for frmTx=3, ackRx=2
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: ---> [FRAME type=DATA_RETX frmReTx=2 frmRx=7](ackRx=2 frmTx=3)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA]
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: <--- [FRAME type=DATA ackNum=2](ackRx=2 frmTx=3)
[2024-09-05 07:23:28] debug: zh:ember:uart:ash: ---> [FRAME type=ACK frmRx=7](ackRx=2)
[2024-09-05 07:23:31] debug: zh:ember:uart:ash: <--- [FRAME type=ERROR]
[2024-09-05 07:23:31] error: zh:ember:uart:ash: Received ERROR from adapter, with code=ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT.
[2024-09-05 07:23:31] error: zh:ember:uart:ash: ASH disconnected | Adapter status: ASH_NCP_FATAL_ERROR
[2024-09-05 07:23:31] error: zh:ember:uart:ash: Error while parsing received frame, status=ASH_NCP_FATAL_ERROR.
[2024-09-05 07:23:31] error: zh:ember: Adapter fatal error: HOST_FATAL_ERROR
[2024-09-05 07:23:31] debug: zh:controller: Adapter disconnected
[2024-09-05 07:23:31] info: zh:ember:uart:ash: ASH COUNTERS since last clear:
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Total frames: RX=2715, TX=3553
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Cancelled : RX=0, TX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: DATA frames : RX=2695, TX=891
[2024-09-05 07:23:31] info: zh:ember:uart:ash: DATA bytes : RX=61501, TX=45253
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Retry frames: RX=15, TX=3
[2024-09-05 07:23:31] info: zh:ember:uart:ash: ACK frames : RX=3, TX=2658
[2024-09-05 07:23:31] info: zh:ember:uart:ash: NAK frames : RX=0, TX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: nRdy frames : RX=0, TX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: CRC errors : RX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Comm errors : RX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Length < minimum: RX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Length > maximum: RX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Bad controls : RX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Bad lengths : RX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Bad ACK numbers : RX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Out of buffers : RX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Retry dupes : RX=15
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Out of sequence : RX=0
[2024-09-05 07:23:31] info: zh:ember:uart:ash: ACK timeouts : RX=3
[2024-09-05 07:23:31] info: zh:ember:uart:ash: Port closed.
[2024-09-05 07:23:31] info: zh:ember:uart:ash: ======== ASH stopped ========
[2024-09-05 07:23:31] info: zh:ember:ezsp: ======== EZSP stopped ========
[2024-09-05 07:23:31] info: zh:ember: ======== Ember Adapter Stopped ========
@Nerivec do you think the adapter crashes because too many commands are sent to it?
If IMAGE_BLOCK_RESPONSE_DELAY
really was set to 1s, it should be more than fine. I've tested the Dongle-E to almost 100 commands in 1.5sec without crash.
Seems like wires are getting crossed as to which side is supposed to ACK.
@CrazyCoder How spammy is your network outside of OTA? Any chance you can try with one of the firmware described in this post?
@Nerivec The presence sensor is updating every 1 second, according to the logs. Other than that, all the other devices (less than 60 total) do not send updates that often. I don't know what to consider "spammy".
It's been working like that for 2+ years with no issues.
Any chance you can try with one of the firmware described in https://github.com/Koenkk/zigbee2mqtt/discussions/23224#discussioncomment-10422347?
I'll give it a shot.
@Nerivec I flashed nabucasa_yellow_ncp-uart-hw_115200_max_8.0.1.0.gbl
and it appears to help. Updated two bulbs with no issues/restarts.
Should I stick to 8.x firmware for now?
Can you let it run for a few hours, then try to update a device again (if you have another that you can)?
The max firmware "working" would point to the adapter not being able to get rid of messages (from z2m or from network) fast enough with previous firmware (max has larger limits on that). The presence sensor likely doesn't help, but it shouldn't be a "deal breaker", as long as the rest of the network is more tamed. Either that or the flashing/hard-restart did the adapter some good.
A few options:
adapter_concurrent
setting (default for ember
-when not specified- is 16)@Nerivec Thanks! Since previously I could cause the adapter to restart right away, I don't think running for a few hours would change anything. I updated a couple more bulbs, and there are three more remaining. Will try updating them later just in case.
Same problem here with SkyConnect. Crashing 2-4 times before one firmware update is complete. And I got 18 devices to update. Will take several days...
With 8.0.1.0
controller firmware I was able to update all the bulbs without crashing. Even updating two at the same time worked.
@jvuotila There's a "max" firmware for skyconnect in the link above too, if you want to give that one a try.
For reference, could you guys list the following for your respective networks:
CC: @darkxst
For reference, could you guys list the following for your respective networks:
- Number of end devices
- Number of routers
End devices: 93 Routers: 49
@Nerivec 56 devices total, 4 standalone routers (Custom devices (DiY) ti.router) + 18 other routers.
@jvuotila There's a "max" firmware for skyconnect in the link above too, if you want to give that one a try.
@Nerivec I have Silicon Labs firmware 7.4.3.0 (ncp-uart-hw-v7.4.3.0-skyconnect-115200.gbl). Seems to be totally different firmware than behind that link. Is this wrong version?
@jvuotila This one (extract archive to get the GBL file) is roughly the equivalent of the NabuCasa but with version 8.0.1 and higher configs. 8.0.1 is still "in testing", regular builders have not made that version available yet. You can revert back to 7.4.3 after you are done testing/updating devices if you prefer. You can flash it with https://github.com/Nerivec/ember-zli or https://github.com/NabuCasa/universal-silabs-flasher
@Nerivec Thanks, I flashed my coordinator. But I already managed to run all the updates thru earlier today. But now I got different firmware and let's see how it goes with it later.
Flashed my USB Skyconnect to 8.0.1 max, and disconnected the presence detector.
No change in behavior.
@Nerivec now I got new OTA updates and everything seems to work fine. So that firmware seems to be fine for me.
What happened?
My issue is similar to #22873.
I saw OTA updates for Hue bulbs and tried to update just one. Every time I try it, after 10–30 seconds, the adapter restarts.
The same issue occurs with 1.40 Z2M release and 1.40.0-dev commit: 6dade8b.
I'm using HA Yellow with
ncp-uart-hw-v7.4.3.0-yellow-115200.gbl
adapter firmware.Z2M configuration:
What did you expect to happen?
OTA update to work.
How to reproduce it (minimal and precise)
Try updating Hue bulb.
Zigbee2MQTT version
1.40
Adapter firmware version
7.4.3 [GA]
Adapter
ember
Setup
HA Yellow
Debug log
Full log starting from the moment of the OTA update: update.log.
The tail of the log: