Closed Fabbi closed 1 year ago
Checking. Give me a minute. But thanks for the submission!
@Fabbi I don't think this is a proper fix - the culprit, following your PR, has to be the guard statement in McuMgrBleTransportWriteState's received(), which is the one triggering the 'open'. So, this is the function not properly detecting that the chunked response has been received.
Furthermore, the McuMgrResponse.getExpectedLength(scheme: .ble, responseData: data) call is returning a bad result? Meaning we're expecting more or less bytes and that's why the validation state does not continue?
In my case the McuMgrResponse.getExpedtedLength
did work as intended but the following bytes were interpreted as another message because data.readMcuMgrHeaderSequenceNumber()
always returns a number as long as data.count >= 8
.
But those missing bytes came in raw with no real header!
received()
can't be the Problem as it get's called from McuMgrBleTransport+CBPerip[heralDelegate:peripheral
with the wrong sequenceNumber
I agree, that McuMgrResponse.getExpectedLength
seems to miss some implementation and thus can trigger bad behaviour there, but that's the case before and after the PR. This PR only fixes the sendTimeout when messages are split.
I was able to upgrade my firmware with this code active.
Let me think about it a bit better.
By the way - would you be able to provide me with a firmware file to flash so that I may test and see this happening myself?
@Fabbi do you see something like "OOD Packet: Received Seq No. 4 instead of expected Seq No. 2" in the logs when sendTimeout happens?
Let me think about it a bit better.
By the way - would you be able to provide me with a firmware file to flash so that I may test and see this happening myself?
I'm afraid not, as it's a proprietary bin of a client.
@Fabbi do you see something like "OOD Packet: Received Seq No. 4 instead of expected Seq No. 2" in the logs when sendTimeout happens?
No, the timeout doesn't happen on upload but on receiving data from the ImageManager.list
command.
That's the log I get on the current version. I stepped through the code and ended up in the methods I changed in this PR.
dfu -︎ Upgrade started with 1 images using 'Test And Confirm' mode
dfu -︎ Requesting device capabilities...
default -︎ Sending read command (Group: default, seq: 0, ID: McuMgrParameters): nil
transport -︎ Connecting...
transport -︎ Peripheral connected
transport -︎ Discovering services...
transport -︎ Services discovered: 8D53DC1D-1DB7-4CD3-868B-8A527460AA84
transport -︎ Discovering characteristics...
transport -︎ Characteristics discovered: DA2E7828-FBCE-4E01-AE9E-261174997C48
transport -︎ Enabling notifications...
transport -︎ Notifications enabled
transport -︎ Device ready
transport -︎ -> 0x0000000100000006a0
transport -︎ <- 0x0100000600000006bf62726308ff
default -︎ Response (Group: default, seq: 0, ID: 6): {"rc" : 8}
dfu -︎ Device capabilities not supported
dfu -︎ Sending Image List command...
---------------------------------------------------------------------
image -︎ Sending read command (Group: image, seq: 0, ID: State): nil
---------------------------------------------------------------------
transport -︎ -> 0x0000000100010000a0
transport -︎ Sending the request timed out.
image -︎ Request (Group: image, seq: 0) failed: Sending the request timed out.)
dfu -︎ Sending the request timed out.
the Sending read command (Group: image, seq: 0, ID: State): nil
(highlighted above) results in McuMgrBleTransportWriteState.state[0]
with totalChunkSize: 259
.
The first time McuMgrBleTransportWriteState.received(0, ...)
is invoked with 144 bytes
. Then another 15 bytes
come in which will be interpreted as new Message with sequenceNumber=116
, so McuMgrBleTransportWriteState.received(116, <15bytes>)
is invoked which throws everything away without error because self.state[116] == nil
.
@Fabbi thanks for the feedback.
I was able to reproduce the issue, I think. So I've been working on properly understanding the cause. I have an idea that might help, but I'm not sure.
In any case, while I'm working on it - you have your own fix that works for you, so at least your customer doesn't have the issue. Let me see if I can make good progress today.
Hi @Fabbi - can you check out the master branch and see if the issue keeps happening ?
Thanks.
This fixes an sendTimeout issue caused by split responses.
When the second part of a message was >= 8 bytes (so almost always I would assume) it would be handled as new message and potentially thrown away.
Should fix #71 and #52