NordicSemiconductor / IOS-nRF-Connect-Device-Manager

A mobile management library for devices supporting nRF Connect Device Manager.
https://www.nordicsemi.com/Software-and-tools/Software/nRF-Connect-SDK
Apache License 2.0
90 stars 41 forks source link

fix read timeout for split message responses #75

Closed Fabbi closed 1 year ago

Fabbi commented 1 year ago

This fixes an sendTimeout issue caused by split responses.

When the second part of a message was >= 8 bytes (so almost always I would assume) it would be handled as new message and potentially thrown away.

Should fix #71 and #52

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.

dinesharjani commented 1 year ago

Checking. Give me a minute. But thanks for the submission!

dinesharjani commented 1 year ago

@Fabbi I don't think this is a proper fix - the culprit, following your PR, has to be the guard statement in McuMgrBleTransportWriteState's received(), which is the one triggering the 'open'. So, this is the function not properly detecting that the chunked response has been received.

Furthermore, the McuMgrResponse.getExpectedLength(scheme: .ble, responseData: data) call is returning a bad result? Meaning we're expecting more or less bytes and that's why the validation state does not continue?

Fabbi commented 1 year ago

In my case the McuMgrResponse.getExpedtedLength did work as intended but the following bytes were interpreted as another message because data.readMcuMgrHeaderSequenceNumber() always returns a number as long as data.count >= 8. But those missing bytes came in raw with no real header!

received() can't be the Problem as it get's called from McuMgrBleTransport+CBPerip[heralDelegate:peripheral with the wrong sequenceNumber

I agree, that McuMgrResponse.getExpectedLength seems to miss some implementation and thus can trigger bad behaviour there, but that's the case before and after the PR. This PR only fixes the sendTimeout when messages are split.

I was able to upgrade my firmware with this code active.

dinesharjani commented 1 year ago

Let me think about it a bit better.

By the way - would you be able to provide me with a firmware file to flash so that I may test and see this happening myself?

dinesharjani commented 1 year ago

@Fabbi do you see something like "OOD Packet: Received Seq No. 4 instead of expected Seq No. 2" in the logs when sendTimeout happens?

Fabbi commented 1 year ago

Let me think about it a bit better.

By the way - would you be able to provide me with a firmware file to flash so that I may test and see this happening myself?

I'm afraid not, as it's a proprietary bin of a client.

@Fabbi do you see something like "OOD Packet: Received Seq No. 4 instead of expected Seq No. 2" in the logs when sendTimeout happens?

No, the timeout doesn't happen on upload but on receiving data from the ImageManager.list command.

That's the log I get on the current version. I stepped through the code and ended up in the methods I changed in this PR.

dfu  -︎  Upgrade started with 1 images using 'Test And Confirm' mode
dfu  -︎  Requesting device capabilities...
default  -︎  Sending read command (Group: default, seq: 0, ID: McuMgrParameters): nil
transport  -︎  Connecting...
transport  -︎  Peripheral connected
transport  -︎  Discovering services...
transport  -︎  Services discovered: 8D53DC1D-1DB7-4CD3-868B-8A527460AA84
transport  -︎  Discovering characteristics...
transport  -︎  Characteristics discovered: DA2E7828-FBCE-4E01-AE9E-261174997C48
transport  -︎  Enabling notifications...
transport  -︎  Notifications enabled
transport  -︎  Device ready
transport  -︎  -> 0x0000000100000006a0
transport  -︎  <- 0x0100000600000006bf62726308ff
default  -︎  Response (Group: default, seq: 0, ID: 6): {"rc" : 8}
dfu  -︎  Device capabilities not supported
dfu  -︎  Sending Image List command...
---------------------------------------------------------------------
image  -︎  Sending read command (Group: image, seq: 0, ID: State): nil
---------------------------------------------------------------------
transport  -︎  -> 0x0000000100010000a0
transport  -︎  Sending the request timed out.
image  -︎  Request (Group: image, seq: 0) failed: Sending the request timed out.)
dfu  -︎  Sending the request timed out.

the Sending read command (Group: image, seq: 0, ID: State): nil (highlighted above) results in McuMgrBleTransportWriteState.state[0] with totalChunkSize: 259. The first time McuMgrBleTransportWriteState.received(0, ...) is invoked with 144 bytes. Then another 15 bytes come in which will be interpreted as new Message with sequenceNumber=116, so McuMgrBleTransportWriteState.received(116, <15bytes>) is invoked which throws everything away without error because self.state[116] == nil.

dinesharjani commented 1 year ago

@Fabbi thanks for the feedback.

I was able to reproduce the issue, I think. So I've been working on properly understanding the cause. I have an idea that might help, but I'm not sure.

In any case, while I'm working on it - you have your own fix that works for you, so at least your customer doesn't have the issue. Let me see if I can make good progress today.

dinesharjani commented 1 year ago

Hi @Fabbi - can you check out the master branch and see if the issue keeps happening ?

Thanks.

dinesharjani commented 1 year ago

Implemented here https://github.com/NordicSemiconductor/IOS-nRF-Connect-Device-Manager/pull/84