Open simonratner opened 2 years ago
cc @t3zeng
Hey guys, you can reproduce this by running the bleprph app and then trying to perform an image read.
Repro steps:
1. Select the `Image` tab at the bottom, then go to the top right and press Advanced.
2. An `Images` section will pop up. Press read.
What I observe is that in /apache-mynewt-core/mgmt/smp/transport/ble/src/smp_ble.c, if you print out the contents of `om` in `smp_ble_out` that gets put onto the mqueue, it will not match the contents when you get the mbuf back in `smp_ble_event_data_in`, even though the address of the mbuf is the same.
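For anyone repeating the comparison above, hashing the payload is easier to compare in a log than a full hex dump. The following is a minimal sketch, not code from the repository: `struct buf_seg` and `chain_fingerprint` are hypothetical stand-ins, and on target you would walk the real `struct os_mbuf` chain instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one link of an mbuf chain. The real
 * struct os_mbuf has more fields; this only models data, length and next. */
struct buf_seg {
    const uint8_t *data;    /* payload bytes held by this segment */
    uint16_t len;           /* number of valid bytes in data */
    struct buf_seg *next;   /* next segment in the chain, or NULL */
};

/* 32-bit FNV-1a hash over every payload byte in the chain, so the result
 * depends only on the bytes and not on how they are split into segments. */
uint32_t
chain_fingerprint(const struct buf_seg *seg)
{
    uint32_t hash = 2166136261u;    /* FNV-1a offset basis */

    for (; seg != NULL; seg = seg->next) {
        for (uint16_t i = 0; i < seg->len; i++) {
            hash ^= seg->data[i];
            hash *= 16777619u;      /* FNV-1a prime */
        }
    }
    return hash;
}
```

Logging this value together with the buffer address once where the buffer is queued and once where it is dequeued would show the same address with a different fingerprint if the contents changed in between.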
I set a watchpoint on the address of the corrupted mbuf and was able to get the following backtrace:
#0 smp_tx_rsp (ns=<optimized out>, rsp=0x100025a4 <os_msys_1_data+112>, arg=0x10002f80 <g_smp_ble_transport>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:226
#1 0x0001c418 in smp_process_request_packet (streamer=streamer@entry=0x10002f80 <g_smp_ble_transport>, req=0x10004408 <pool_acl_buf>)
at repos/apache-mynewt-mcumgr/smp/src/smp.c:365
#2 0x00016e50 in smp_process_packet (st=0x10002f80 <g_smp_ble_transport>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:265
#3 0x00016e78 in smp_event_data_in (ev=<optimized out>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:293
#4 0x00015526 in os_eventq_run (evq=<optimized out>) at repos/apache-mynewt-core/kernel/os/src/os_eventq.c:196
#5 0x00014aea in main () at apps/bleprph/src/main.c:356
What seems to happen is that in this line of code the mbuf returned has the same address as the one used to store the notify data, so the mbuf gets corrupted. This behavior does not seem to be present on the nrf52840dk pca10056.
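To illustrate how two users can end up with the same buffer address, here is a toy sketch of a fixed-block pool. It is not `os_mempool` or msys, every name in it is hypothetical, and the premature free is only an assumed cause for the sake of the example. The thread has not established why the same address is handed out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE  16
#define BLOCK_COUNT 4

/* Hypothetical fixed-block pool with a LIFO free list. */
struct toy_pool {
    uint8_t storage[BLOCK_COUNT][BLOCK_SIZE];
    void *free_list[BLOCK_COUNT];
    int free_top;               /* number of entries on free_list */
};

void
toy_pool_init(struct toy_pool *p)
{
    p->free_top = 0;
    for (int i = BLOCK_COUNT - 1; i >= 0; i--) {
        p->free_list[p->free_top++] = p->storage[i];
    }
}

void *
toy_pool_get(struct toy_pool *p)
{
    return p->free_top > 0 ? p->free_list[--p->free_top] : NULL;
}

void
toy_pool_put(struct toy_pool *p, void *blk)
{
    p->free_list[p->free_top++] = blk;
}

/* Returns 1 if the still-referenced block was overwritten through an alias. */
int
demo_alias_corruption(void)
{
    struct toy_pool pool;
    toy_pool_init(&pool);

    uint8_t *queued = toy_pool_get(&pool);
    memset(queued, 0xAA, BLOCK_SIZE);   /* pretend: response waiting to be sent */

    /* Assumed bug for illustration: block is freed while still referenced. */
    toy_pool_put(&pool, queued);

    uint8_t *next = toy_pool_get(&pool); /* LIFO pool hands back the same block */
    memset(next, 0x55, BLOCK_SIZE);      /* new user overwrites the old payload */

    return next == queued && queued[0] == 0x55;
}
```

In this toy model the address of the queued block never changes, but its contents do, which matches the symptom of the same address holding different contents.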
Adding to the above, I set up some checks and increased the size of the task stack to 8192 words:
OS_CTX_SW_STACK_CHECK: 1
OS_MEMPOOL_CHECK: 1
OS_MEMPOOL_GUARD: 1
OS_MEMPOOL_POISON: 1
The issue persists even though the stack usage is very far from maxing out.
SMP notifications used to deliver SMP responses over BLE transport are corrupted on the apollo3_evb.
Below is a sample log from the nRF Connect app attempting an `image list` command (as part of the DFU process). You can see the response being fragmented over two notifications, with the second notification corrupted. As a result, the concatenated CBOR is not decodable. The response does not need to be fragmented for this to happen; the issue exists even for single-packet responses.
This is reproducible with stock bleprph on apollo3_evb. This is NOT reproducible with the same app on nordic_pca10056.
Error can be observed with the nRF Connect app or the nRF Device Manager app for Android/iOS (CBOR Error).
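For checking a fragmented response by hand, the sketch below shows how a client might concatenate notification payloads before decoding the CBOR. It assumes the usual SMP header layout of 8 bytes with a big-endian payload length in bytes 2 and 3, which should be verified against the mcumgr sources. `struct smp_reasm`, `smp_reasm_feed` and the 512-byte cap are hypothetical names and values, not part of any library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SMP_HDR_LEN 8       /* assumed SMP header size */
#define REASM_MAX   512     /* hypothetical cap on a reassembled packet */

struct smp_reasm {
    uint8_t buf[REASM_MAX]; /* header followed by CBOR payload */
    size_t have;            /* bytes collected so far; start at 0 */
};

/* Feed one notification payload.
 * Returns 1 when a complete packet is in buf, 0 if more fragments are
 * needed, and -1 if the data would not fit. */
int
smp_reasm_feed(struct smp_reasm *r, const uint8_t *frag, size_t len)
{
    if (len > REASM_MAX - r->have) {
        return -1;
    }
    memcpy(r->buf + r->have, frag, len);
    r->have += len;

    if (r->have < SMP_HDR_LEN) {
        return 0;   /* header itself is still incomplete */
    }

    /* Assumed layout: bytes 2..3 hold the payload length, big-endian. */
    size_t total = SMP_HDR_LEN + (((size_t)r->buf[2] << 8) | r->buf[3]);
    if (total > REASM_MAX) {
        return -1;
    }
    return r->have >= total ? 1 : 0;
}
```

Once the function returns 1, the CBOR payload starts at `buf + SMP_HDR_LEN`. If a later fragment is corrupted, the length still matches but the concatenated bytes no longer decode as valid CBOR.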
The expected second notification (at t=17:21:10.226) should be: