apache / mynewt-mcumgr

Apache mynewt
https://mynewt.apache.org/
98 stars 76 forks source link

SMP notifications corrupted on apollo3 #161

Open simonratner opened 2 years ago

simonratner commented 2 years ago

SMP notifications used to deliver SMP responses over BLE transport are corrupted on the apollo3_evb.

Below is a sample log from nRF Connect app attempting an image list command (as part of the DFU process). You can see the response being fragmented over two notifications, with the second notification corrupted. As a result, the concatenated CBOR is not decodable. The response does not need to be corrupted - the issue exists even for single-packet responses.

This is reproducible with stock bleprph on apollo3_evb. This is NOT reproducible with the same app on nordic_pca10056.

Error can be observed with the nRF Connect app or the nRF Device Manager app for Android/iOS (CBOR Error).

V   17:20:50.573    Connecting to 66:77:88:23:BB:EF...
D   17:20:50.573    gatt = device.connectGatt(autoConnect = false, TRANSPORT_LE, preferred PHY = LE 1M)
D   17:20:56.145    [Callback] Connection state changed with status: 133 and new state: DISCONNECTED (0)
E   17:20:56.150    Error 133 (0x85): GATT ERROR
I   17:20:56.150    Disconnected
D   17:20:58.707    gatt.close()
D   17:20:58.709    wait(200)
V   17:20:58.910    Connecting to 66:77:88:23:BB:EF...
D   17:20:58.910    gatt = device.connectGatt(autoConnect = false, TRANSPORT_LE, preferred PHY = LE 1M)
D   17:21:02.715    [Callback] Connection state changed with status: 0 and new state: CONNECTED (2)
I   17:21:02.715    Connected to 66:77:88:23:BB:EF
V   17:21:02.715    Requesting new MTU...
D   17:21:02.715    gatt.requestMtu(517)
I   17:21:03.159    Connection parameters updated (interval: 7.5ms, latency: 0, timeout: 5000ms)
I   17:21:03.391    MTU changed to: 256
V   17:21:03.398    Discovering services...
D   17:21:03.398    gatt.discoverServices()
D   17:21:03.404    [Callback] Services discovered with status: 0
I   17:21:03.404    Services discovered
V   17:21:03.406    Generic Access (0x1800)
- Device Name [R] (0x2A00)
- Appearance [R] (0x2A01)
Device Information (0x180A)
- Model Number String [R] (0x2A24)
- Serial Number String [R] (0x2A25)
- Firmware Revision String [R] (0x2A26)
- Manufacturer Name String [R] (0x2A29)
SMP Service (8d53dc1d-1db7-4cd3-868b-8a527460aa84)
- SMP Characteristic [N WNR] (da2e7828-fbce-4e01-ae9e-261174997c48)
   Client Characteristic Configuration (0x2902)
I   17:21:03.474    Connection parameters updated (interval: 45.0ms, latency: 0, timeout: 5000ms)
V   17:21:08.902    [McuMgr] Connecting...
D   17:21:08.905    [McuMgr] gatt = device.connectGatt(autoConnect = false, TRANSPORT_LE, LE 1M)
D   17:21:08.953    [McuMgr] [Callback] Connection state changed with status: 0 and new state: 2 (CONNECTED)
I   17:21:08.958    [McuMgr] Connected to 66:77:88:23:BB:EF
D   17:21:08.962    [McuMgr] wait(300)
I   17:21:08.966    [McuMgr] MTU changed to: 256
V   17:21:09.266    [McuMgr] Discovering services...
D   17:21:09.271    [McuMgr] gatt.discoverServices()
I   17:21:09.276    [McuMgr] Services discovered
V   17:21:09.286    [McuMgr] Primary service found
V   17:21:09.290    [McuMgr] Requesting new MTU...
D   17:21:09.292    [McuMgr] gatt.requestMtu(498)
I   17:21:09.370    [McuMgr] MTU changed to: 256
D   17:21:09.374    [McuMgr] gatt.setCharacteristicNotification(da2e7828-fbce-4e01-ae9e-261174997c48, true)
V   17:21:09.378    [McuMgr] Enabling notifications for da2e7828-fbce-4e01-ae9e-261174997c48
D   17:21:09.381    [McuMgr] gatt.writeDescriptor(00002902-0000-1000-8000-00805f9b34fb, value=0x01-00)
I   17:21:09.459    [McuMgr] Data written to descr. 00002902-0000-1000-8000-00805f9b34fb, value: (0x) 01-00
I   17:21:09.464    [McuMgr] Notifications enabled
A   17:21:09.470    [McuMgr] Sending (10 bytes) Header (Op: 0, Flags: 0, Len: 2, Group: 1, Seq: 0, Command: 0) CBOR {}
V   17:21:09.480    [McuMgr] Writing characteristic da2e7828-fbce-4e01-ae9e-261174997c48 (WRITE COMMAND)
D   17:21:09.484    [McuMgr] gatt.writeCharacteristic(da2e7828-fbce-4e01-ae9e-261174997c48)
I   17:21:09.489    [McuMgr] Data written to da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 00-00-00-02-00-01-00-00-BF-FF
I   17:21:09.648    [McuMgr] Notification received from da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 01-00-00-FC-00-01-00-00-BF-66-69-6D-61-67-65-73-9F-BF-64-73-6C-6F-74-00-67-76-65-72-73-69-6F-6E-69-31-2E-30-2E-30-2E-34-37-39-64-68-61-73-68-58-20-B2-27-75-77-F3-C1-C3-BF-E5-AA-26-64-7C-DF-77-AD-26-27-C8-EC-58-E3-C5-DC-8D-3B-97-B6-51-9E-73-BC-68-62-6F-6F-74-61-62-6C-65-F5-67-70-65-6E-64-69-6E-67-F4-69-63-6F-6E-66-69-72-6D-65-64-F5-66-61-63-74-69-76-65-F5-69-70-65-72-6D-61-6E-65-6E-74-F4-FF-BF-64-73-6C-6F-74-01-67-76-65-72-73-69-6F-6E-69-31-2E-30-2E-30-2E-34-37-39-64-68-61-73-68-58-20-9F-46-47-02-D7-E9-BA-FD-57-69-BF-73-42-88-86-85-A0-1B-66-C1-B7-84-48-0A-E3-E8-F8-B5-12-95-80-F7-68-62-6F-6F-74-61-62-6C-65-F5-67-70-65-6E-64-69-6E-67-F4-69-63-6F-6E-66-69-72-6D-65-64-F4-66-61-63-74-69-76-65-F4-69-70-65-72-6D-61-6E-65-6E-74-F4-FF-FF-6B-73-70-6C-69-74-53
I   17:21:10.226    [McuMgr] Notification received from da2e7828-fbce-4e01-ae9e-261174997c48, value: (0x) 74-53-00-FC-00-01-00
V   17:21:10.246    [McuMgr] Disconnecting...
D   17:21:10.253    [McuMgr] gatt.disconnect()
D   17:21:10.289    [McuMgr] [Callback] Connection state changed with status: 0 and new state: 0 (DISCONNECTED)
I   17:21:10.294    [McuMgr] Disconnected
D   17:21:10.299    [McuMgr] gatt.close()

The expected second notification (at t=17:21:10.226) should be:

74-61-74-75-73-00-FF
simonratner commented 2 years ago

cc @t3zeng

t3zeng commented 2 years ago

Hey guys, you can reproduce this by running the bleprph app and then try to perform an image read.

Repro steps:

  1. Load bleprph app onto apollo3 evb and boot up
  2. On your phone, go to nRF Connect Device Manager and look for the evb
  3. Go to the Image tab at the bottom and then go to the top right and press Advanced
  4. A new Images section will pop up. Press read
  5. At this point there will be BLE comms with the apollo3 evb but the image data will fail to be retrieved

What I observe is that in /apache-mynewt-core/mgmt/smp/transport/ble/src/smp_ble.c, if you print out the contents of om in smp_ble_out that gets put onto the mqueue, it will not match the contents when you get the mbuff back in smp_ble_event_data_in even though the address of the mbuf is the same.

I set a watchpoint on the address of the corrupted mbuf and was able to get the following backtrace:

#0  smp_tx_rsp (ns=<optimized out>, rsp=0x100025a4 <os_msys_1_data+112>, arg=0x10002f80 <g_smp_ble_transport>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:226
#1  0x0001c418 in smp_process_request_packet (streamer=streamer@entry=0x10002f80 <g_smp_ble_transport>, req=0x10004408 <pool_acl_buf>)
    at repos/apache-mynewt-mcumgr/smp/src/smp.c:365
#2  0x00016e50 in smp_process_packet (st=0x10002f80 <g_smp_ble_transport>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:265
#3  0x00016e78 in smp_event_data_in (ev=<optimized out>) at repos/apache-mynewt-core/mgmt/smp/src/smp.c:293
#4  0x00015526 in os_eventq_run (evq=<optimized out>) at repos/apache-mynewt-core/kernel/os/src/os_eventq.c:196
#5  0x00014aea in main () at apps/bleprph/src/main.c:356

What seems to happen is that in this line of code the mbuf returned is the same address as the one used to store the notify data so the mbuf gets corrupted. This behavior does not seem to be present on the nrf52840dk pca10056

t3zeng commented 2 years ago

Adding to the above, I set up some checks and increased the size of the task stack to 8192 words OS_CTX_SW_STACK_CHECK: 1 OS_MEMPOOL_CHECK: 1 OS_MEMPOOL_GUARD: 1 OS_MEMPOOL_POISON: 1 The issue persists despite the stack usage is very far off from maxing out.