NordicPlayground / nRF51-ble-bcast-mesh

Other
323 stars 121 forks source link

RX packets with invalid length #152

Open twvd opened 7 years ago

twvd commented 7 years ago

I've been receiving packets from rbc_mesh_event_get (RBC_MESH_EVENT_TYPE_CONFLICTING_VAL) with a valid register number, but an invalid data length (249) and an unknown BLE source address. This usually occurs with a high volume of traffic on the mesh.

To illustrate this, see the following packet (captured in vh_rx):

p_packet    0x20002A50 (g_packet_pool + 80) R6  mesh_packet_t * 
    header  <struct>    0x20002A50  ble_packet_header_t 
        type    '\0' (0x00) 0x20002A50  uint8_t 
        _rfu1   '\0' (0x00) 0x20002A50  uint8_t 
        addr_type   '.' (0x01)  0x20002A50  uint8_t 
        _rfu2   '\0' (0x00) 0x20002A50  uint8_t 
        length  '\a' (0x07) 0x20002A51  uint8_t 
        _rfu3   '\0' (0x00) 0x20002A52  uint8_t 
    addr    <array>"ÇÊEe*t" 0x20002A53  uint8_t[6]  
        [0] 'Ç' (0xC7)  0x20002A53  uint8_t 
        [1] 'Ê' (0xCA)  0x20002A54  uint8_t 
        [2] 'E' (0x45)  0x20002A55  uint8_t 
        [3] 'e' (0x65)  0x20002A56  uint8_t 
        [4] '*' (0x2A)  0x20002A57  uint8_t 
        [5] 't' (0x74)  0x20002A58  uint8_t 
    payload <array>""   0x20002A59  uint8_t[31] 
        [0] '\0' (0x00) 0x20002A59  uint8_t 
        [1] '.' (0x16)  0x20002A5A  uint8_t 
        [2] 'ä' (0xE4)  0x20002A5B  uint8_t 
        [3] 'þ' (0xFE)  0x20002A5C  uint8_t 
        [4] '©' (0xA9)  0x20002A5D  uint8_t 
        [5] '*' (0x2A)  0x20002A5E  uint8_t 
        [6] '.' (0x03)  0x20002A5F  uint8_t 
        [7] '\0' (0x00) 0x20002A60  uint8_t 
        [8] 'u' (0x75)  0x20002A61  uint8_t 
        [9] 'a' (0x61)  0x20002A62  uint8_t 
        [10]    'Ð' (0xD0)  0x20002A63  uint8_t 
        [11]    '²' (0xB2)  0x20002A64  uint8_t 
        [12]    'â' (0xE2)  0x20002A65  uint8_t 
        [13]    '\f' (0x0C) 0x20002A66  uint8_t 
        [14]    'l' (0x6C)  0x20002A67  uint8_t 
        [15]    '.' (0x02)  0x20002A68  uint8_t 
        [16]    '.' (0x01)  0x20002A69  uint8_t 
        [17]    '.' (0x06)  0x20002A6A  uint8_t 
        [18]    '.' (0x02)  0x20002A6B  uint8_t 
        [19]    'Ø' (0xD8)  0x20002A6C  uint8_t 
        [20]    '.' (0x05)  0x20002A6D  uint8_t 
        [21]    '.' (0x01)  0x20002A6E  uint8_t 
        [22]    'Ù' (0xD9)  0x20002A6F  uint8_t 
        [23]    '>' (0x3E)  0x20002A70  uint8_t 
        [24]    '.' (0x03)  0x20002A71  uint8_t 
        [25]    'Ù' (0xD9)  0x20002A72  uint8_t 
        [26]    '>' (0x3E)  0x20002A73  uint8_t 
        [27]    '.' (0x03)  0x20002A74  uint8_t 
        [28]    '+' (0x2B)  0x20002A75  uint8_t 
        [29]    '*' (0x2A)  0x20002A76  uint8_t 
        [30]    'Õ' (0xD5)  0x20002A77  uint8_t 

p_adv_data  0x20002A59 (g_packet_pool + 89) R4  mesh_adv_data_t *   
    adv_data_length '\0' (0x00) 0x20002A59  uint8_t 
    adv_data_type   '.' (0x16)  0x20002A5A  uint8_t 
    mesh_uuid   65252   0x20002A5B  uint16_t    
    handle  10921   0x20002A5D  uint16_t    
    version 3   0x20002A5F  uint16_t    
    data    <array>"<snip>" 0x20002A61  uint8_t[23] 
        ...snip, see above...

Call stack: image

The BLE-address (c7 ca 45 65 2a 74) is unknown in my test setup. It is always the same, even though my test setup has about 12 nodes. Apparantly, this packet gets through all the error checks in the mesh code and ends up in the event queue with a data length of 249 because adv_data_length here is 0 and later the length is calculated like this in vh_rx:

evt.params.rx.data_len = p_adv_data->adv_data_length - MESH_PACKET_ADV_OVERHEAD;

Unfortunately, I haven't been able to trace this condition all the way back to radio_control (yet) to see if this comes out of the radio like this or it is memory corruption somewhere, because I'm not sure how to break the code there on this condition. However, because I can reproduce this easely and the packet index differs every time, I suspect it comes out of the radio like this.

So, my questions are; What could this packet be? What's up with the mysterious BLE address? How come the mesh puts this, seemingly invalid due to miscalculated length, packet into the event queue to be processed like nothing is wrong? How is the error checking done?

trond-snekvik commented 7 years ago

Looks like you may have some strange packets coming in.. Since your length field only includes the 0 at the beginning of the payload, I'm guessing the rest of the packet is uninitialized memory from previous uses of the same packet memory. Does the first bytes of payload ([0x75, 0x61, 0xd0...]) look familiar?

Having a single 0 byte AD data structure is technically legal according to the BLE spec, but shouldn't be produced by any mesh device. Could this be some other BLE device in your vicinity?

I'll mark this issue as a bug, as those packets shouldn't make it all the way to your application anyway. The correct fix is changing transport_control.c line 372 to if (p_mesh_adv_data != NULL && p_mesh_adv_data->adv_data_length >= MESH_PACKET_ADV_OVERHEAD), alternatively a similar fix here

twvd commented 7 years ago

Since we produce quite a few BLE devices nearby in the building it could very well be a non-mesh device producing this. I'll apply the fix you described.

I'm curious though - what happens to packets where the first byte does exceed MESH_PACKET_ADV_OVERHEAD? Is there any secondary error checking to confirm the packet is actually a valid mesh packet to prevent the bogus data propagating over the mesh?

trond-snekvik commented 7 years ago

The radio control module uses the hardware CRC check as per the BLE specification, to remove any on-air bitflip errors, which are quite common. We also search for the correct AD type, with a Nordic-owned 16 bit service UUID, and ensure that we don't go beyond the max length (causing overflow) over in mesh_packet.c.