NordicPlayground / nRF51-ble-bcast-mesh

Why do the advertising address and GAP have anything to do with mesh packets? #147

Open bayou9 opened 7 years ago

bayou9 commented 7 years ago

In function: mesh_packet_set_local_addr

uint32_t mesh_packet_set_local_addr(mesh_packet_t* p_packet)
{
#ifdef SOFTDEVICE_PRESENT
    ble_gap_addr_t my_addr;
    uint32_t error_code = sd_ble_gap_address_get(&my_addr);
    if (error_code != NRF_SUCCESS)
    {
        return error_code;
    }
    p_packet->header.addr_type = my_addr.addr_type;
    memcpy(p_packet->addr, my_addr.addr, BLE_GAP_ADDR_LEN);
#else
    memcpy(p_packet->addr, (uint32_t*) &NRF_FICR->DEVICEADDR[0], BLE_GAP_ADDR_LEN);
    p_packet->header.addr_type = NRF_FICR->DEVICEADDRTYPE;
#endif

    return NRF_SUCCESS;
}

We can clearly see that the function fetches either the 6-byte MAC address from the NRF_FICR->DEVICEADDR[n] registers or the local Bluetooth address. I find this really weird: isn't the mesh built almost entirely on the radio peripheral? What is the advertising address doing in all of this? And I'm sure this is not the only place where advertising gets involved. I know that you need to advertise to get discovered (by external devices like an iPhone), but I did not think two nodes communicate via advertising. What's going on?

Also, can the MAC address be used to uniquely identify a node in the mesh network?

Please help, thank you in advance!

trond-snekvik commented 7 years ago

Hi there,

The mesh messages are sent as standard BLE advertisement packets, in order to be spec-compliant. You can read more about the packet format here.

The MAC address can be used as a unique ID for the node, this is correct :)
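
As a rough illustration of using the address as a node ID, the sketch below packs a 6-byte address into a single integer that is easy to compare or use as a lookup key. The helper name `node_id_from_addr` and the `ADDR_LEN` macro are hypothetical and not part of the mesh code; the sketch assumes the bytes are stored least significant byte first, as in the `addr` field of `ble_gap_addr_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADDR_LEN 6 /* assumed equal to BLE_GAP_ADDR_LEN */

/* Hypothetical helper: pack a 6-byte address (least significant byte first)
 * into the low 48 bits of a uint64_t so it can serve as a node ID. */
static uint64_t node_id_from_addr(const uint8_t addr[ADDR_LEN])
{
    assert(addr != NULL);

    uint64_t id = 0;
    for (int i = ADDR_LEN - 1; i >= 0; --i)
    {
        /* Shift in bytes from most significant to least significant. */
        id = (id << 8) | addr[i];
    }
    return id;
}
```

Two addresses of different types (public versus random) could in principle share the same 48 bits, so a stricter ID might also include the address type.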

bayou9 commented 7 years ago

The mesh messages are sent as standard BLE advertisement packets

In other words, there is a possibility of air traffic collisions, but we don't really have to worry too much about it because GAP takes care of a great deal of the problem (e.g. frequency hopping)?

Also, how wrong would I be, if I were to assume that I can extract the MAC address out of every packet received?

Last but not least, I'm completely baffled by reference count increase and decrease, for example, these functions: mesh_packet_ref_count_dec and mesh_packet_ref_count_inc, and also the g_packet_refs[RBC_MESH_PACKET_POOL_SIZE] array. Why do we need a counter for how many times a certain packet is referenced? I fail to see the point.

thedjnK commented 7 years ago

Last but not least, I'm completely baffled by reference count increase and decrease, for example, these functions:

I'm pretty sure that's how the mesh knows which data is the most recent, and it is documented at https://github.com/NordicSemiconductor/nRF51-ble-bcast-mesh/blob/master/docs/how_it_works.adoc

bayou9 commented 7 years ago

Hello, are you saying that the "uint8_t c;" member of

typedef __packed_armcc struct
{
    uint32_t        t;              /* Absolute value of t. Equals g_trickle_time (at set time) + t_relative */
    uint32_t        i;              /* Absolute value of i. Equals g_trickle_time (at set time) + i_relative */
    uint32_t        i_relative;     /* Relative value of i. Represents the actual i value in IETF RFC6206 */
    uint8_t         c;              /* Consistent messages counter */
} __packed_gcc trickle_t;

is being operated on by mesh_packet_ref_count_dec and mesh_packet_ref_count_inc? Interesting. I failed to find how they are connected; I suppose I have to look harder?

trond-snekvik commented 7 years ago

In other words, there is a possibility of air traffic collisions, but we don't really have to worry too much about it because GAP takes care of a great deal of the problem (e.g. frequency hopping)?

It's not so much GAP, as it is the Trickle algorithm.
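
For readers unfamiliar with Trickle, here is a minimal sketch of the per-interval rules from IETF RFC 6206: a node counts consistent messages it overhears, only transmits at its chosen point in the interval if it has heard fewer than k of them, and doubles its interval when nothing changes. This suppression is what keeps redundant traffic down. The type and function names are hypothetical, time is kept relative to the interval start (unlike the absolute timestamps in the repository's trickle_t), the maximum interval is given as an absolute length rather than a number of doublings, and the caller supplies the random number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified Trickle state (not the repository's trickle_t). */
typedef struct
{
    uint32_t i_min; /* smallest interval length */
    uint32_t i_max; /* largest interval length */
    uint8_t  k;     /* redundancy constant */
    uint32_t i;     /* current interval length I */
    uint32_t t;     /* transmission point, relative to interval start */
    uint8_t  c;     /* consistent messages heard in this interval */
} trickle_sketch_t;

/* Start a new interval: reset c and pick t in [I/2, I). */
static void trickle_interval_begin(trickle_sketch_t *tr, uint32_t rnd)
{
    assert(tr != NULL && tr->i > 0);
    uint32_t half = tr->i / 2;
    tr->c = 0;
    tr->t = half + (rnd % (tr->i - half));
}

static void trickle_init(trickle_sketch_t *tr, uint32_t i_min, uint32_t i_max,
                         uint8_t k, uint32_t rnd)
{
    assert(tr != NULL && i_min > 0 && i_min <= i_max);
    tr->i_min = i_min;
    tr->i_max = i_max;
    tr->k = k;
    tr->i = i_min;
    trickle_interval_begin(tr, rnd);
}

/* A neighbour sent the same data we already have. */
static void trickle_rx_consistent(trickle_sketch_t *tr)
{
    if (tr->c < UINT8_MAX)
    {
        tr->c++;
    }
}

/* A neighbour sent something that disagrees with our state: shrink to Imin. */
static void trickle_rx_inconsistent(trickle_sketch_t *tr, uint32_t rnd)
{
    if (tr->i > tr->i_min)
    {
        tr->i = tr->i_min;
        trickle_interval_begin(tr, rnd);
    }
}

/* Called when time t is reached: transmit only if fewer than k were heard. */
static bool trickle_should_tx(const trickle_sketch_t *tr)
{
    return tr->c < tr->k;
}

/* Called when the interval expires: double I, capped at Imax. */
static void trickle_interval_end(trickle_sketch_t *tr, uint32_t rnd)
{
    tr->i = (tr->i > tr->i_max / 2) ? tr->i_max : tr->i * 2;
    trickle_interval_begin(tr, rnd);
}
```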

You can definitely extract the address out of every mesh packet by looking at the ble_adv_addr field in the rx event. HOWEVER: this address always corresponds to the device that relayed the message to us, not the originator of the message. Each device only ever sends packets with its own advertisement address, regardless of where it got the mesh message from.
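
To make the point about the sender address concrete, here is a hedged sketch of where that address sits in a legacy BLE advertising PDU as sent over the air: a 2-byte header (with the TxAdd bit marking a random address) followed by the 6-byte advertiser address. The type `adv_sender_t` and function `adv_pdu_get_sender` are hypothetical; in an application you would normally read the ble_adv_addr field of the rx event instead, and the in-RAM packet layout used by the radio driver may differ from this over-the-air layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ADV_HEADER_LEN 2 /* PDU type/flags byte + length byte */
#define ADV_ADDR_LEN   6 /* advertiser address (AdvA) */
#define ADV_TXADD_MASK 0x40 /* bit 6 of the first header byte */

/* Hypothetical result type for the sketch. */
typedef struct
{
    uint8_t addr[ADV_ADDR_LEN]; /* address of the device that sent the PDU */
    bool    is_random;          /* true if TxAdd is set (random address) */
} adv_sender_t;

/* Extract the advertiser address from a raw legacy advertising PDU.
 * Returns false if the buffer is too short to contain header + AdvA. */
static bool adv_pdu_get_sender(const uint8_t *pdu, size_t len, adv_sender_t *out)
{
    assert(pdu != NULL && out != NULL);

    if (len < ADV_HEADER_LEN + ADV_ADDR_LEN)
    {
        return false;
    }
    out->is_random = (pdu[0] & ADV_TXADD_MASK) != 0;
    memcpy(out->addr, &pdu[ADV_HEADER_LEN], ADV_ADDR_LEN);
    return true;
}
```

Whatever this returns is the address of the last relayer, so it cannot be used on its own to identify who originally published a value.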

You're not the first to be thrown off by the reference counting. It solves a problem that I wish we didn't have: the incoming packet payloads in the mesh are never copied anywhere. They all stay in the exact memory location where the radio DMA wrote them upon reception. This means that the packets referenced in the handle storage module (the database of current handles) are the same memory locations that the radio is writing to or reading from.

From time to time, the Trickle algorithm tied to each mesh handle times out, and the packet is queued up for radio transmission. This action is asynchronous, as the radio operates in the highest IRQ level, as enforced by the Softdevice Timeslot API. When this timeout comes, the version handler module fetches pointers to all packets with expired timeouts from the handle storage and enqueues the relevant packets into the radio queue. The radio then (asynchronously) picks them up and transmits them as soon as possible.

While a packet is queued for transmission, the version handler may process packets that the radio received and queued up earlier. This processing can lead to some of the entries in the handle storage being replaced (it's a least-recently-used cache, as described here), deallocating these packets. When this happens, we don't know whether the radio is done transmitting the packet, so we can't free it altogether. Instead, we free the reference to the packet that the handle storage owns. There are similar problems related to the application event queue, which may also hold references to packets.

The idea is the same as for the C++11 shared_ptr<> mechanism, but it's all based on manual labor in the various parts of the stack. It's a nasty piece of work to manage, but it solves the asynchronous multi-owner problem quite efficiently.

In hindsight, the penalty for copying payloads is lower than the effort required to manage this system (especially with these small payloads), and if we were to do a complete overhaul, I think I would vote to scrap this system.
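
The multi-owner idea can be sketched roughly as follows. This is a simplified illustration with hypothetical names (`packet_acquire`, `packet_ref_inc`, `packet_ref_dec`), not the repository's mesh_packet implementation; it assumes a small static pool and leaves out the interrupt protection that a real system would need around the counter updates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 4 /* hypothetical pool size for the sketch */

typedef struct
{
    uint8_t payload[40]; /* placeholder for the radio packet contents */
} packet_t;

static packet_t g_pool[POOL_SIZE];
static uint8_t  g_refs[POOL_SIZE]; /* 0 means the slot is free */

/* Map a packet pointer back to its slot in the pool. */
static size_t packet_index(const packet_t *p)
{
    assert(p >= g_pool && p < g_pool + POOL_SIZE);
    return (size_t)(p - g_pool);
}

/* Take a free slot; the caller becomes the first owner. */
static packet_t *packet_acquire(void)
{
    for (size_t i = 0; i < POOL_SIZE; ++i)
    {
        if (g_refs[i] == 0)
        {
            g_refs[i] = 1;
            return &g_pool[i];
        }
    }
    return NULL; /* pool exhausted */
}

/* Another module (e.g. a transmit queue) starts holding the pointer. */
static void packet_ref_inc(packet_t *p)
{
    size_t i = packet_index(p);
    assert(g_refs[i] > 0 && g_refs[i] < UINT8_MAX);
    g_refs[i]++;
}

/* An owner lets go; returns true if this was the last reference,
 * meaning the slot is now free for reuse. */
static bool packet_ref_dec(packet_t *p)
{
    size_t i = packet_index(p);
    assert(g_refs[i] > 0);
    g_refs[i]--;
    return g_refs[i] == 0;
}

static uint8_t packet_ref_count(const packet_t *p)
{
    return g_refs[packet_index(p)];
}
```

In terms of the scenario above: the handle storage holds one reference, the radio queue takes a second one before transmitting, and if the cache evicts the entry in the meantime, the memory is only reused once the radio has released its reference too.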

bayou9 commented 7 years ago

Hello Trond, sorry for the late reply. Your answer got me one step closer to a clearer picture of the program. Now I understand why there is a "transmit_all_instance".

I will look into the "shared pointer" mechanism some time later. I have to admit I was completely unaware that there's a mechanism actively tackling this issue. Now that I think about it, there certainly is a "take ownership" function somewhere.

The radio DMA exploit was nicely done. There's no point in doing it any other way: since it's not exactly a resource-rich environment, we need to squeeze out every bit of performance we can.

And again, thank you so much for your help.