NordicPlayground / nRF51-ble-bcast-mesh

Other
323 stars 121 forks source link

conflicts leading to hanging mesh #61

Closed vliedel closed 7 years ago

vliedel commented 8 years ago

Hi, i'm currently trying out the mesh with 6 nodes, of which 5 are regularly sending messages (about 100B each 10s). Sometimes i see a lot of conflicts coming by, resulting in some nodes not sending anything anymore. I attached gdb, but they didn't actually crash, so it's a bit hard to find out what happens here. I should probably start actually handling conflicts, but I'm wondering what could cause this.

daviddedwin commented 8 years ago

Looks like a zombie node, this is a current open issue on the release 0.7.1-SD8, but can be fixed by using the radio_control.c from the dfu_alpha branch.

vliedel commented 8 years ago

Ah, looks like I can indeed only update that file for now. I'll give it a go!

vliedel commented 8 years ago

Nope, this seems to mess up the messages i'm sending. Maybe because i increased the payload size?

daviddedwin commented 8 years ago

Can you verify on the default size ? We have it working and stable on a 25 node network.

vliedel commented 8 years ago

With the default size, it seems to work: no zombie nodes after a couple of hours. With the larger size, the default size plus 5 more bytes, were received correctly, but all 0 for the bytes after that. The size i'm using is: #define RBC_MESH_VALUE_MAX_LEN (100) #define BLE_ADV_PACKET_PAYLOAD_MAX_LENGTH (RBC_MESH_VALUE_MAX_LEN + 3) (again, i only copy pasted the new radio_control.c, left all other code as it is in the sdk 8 branch)

daviddedwin commented 8 years ago

Will take a look at it but a fix is likely only in Jan.

trond-snekvik commented 8 years ago

Hi @vliedel, I'm afraid there's a bit more to it for the packet-length fix. First off, the hardware configuration in radio_init must be changed: NRF_RADIO->PCNF0 defines the length of the three individual header fields, S0 (the type field in the packet format diagram), the length field, and S1, the padding bits after the length field.

As you can see at line 205 in radio_control.c, the length field is 6 bits (and consequently, the padding field S1 is 2 bits, to fill the byte). This is in line with the BLE standard. With a 6 bit length field, the longest packet we can send and receive is 63 bytes, which gives a max mesh-payload of 49 bytes (63 - adv-addr - mesh-header = 49). If we try to send mesh packets with a length that takes more than 6 bits to represent, the radio sees length % 64, and uses this new length. No bytes after this length goes on air.

So for your example, you have to increase the hardware length field to 7 bits, and reduce S1 to 1 bit, like so:

    NRF_RADIO->PCNF0 =  (
                          (((1UL) << RADIO_PCNF0_S0LEN_Pos) & RADIO_PCNF0_S0LEN_Msk)
                        | (((1UL) << RADIO_PCNF0_S1LEN_Pos) & RADIO_PCNF0_S1LEN_Msk) 
                        | (((7UL) << RADIO_PCNF0_LFLEN_Pos) & RADIO_PCNF0_LFLEN_Msk) 
                      );

This allows you to transmit a 113 byte payload for the mesh.

NOTE: your BLE_ADV_PACKET_PAYLOAD_MAX_LENGTH is off, it should be #define BLE_ADV_PACKET_PAYLOAD_MAX_LENGTH (RBC_MESH_VALUE_MAX_LEN + 8), not + 3. This is because the adv-packet payload size includes the entire mesh-header (AD-length, AD-type, Service UUID, Handle and Version in the packet diagram).

You can also do Len = 8 bits, S1 = 0 bits, but this has implications to the mesh-packet-struct, as I explained in this related devzone post. If you want 8-bit length (which gives you ~240 bytes of mesh-payload), you have to remove _rfu3 from ble_packet_header_t in mesh_packet.h. If 113 bytes is enough for you, I recommend not doing this, as it might cause additional complications.

vliedel commented 8 years ago

Right, so: BLE_ADV_PACKET_PAYLOAD_MAX_LENGTH should be: RBC_MESH_VALUE_MAX_LEN + MESH_PACKET_ADV_OVERHEAD + 1 And I noticed I should also change: (37UL) at NRF_RADIO->PCNF1 to (114UL), which is RBC_MESH_VALUE_MAX_LEN + MESH_PACKET_OVERHEAD. Is that correct?

It seems to work nicely now, still have to perform a longer test though ;) Thanks for the help, Ill post if it works for a long time.

trond-snekvik commented 8 years ago

Oh, yes, forgot that one. That's correct :)

Sounds great!

vliedel commented 8 years ago

Cool, so I ran a test with 29 nodes that were spread around the building. They're all sending those 100B messages, except for one node that I use as sink node. So far (after about 16h), no zombie nodes. I do get a lot of conflicts though, but I guess that's to be expected :)

Thanks again!

trond-snekvik commented 8 years ago

Great! As far as I know, yours is the first experiment with extensive payloads for the mesh, nice to hear it's working without extensive changes :smile:

For the conflicts, my only advice is to allocate handles for each device (or just pick random a handle for each, you'll probably be safe with 29 nodes), and treat them as source-addresses, having each device only push to their assigned handle.

vliedel commented 8 years ago

(I'm sorry if this is going off topic) We did some tests earlier with multiple handles and from what I remember it reduced the total "bandwidth", increased the memory usage, and also gave us more zombie nodes.

I guess it's worth a new try with this larger setup. I'm thinking of randomly assigning a handle, such that it stays scalable.

trond-snekvik commented 8 years ago

Any updates on this?

vliedel commented 8 years ago

We have a stable mesh it seems: 99B payload and only 2 handles. I also disable any handle that I receive but don't know of. I'm guessing the newer code helped out mostly :)

The test with the handle per node, I think I never did any more, sorry :P