Open thedjnK opened 7 years ago
We've seen this behavior when the connection interval goes below 10ms. This prevents the mesh from running, as the Softdevice is spending too much time on the radio. For the HW configuration on our devkits (with regard to the LFCLK drift), this problem no longer happens in testing, but the Softdevice may apply different margins based on the drift of your timers, which would cause it to reject the mesh timeslot between connection events. I'm not familiar with smartbasic or the role it plays here. Could it affect your timeslot usage too?
Are you able to increase the connection intervals you're using, or otherwise confirm this theory, so that we can try to mitigate the problem? Which Softdevice versions and chips have you tested with?
I was thinking about the connection interval today: it's set to 7.5ms, so it sounds like this is the problem. I'll try a larger interval sometime to see if it fixes it and report back. As this scenario can lead to an accidental or deliberate denial of service, would a better idea be to not allow connection intervals below a certain value? When the slave connects, it can negotiate and use e.g. 15ms, thus avoiding this problem.
That would be best, yes. Unfortunately, we can't really enforce how the Softdevice is used from this framework, as the mesh doesn't interfere with those calls. The solution up until now has been to accept whatever time the Softdevice gives us, but as your issue shows, this is error-prone and subject to change with different hardware configurations. The way I see it, there are three options for the framework:
Perhaps a combination of the first and last option - we can't really test all kits out there, but we have accurate numbers on what sort of drift these devices can operate with according to the Bluetooth specification, and should be able to tune accordingly. Documenting the problem is never a bad idea either.
It looks like the connection interval was the problem, but even a 20ms interval isn't enough: data still won't propagate. It does work with a connection interval of 30ms. I agree it's not an easy problem to solve.
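For anyone wanting to try the "don't allow intervals below a certain value" idea on the application side, here is a minimal sketch of the arithmetic in Python. It is not part of the mesh framework or the Softdevice API: the function names and the 30ms default floor are assumptions for illustration, with the floor taken only from the value that happened to work in this thread. The one fixed fact it relies on is that BLE connection intervals are expressed in 1.25ms units, with a valid range of 7.5ms to 4s.

```python
import math

# BLE connection intervals are expressed in units of 1.25 ms.
CONN_INTERVAL_UNIT_MS = 1.25
BLE_MIN_UNITS = 6      # 7.5 ms, the shortest interval BLE allows
BLE_MAX_UNITS = 3200   # 4 s, the longest interval BLE allows


def ms_to_units(interval_ms):
    """Convert milliseconds to 1.25 ms units, rounding up to a whole unit."""
    return math.ceil(interval_ms / CONN_INTERVAL_UNIT_MS)


def clamp_conn_interval(min_units, max_units, floor_ms=30.0):
    """Raise a requested (min, max) interval pair to a hypothetical floor.

    floor_ms is an assumption: 30 ms is simply the value reported to work
    in this thread, not a guaranteed safe limit for other hardware.
    """
    floor_units = max(ms_to_units(floor_ms), BLE_MIN_UNITS)
    new_min = max(min_units, floor_units)
    # Keep max >= min after the minimum has been raised.
    new_max = max(max_units, new_min)
    if new_max > BLE_MAX_UNITS:
        raise ValueError("connection interval exceeds the 4 s BLE maximum")
    return new_min, new_max


if __name__ == "__main__":
    # A peer asking for a fixed 7.5 ms interval (6 units) is pushed up to the floor.
    print(clamp_conn_interval(6, 6))
```

The clamped pair would then be what the application requests in a connection parameter update. Whether the peer accepts it is up to the peer, so this reduces the chance of starving the mesh rather than ruling it out.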
I've discovered a strange issue with the mesh network: I can communicate fine from Android to Android and Android to iOS, however if I try from a Laird nRF5* module (with smartbasic) to either Android or another Laird module, the data does not propagate to the other device. I connect, enable notifications for the mesh value characteristic and then try writing to this characteristic. Any time I write data, e.g. 0000000122, I get a command success response on that module (110080), but the other module doesn't get the data. I've done a sniff and can see that the data isn't sent to the module (so it is not a case of the module getting the data but not handling/displaying it). I have been trying to compare it to an Android sniff but cannot see any differences aside from the GATT table listing performed by the Android device. Both devices write the same data to the same handle IDs.
This sniff shows: both connect and enable notifications, then one writes and gets the success message, but the other node never receives the data. The node that didn't receive the data then cannot write to its node either, and instead gets an error back saying that the node is busy.
Node A (First to send): https://www.dropbox.com/s/vvji56hniawp1l7/Node_A_openmesh.pcapng?dl=0
Node B (Second to send which fails): https://www.dropbox.com/s/p8hf1hygrv6me3h/Node_B_openmesh.pcapng?dl=0