Open Scottapotamas opened 6 months ago
After a bit more testing I worked out that manually specifying connection parameters is needed to allow NimBLE to match Bluedroid's defaults.
// Sets the client's BLE connection behaviours
// https://mynewt.apache.org/latest/network/ble_hs/ble_gap.html#c.ble_gap_update_params
// ITVL uses 1.25 ms units
// Timeout is in 10 ms units
// CE LEN uses 0.625 ms units
// BLE specifies minimum 7.5ms connection interval
struct ble_gap_upd_params conn_parameters = { 0 };
conn_parameters.itvl_min = 6; // 7.5ms
conn_parameters.itvl_max = 24; // 30ms
conn_parameters.latency = 0;
conn_parameters.supervision_timeout = 20;
// https://github.com/apache/mynewt-nimble/issues/793#issuecomment-616022898
conn_parameters.min_ce_len = 0x00;
conn_parameters.max_ce_len = 0x00;
ble_gap_update_params(peer->conn_handle, &conn_parameters);
This improves results substantially.
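For readers checking the raw values above, here is a minimal sketch of the unit conversions described in the comments (1.25 ms units for the interval, 10 ms units for the supervision timeout). The helper names `conn_itvl_from_us` and `supervision_timeout_from_ms` are hypothetical and not part of NimBLE; microseconds are used for the interval so the arithmetic stays in integers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: connection interval in 1.25 ms units.
 * Input is in microseconds so that 7.5 ms can be written as 7500. */
uint16_t conn_itvl_from_us(uint32_t us)
{
    assert(us % 1250 == 0); /* must be a whole number of 1.25 ms units */
    return (uint16_t)(us / 1250);
}

/* Hypothetical helper: supervision timeout in 10 ms units. */
uint16_t supervision_timeout_from_ms(uint32_t ms)
{
    assert(ms % 10 == 0); /* must be a whole number of 10 ms units */
    return (uint16_t)(ms / 10);
}
```

With these, 7500 µs gives 6 and 30000 µs gives 24, matching `itvl_min` and `itvl_max` above, and the `supervision_timeout` of 20 corresponds to 200 ms.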
Remaining comments for Espressif:
Have you verified the actual LL packet MTU is increased to 200?
I've found that when using NimBLE on ESP32-S3 (other ESP32s untested), the LL MTU is not increased. Calling ble_att_set_preferred_mtu() and/or ble_gattc_exchange_mtu() will only change the ATT layer MTU. The larger ATT packets will still be fragmented via L2CAP into 27 byte LL packets, which is of course quite disastrous for performance.
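To put a number on the fragmentation, here is a rough sketch that counts LL data packets for one notification. It assumes a 3 byte ATT notification header (opcode plus handle) and a 4 byte L2CAP basic header, and that the whole L2CAP frame is split into LL payloads of a fixed maximum size. The function name `ll_packets_for_notification` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ATT_NOTIFY_HDR_LEN 3u /* opcode (1) + attribute handle (2) */
#define L2CAP_HDR_LEN      4u /* length (2) + channel ID (2) */

/* Hypothetical helper: number of LL data packets needed to carry one
 * notification with value_len bytes of payload, when each LL packet
 * can hold at most ll_max_octets bytes. */
uint32_t ll_packets_for_notification(uint32_t value_len, uint32_t ll_max_octets)
{
    assert(ll_max_octets > 0);
    uint32_t frame_len = value_len + ATT_NOTIFY_HDR_LEN + L2CAP_HDR_LEN;
    /* Ceiling division: a partial final fragment still costs a packet. */
    return (frame_len + ll_max_octets - 1) / ll_max_octets;
}
```

Under these assumptions a 128 byte notification needs 5 LL packets at the default 27 octets, and 1 packet once the LL length is raised to 200.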
To get larger LL packets, it's necessary to send an HCI command to increase the controller's connInitialMaxTxOctets (I tested this) or the connection's connMaxTxOctets (probably, I haven't tested) value.
Other BLE stacks I've used haven't required this. I think the flaw is in Espressif's controller implementation. I think it's expected to increase connMaxTxOctets in response to receiving a LL_LENGTH_REQ PDU. It doesn't do this. The NimBLE controller (not used on ESP32) does this. The BT core spec (Ver 5.3, Vol 6, Part B, §5.1.9 "Data Length Update procedure") seems to imply the controller should do this.
@rahult-github thoughts?
@xyzzy42 Thanks for chiming in, your hint led me down the right path.
Sniffing the transfers, I see the 200 byte MTU update packets during connection, but for the larger transfers, e.g. 128B, I still saw fragmented 26B LL transfers as you described.
I was able to resolve this issue by calling
#define LL_PACKET_TIME (2120)
#define LL_PACKET_LENGTH (200)
// ...
ble_hs_hci_util_set_data_len( event->connect.conn_handle, LL_PACKET_LENGTH, LL_PACKET_TIME );
inside BLE_GAP_EVENT_CONNECT on successful connection.
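As a side note on where a time value like 2120 comes from: on the uncoded LE 1M PHY each octet takes 8 µs on air, and a data packet carries 14 octets of overhead around its payload (preamble 1, access address 4, header 2, MIC 4, CRC 3). The sketch below assumes that PHY only; the function name `le_1m_tx_time_us` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define LE_1M_OVERHEAD_OCTETS 14u /* preamble 1 + access addr 4 + header 2 + MIC 4 + CRC 3 */
#define LE_1M_US_PER_OCTET     8u /* 1 Mbit/s -> 8 us per octet */

/* Hypothetical helper: on-air time in microseconds for one LL data
 * packet with the given payload size, assuming the LE 1M PHY. */
uint32_t le_1m_tx_time_us(uint32_t payload_octets)
{
    assert(payload_octets <= 251); /* largest LL data payload */
    return (payload_octets + LE_1M_OVERHEAD_OCTETS) * LE_1M_US_PER_OCTET;
}
```

This gives 328 µs for the default 27 octets and 2120 µs for the 251 octet maximum, so LL_PACKET_TIME above is the time for a maximum-size packet. A 200 octet packet would only need 1712 µs, so 2120 leaves headroom rather than being an exact fit.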
This was needed on both the server and client boards. Wireshark trace shows a single packet for the 128B test (pictured below) and correctly used 6x single notification packets which matches the expected application-side 1024B chunking behaviour.
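The six notifications are consistent with simple MTU-sized chunking. A small sketch, assuming the 200 byte ATT MTU mentioned earlier in the thread and a 3 byte ATT notification header, so each notification carries at most MTU - 3 bytes of payload. The function name `notifications_for_payload` is made up here and is not taken from the benchmark code.

```c
#include <assert.h>
#include <stdint.h>

#define ATT_NOTIFY_HDR_LEN 3u /* opcode (1) + attribute handle (2) */

/* Hypothetical helper: how many notifications are needed to send
 * total_len bytes when each one carries at most (att_mtu - 3) bytes. */
uint32_t notifications_for_payload(uint32_t total_len, uint32_t att_mtu)
{
    assert(att_mtu > ATT_NOTIFY_HDR_LEN);
    uint32_t chunk = att_mtu - ATT_NOTIFY_HDR_LEN;
    /* Ceiling division: the last chunk may be shorter than the rest. */
    return (total_len + chunk - 1) / chunk;
}
```

With an MTU of 200 this yields 1 notification for the 128B test and 6 for the 1024B test.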
Here's a comparison between these different changes (HCI data also includes conn parameter changes)
With the HCI length call, the 1024B test is twice as quick (halved latency), with minimal improvements to the smaller one-packet sized tests. This brings it roughly in line with Bluedroid's default performance.
Updated comments for Espressif:
You can also call ble_gap_write_sugg_def_data_len() before creating any connections to set the initial MTU length to be longer.
These two functions are totally missing from the documentation. Even things like Espressif's "How do I increase the MTU?" FAQ do not mention them.
In other BLE stacks I've used, this isn't necessary on both peers in the connection, as it is with ESP32+NimBLE. Only one, the GATT client, needs to send the length request. Then the other peer will increase the MTU in response to that.
I'm not entirely sure if it should be the NimBLE host or the Espressif controller which should be doing this. I think it's the controller. But I'm pretty sure a direct HCI call via a barely known function in application code is not the correct way.
Thanks for the suggestion.
Yeah I agree that calling against HCI functions from application space is a bad idea™.
The idiomatic NimBLE approach (mynewt-nimble source):
ble_gap_set_data_len() wraps the ble_hs_hci_util_set_data_len() function I hacked with above. I tested it and got the same behaviour.
ble_gap_write_sugg_def_data_len() doesn't seem to impact my test code and/or ESP-WROOM-32 at all.
This was only a detail I found while testing NimBLE as a subset of other benchmarks, so I'm content with the 'fixed' results and can move on to other things.
I'd like to see an official statement/explanation and some improvements for future users though.
"ble_gap_write_sugg_def_data_len() doesn't seem to impact my test code and/or ESP-WROOM-32 at all."
Do you mean it behaves differently from ble_gap_set_data_len(), or that it doesn't cause the MTU to increase?
I tested this on ESP32-S3 and ble_gap_write_sugg_def_data_len() did work to increase the MTU for new connections. If I read the HCI spec correctly, it must be called before the connection is established.
I tested ble_gap_write_sugg_def_data_len() in a few places (applied to both boards):
nimble_port_run()
and I still saw LL fragmentation in Wireshark captures. Might be a subtlety I'm missing there.
I'm calling after esp_nimble_hci_init() and nimble_port_run() and before ble_gap_adv_start(). Also I'm using ESP32-S3.
I wonder if this is a difference between the controller for the ESP32 vs the ESP32-S3? I also wonder, if the ESP32 doesn't support this HCI command, if it returns an error code? Since these commands never appear in any Espressif documentation, I doubt any difference in controller support between chips is documented either.
I found a new problem, which might explain some of the differences seen. Calling ble_gap_write_sugg_def_data_len() after nimble_port_run(), but before advertising starts, worked to get a larger LL MTU sometimes, but not always.
I.e., it doesn't work when the client is an iPhone that is already bonded to the server (ESP32).
An examination of the packet capture shows that in the working cases, the LL_LENGTH_REQ packet is sent from the client to the server while the connection is still unencrypted. In case 2, the connection is encrypted after this request and in case 1 & 3 the connection remains unencrypted.
But in case 4, the connection is encrypted first, and then the LL_LENGTH_REQ packet is sent. It's the first packet sent after the encryption handshake finishes. Something in the Espressif controller does not like this, and responds to the request with LL_REJECT_EXT_IND, LMP PDU Not Allowed.
I don't know whether the encryption is actually related to this problem or not. This is happening inside the binary-only controller code, so I can't debug it further. But encryption is the only obvious difference between the accepted MTU requests and the rejected one.
I then tried using ble_gap_set_data_len() on the connection, after it's set up and the client-initiated (iPhone) request has already failed. This generates a LL_LENGTH_REQ from the server, and the phone accepts it and increases the LL MTU.
General issue report
I've been working on a series of latency benchmarks for different wireless radios/stacks, and measured some odd behaviours from NimBLE when compared to Bluedroid for GATT/SPP style transfers.
These measurements are for one-way transfer latency - no ack/response behaviour is implemented or measured.
Is this behaviour reasonable/expected for the NimBLE + ESP32 stack? Any suggestions?
Comparing server notify and client WriteNoResp is also inconsistent between Bluedroid and NimBLE:
Reproduction Notes
Software
Code for esp32-spp (classic), esp32-ble, and esp32-nimble is on GitHub. These vary between minor changes from Espressif examples to heavier modifications to achieve feature parity.
The biggest difference from IDF examples: I've removed UART bridge behaviour, test payloads are handled directly on device.
Where test payloads exceed the MTU, I manually send them as smaller MTU-sized packets, where the benchmark task requires some kind of library-level event to signal the next packet, i.e. BLE_GAP_EVENT_NOTIFY_TX, similar to the approach used in Espressif's throughput example.
I originally ran these tests with IDF 5.1.1 but can reproduce them with latest v5.3-dev-892-g692c1fcc52, which is ~3 days old.
docker run -i --privileged --rm -v $PWD:/project -w /project -it espressif/idf:latest
Other relevant changes with menuconfig:
-O2
Test Setup
I've measured trigger-to-output overhead at ~4.11 μs when tested in a loopback configuration.
All firmware variants support both server notify and client writes, so swapping the trigger/valid signal connections allows testing client-server direction as needed.