ThomasGerstenberg / blatann

Python BLE library for using Nordic nRF52 with Connectivity firmware
BSD 3-Clause "New" or "Revised" License
72 stars 21 forks source link

Default preferred MTU Size too low #143

Closed mdxs closed 1 year ago

mdxs commented 1 year ago

When setting up a Peer connection with a BLE device, the preferred MTU Size is too low by default; when making the connection to the peripheral using:

# seeing the log driver comms shows the actual negotiation process
ble_device = BleDevice(serial_port, log_driver_comms=True)
ble_device.event_logger.suppress(nrf_events.GapEvtAdvReport)
ble_device.open()

# here the connect doesn't allow specification of the preferred MTU size...
peer = ble_device.connect(target_address).wait(5, exception_on_timeout=False)

# the following can be too late already...
peer.preferred_mtu_size = 247
# the following can be pointless if negotiation already settled to 23
peer.exchange_mtu(247).wait(10, exception_on_timeout=False)

logger.info("MTU Exchange complete (mtu={})".format(peer.mtu_size))

Sometimes it works, as I can set peer.preferred_mtu_size = 247 (see code above); or perform the peer.exchange_mtu(247) in time; but in some cases the BLE device is actually faster in initiating the exchange. If so (the client initiating quickly, and signalling 247 in my case), the negotiation handling by Blatann may result in the peer using the default peer.preferred_mtu size of 23.

In my opinion, this is due to https://github.com/ThomasGerstenberg/blatann/blob/fe2d85b3e39ff681bb43a0e98c3e41b5d6f06b83/blatann/peer.py#L68 using MTU_SIZE_DEFAULT = 23 imported from blatann.gatt

Am I missing a way to do this properly?

There is a connection_params opportunity in https://github.com/ThomasGerstenberg/blatann/blob/fe2d85b3e39ff681bb43a0e98c3e41b5d6f06b83/blatann/device.py#L280 but that doesn't handle the MTU preference, or does it?

mdxs commented 1 year ago

Log output of the case in which the BLE device is faster in negotiating the MTU size:

[2023-09-26 16:00:17,016] [MainThread] [root.main:143] [INFO]: Connecting to address C6:40:63:3D:0C:78,s
[2023-09-26 16:00:17,017] [MainThread] [blatann.nrf.nrf_driver.wrapper:64] [DEBUG]: [/dev/nrf52840] ble_gap_connect(BLEGapAddr(C6:40:63:3D:0C:78,s),)
[2023-09-26 16:00:17,017] [MainThread] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       35 ->  [00 8c 01 02 78 0c 3d 63 40 c6 01 01 40 01 f0 00 0a 00 01 0c 00 18 00 00 00 90 01 01 ] type:     VENDOR_SPECIFIC reliable:yes seq#:4 ack#:4 payload_length:1c data_integrity:1 header_checksum:4d err_code:0x0
[2023-09-26 16:00:17,018] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    30/ 0 <-  [N/A] type:                 ACK reliable: no seq#:0 ack#:5 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:17,019] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    31/ 0 <-  [01 8c 00 00 00 00 ] type:     VENDOR_SPECIFIC reliable:yes seq#:4 ack#:5 payload_length:6 data_integrity:1 header_checksum:a6 err_code:0x0
[2023-09-26 16:00:17,019] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       36 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:5 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,373] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    32/ 0 <-  [02 10 00 00 00 02 78 0c 3d 63 40 c6 02 18 00 18 00 00 00 90 01 ] type:     VENDOR_SPECIFIC reliable:yes seq#:5 ack#:5 payload_length:15 data_integrity:1 header_checksum:b4 err_code:0x0
[2023-09-26 16:00:18,373] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       37 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:6 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,374] [/dev/nrf52840_Event] [blatann.device.on_driver_event:34] [DEBUG]: Got NRF Driver event: GapEvtConnected(conn_handle=0, peer_addr=BLEGapAddr(C6:40:63:3D:0C:78,s), role=<BLEGapRoles.central: 2>, conn_params=BLEGapConnParams(interval: [30.0-30.0] ms, timeout: 4000.0 ms, latency: 0))
[2023-09-26 16:00:18,374] [/dev/nrf52840_Event] [blatann.nrf.nrf_driver.wrapper:64] [DEBUG]: [/dev/nrf52840] ble_gap_addr_get()
[2023-09-26 16:00:18,374] [/dev/nrf52840_Event] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       38 ->  [00 6d 01 ] type:     VENDOR_SPECIFIC reliable:yes seq#:5 ack#:6 payload_length:3 data_integrity:1 header_checksum:cd err_code:0x0
[2023-09-26 16:00:18,374] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    33/ 0 <-  [02 55 00 00 00 f7 00 ] type:     VENDOR_SPECIFIC reliable:yes seq#:6 ack#:5 payload_length:7 data_integrity:1 header_checksum:94 err_code:0x0
[2023-09-26 16:00:18,374] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       39 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:7 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,375] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    34/ 0 <-  [N/A] type:                 ACK reliable: no seq#:0 ack#:6 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,375] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    35/ 0 <-  [01 6d 00 00 00 00 02 6d 89 6f 43 a4 da ] type:     VENDOR_SPECIFIC reliable:yes seq#:7 ack#:6 payload_length:d data_integrity:1 header_checksum:2b err_code:0x0
[2023-09-26 16:00:18,376] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       40 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:0 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,377] [/dev/nrf52840_Event] [blatann.device.on_driver_event:34] [DEBUG]: Got NRF Driver event: GattsEvtExchangeMtuRequest(conn_handle=0, client_mtu=247)

=== NOTE: just above the peripheral indicated 247 would be fine,
===       to which Blatann seems to reply that 23 is to be used...

[2023-09-26 16:00:18,377] [/dev/nrf52840_Event] [blatann.nrf.nrf_driver.wrapper:64] [DEBUG]: [/dev/nrf52840] ble_gatts_exchange_mtu_reply(0, 23)
[2023-09-26 16:00:18,377] [MainThread] [root.main:155] [INFO]: Connected, conn_handle: 0

=== NOTE: only at this point is when I try (in code) to set:
===         peer.preferred_mtu_size = 247
===       so, that is too late in this case... 23 was already used

[2023-09-26 16:00:18,377] [/dev/nrf52840_Event] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       41 ->  [00 ad 00 00 17 00 ] type:     VENDOR_SPECIFIC reliable:yes seq#:6 ack#:0 payload_length:6 data_integrity:1 header_checksum:cc err_code:0x0

=== NOTE: the info log entry here indicates that now I try:
===         peer.exchange_mtu(247).wait(10, exception_on_timeout=False)

[2023-09-26 16:00:18,377] [MainThread] [root.main:165] [INFO]: Exchanging MTU (requesting 247)

=== NOTE: but it appears the earlier (peripheral initiated) negotiation
===       now makes 23 stick... there is no 247 / 0xf7 tried to be negotiated IMO

[2023-09-26 16:00:18,377] [MainThread] [blatann.nrf.nrf_driver.wrapper:64] [DEBUG]: [/dev/nrf52840] ble_gattc_exchange_mtu_req(0, 23)
[2023-09-26 16:00:18,379] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    36/ 0 <-  [N/A] type:                 ACK reliable: no seq#:0 ack#:7 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,379] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    37/ 0 <-  [01 ad 00 00 00 00 ] type:     VENDOR_SPECIFIC reliable:yes seq#:0 ack#:7 payload_length:6 data_integrity:1 header_checksum:9a err_code:0x0
[2023-09-26 16:00:18,379] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       42 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:1 payload_length:0 data_integrity:0 err_code:0x0

=== NOTE: so in this case we seem to stick to 23
===       despite the BLE device being able to do 247

[2023-09-26 16:00:18,380] [/dev/nrf52840_Event] [blatann.peer._resolve_mtu_exchange:576] [DEBUG]: [0] MTU Exchange - Ours: 23, Peers: 247, Effective: 23
[2023-09-26 16:00:18,380] [MainThread] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       43 ->  [00 9e 00 00 17 00 ] type:     VENDOR_SPECIFIC reliable:yes seq#:7 ack#:1 payload_length:6 data_integrity:1 header_checksum:c3 err_code:0x0
[2023-09-26 16:00:18,382] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    38/ 0 <-  [N/A] type:                 ACK reliable: no seq#:0 ack#:0 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,382] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    39/ 0 <-  [01 9e 00 00 00 00 ] type:     VENDOR_SPECIFIC reliable:yes seq#:1 ack#:0 payload_length:6 data_integrity:1 header_checksum:d1 err_code:0x0
[2023-09-26 16:00:18,383] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       44 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:2 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,404] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    40/ 0 <-  [02 23 00 00 00 fb 00 fb 00 48 08 48 08 ] type:     VENDOR_SPECIFIC reliable:yes seq#:2 ack#:0 payload_length:d data_integrity:1 header_checksum:60 err_code:0x0
[2023-09-26 16:00:18,404] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       45 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:3 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,404] [/dev/nrf52840_Event] [blatann.device.on_driver_event:34] [DEBUG]: Got NRF Driver event: GapEvtDataLengthUpdateRequest(conn_handle=0, max_tx_octets=251, max_rx_octets=251, max_tx_time_us=2120, max_rx_time_us=2120)
[2023-09-26 16:00:18,405] [/dev/nrf52840_Event] [blatann.nrf.nrf_driver.wrapper:64] [DEBUG]: [/dev/nrf52840] ble_gap_data_length_update(0,)
[2023-09-26 16:00:18,405] [/dev/nrf52840_Event] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       46 ->  [00 90 00 00 00 00 ] type:     VENDOR_SPECIFIC reliable:yes seq#:0 ack#:3 payload_length:6 data_integrity:1 header_checksum:ba err_code:0x0
[2023-09-26 16:00:18,406] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    41/ 0 <-  [N/A] type:                 ACK reliable: no seq#:0 ack#:1 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,406] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    42/ 0 <-  [01 90 00 00 00 00 00 ] type:     VENDOR_SPECIFIC reliable:yes seq#:3 ack#:1 payload_length:7 data_integrity:1 header_checksum:b7 err_code:0x0
[2023-09-26 16:00:18,407] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       47 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:4 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,434] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    43/ 0 <-  [02 24 00 00 00 fb 00 fb 00 48 08 48 08 ] type:     VENDOR_SPECIFIC reliable:yes seq#:4 ack#:1 payload_length:d data_integrity:1 header_checksum:56 err_code:0x0
[2023-09-26 16:00:18,434] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       48 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:5 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,434] [/dev/nrf52840_Event] [blatann.device.on_driver_event:34] [DEBUG]: Got NRF Driver event: GapEvtDataLengthUpdate(conn_handle=0, max_tx_octets=251, max_rx_octets=251, max_tx_time_us=2120, max_rx_time_us=2120)
[2023-09-26 16:00:18,464] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:    44/ 0 <-  [02 3a 00 00 00 00 00 00 00 f7 00 ] type:     VENDOR_SPECIFIC reliable:yes seq#:5 ack#:1 payload_length:b data_integrity:1 header_checksum:75 err_code:0x0
[2023-09-26 16:00:18,464] [Dummy-4] [blatann.nrf.nrf_driver._log_message_handler:698] [INFO]: LOG [1]:       49 ->  [N/A] type:                 ACK reliable: no seq#:0 ack#:6 payload_length:0 data_integrity:0 err_code:0x0
[2023-09-26 16:00:18,464] [/dev/nrf52840_Event] [blatann.device.on_driver_event:34] [DEBUG]: Got NRF Driver event: GattcEvtMtuExchangeResponse(conn_handle=0, server_mtu=247)
[2023-09-26 16:00:18,464] [/dev/nrf52840_Event] [blatann.peer._resolve_mtu_exchange:576] [DEBUG]: [0] MTU Exchange - Ours: 23, Peers: 247, Effective: 23

=== NOTE: and this is reported in the scrip (using the logger.info call)

[2023-09-26 16:00:18,464] [MainThread] [root.main:168] [INFO]: MTU Exchange complete (mtu=23)
ThomasGerstenberg commented 1 year ago

Yeah that's a race condition, there might be a few other parameters too that would be good to include as well, I'll take a look

mdxs commented 1 year ago

Trying to understand the code; the following seems "dead code" to me:

https://github.com/ThomasGerstenberg/blatann/blob/fe2d85b3e39ff681bb43a0e98c3e41b5d6f06b83/blatann/device.py#L140

As it isn't used AFAIK; there is another approach that seems more complete (and is commented in the configure method documentation) at:

https://github.com/ThomasGerstenberg/blatann/blob/fe2d85b3e39ff681bb43a0e98c3e41b5d6f06b83/blatann/device.py#L182

Though I'm not sure how it is actually further used...

ThomasGerstenberg commented 1 year ago

Yes looks like line 140 is dead, likely a remnant from before I implemented configurable MTU sizes.

Best option is probably to add a preferred_mtu_size param to BleDevice.connect() and update the Peer/Peripheral constructors

mdxs commented 1 year ago

While I'm not sure, there seems to be an opportunity to use:

https://github.com/ThomasGerstenberg/blatann/blob/fe2d85b3e39ff681bb43a0e98c3e41b5d6f06b83/blatann/device.py#L182

as a way to use the existing BleDevice.configure() method to specify a maximum, as it appears to be added onto the default connection parameters, which could be picked up in the Peer(/Peripheral) constructors.

mdxs commented 1 year ago

At least something like that is used at:

https://github.com/ThomasGerstenberg/blatann/blob/fe2d85b3e39ff681bb43a0e98c3e41b5d6f06b83/blatann/device.py#L308

within the connect() method.

mdxs commented 1 year ago

However, I probably am poking around without knowing too much... and I agree with your proposal to add a preferred_mtu_size param to BleDevice.connect() and update the Peer/Peripheral constructors.

I wonder if you have an opinion on the best default for the preferred_mtu_size param.

I was on the path to set it to the max allowed value... aiming big.

But defaulting it to MTU_SIZE_DEFAULT in the Peer constructor would be closer to the current self._preferred_mtu_size = MTU_SIZE_DEFAULT seen in the Peer constructor; and while it is aiming small, it has the benefit of least like side effects to existing users of the library.

ThomasGerstenberg commented 1 year ago

Not quite, that sets the upper bound for the firmware to allocate for each connection. This may not necessarily be the preferred MTU for a single connection, thus a separate parameter is preferred.

The same reasoning can be applied to the queue sizes, there is no preferred queue size, it's just set to that, thus applied from the device's config. (Theoretically these queue sizes can also be tailored per-connection, but I didn't implmement that as it seems like a use-case for a very small margin of users for much added complexity.)

For backwards compatibility the default preferred MTU size will stay the same, but I can add another method in the device to set a global preferred MTU size if one is not specified (similar to setting the default security params)

mdxs commented 1 year ago

Sure, as said "I probably am poking around without knowing too much..."

I was looking into how I would test/confirm an implementation, and I think it would look like:

# seeing the log driver comms shows the actual negotiation process
ble_device = BleDevice(serial_port, log_driver_comms=True)
ble_device.event_logger.suppress(nrf_events.GapEvtAdvReport)
ble_device.open()

# specify the preferred MTU size when connecting to the peripheral
peer = ble_device.connect(target_address, preferred_mtu_size=247).wait(5, exception_on_timeout=False)

# mtu size negotiation may already have settled to 247 when initiated by the peripheral,
# so wait a little to provide an opportunity
time.sleep(2)
# if not yet negotiated to what is expected, try to (re-)initiate mtu size negotiation
if peer.mtu_size < 247:
    peer.exchange_mtu(247).wait(10, exception_on_timeout=False)

logger.info("MTU Exchange complete (mtu={})".format(peer.mtu_size))

Which at this moment (using Blatann v0.5.0) would result in the mtu_size being 23 instead of the 247 that my peripheral is able to handle.

The expected result after the changes would be for this to settle on mtu_size of 247. Whether it is the peripheral initiating the mtu negotiation/exchange (which my peripheral would do...) or that it is the script initiating the mtu negotiation/exchange (which with my peripheral would happen sometimes when I drop/comment the timesleep.sleep(2) line.

ThomasGerstenberg commented 1 year ago

That looks like a valid test case to me.

Also, here's a workaround for you until the API change is ready:

connect_waitable = ble_device.connect(target_address)  # no wait here, do it after the MTU is set
ble_device.connecting_peripheral.preferred_mtu_size = 247
# now wait for the connection
peer = connect_waitable.wait(5, exception_on_timeout=False)

This doesn't remove the race condition but reduces it greatly since it'll be updating the preferred MTU size during connection rather than after

mdxs commented 1 year ago

Cool, another workaround that works for me ;-)

I understand the race condition could still apply, which would leave me on mtu_size being 23, but this doesn't seem to happen in my current testing (so far...) with the time.sleep(2) there to try to have the peripheral trigger the negotiation/exchange.

So with the workaround I can proceed with further testing; thanks, much appreciated!

mdxs commented 1 year ago

Great to see that this has been implemented.