bluekitchen / btstack

Dual-mode Bluetooth stack, with small memory footprint.
http://bluekitchen-gmbh.com

Classic HID host: unreliable pairing to some Android devices #607

Closed: Slion closed this issue 5 months ago

Slion commented 5 months ago

Pico W SDK 1.5.1

Various Android devices pair successfully and auto-reconnect just fine through Interface; for instance, there is no issue on the Huawei P30 Pro (Android 10), HONOR Magic V2 (Android 14) or F(x)tec Pro¹ (Android 11).

However, neither the Samsung Galaxy Tab S6 (Android 12) nor the Tab S8 Ultra (Android 14) could be paired through Interface. The two tablets exhibited slightly different behaviour. The S6 would not show a pairing confirmation prompt, but it would still connect without pairing. The S8 would show a pairing confirmation and would connect successfully even though pairing actually failed. Having failed to pair with the Pico W, neither tablet is able to auto-reconnect once the connection is lost.

If I go through the Android system Bluetooth settings to initiate the pairing, it eventually succeeds. On both tablets, the first pairing attempt fails, but a second attempt shortly after the first one hangs for a suspiciously long time and then succeeds. However, at least the S8 still manages to lose the pairing after all and eventually keeps prompting to confirm pairing. Trying to pair from the Honor Magic V2 Android Bluetooth settings also fails, so I guess you are not supposed to do that; you need BluetoothHidDevice to be up and running.


Can the HID host somehow request a different kind of pairing method? That way I could try different ones and see whether some work better than others.

I'm getting those logs as usual:

16:52:24:087 -> SSP User Confirmation Request with numeric value '879295'
16:52:24:092 -> SSP User Confirmation Auto accept

When the host initiates the connection, it fails straight away with L2CAP_CONNECTION_RESPONSE_RESULT_REFUSED_SECURITY. That seems to be the case for all Android devices, whether or not they are running BluetoothHidDevice.

I came across raspberrypi/pico-sdk#1457, so I reset the flash, but it did not help.

mringwal commented 5 months ago

Resetting the flash should help if the Pico W has a stored bonding, but the remote device has lost it.

You'll need to check the logs for details. You can test hid_host on a desktop, which directly provides HCI log files; see port/libusb for using a USB Bluetooth dongle on Mac/Linux.

You can configure your IO Capabilities via gap_ssp_* in src/gap.h

Slion commented 5 months ago

You can configure your IO Capabilities via gap_ssp_* in src/gap.h

Thanks, tried the following but still no joy.

    gap_ssp_set_io_capability(SSP_IO_CAPABILITY_DISPLAY_YES_NO);
    gap_ssp_set_auto_accept(true);

Same with SSP_IO_CAPABILITY_NO_INPUT_NO_OUTPUT.

Slion commented 5 months ago

Resetting the flash should help if the Pico W has a stored bonding, but the remote device has lost it.

I mentioned that in the edits above; it did not help.

Slion commented 5 months ago

I've updated the first post with my latest findings. The situation is not as bad as I initially thought, since I eventually found a workaround to pair with those problematic devices. Still, I'm planning to provide full logs at some point to enable a thorough investigation and hopefully a fix.

Slion commented 5 months ago

Logs from a failed pairing with Samsung Galaxy Tab S8 initiated from Interface - connection succeeds but pairing fails: tab-s8-pairing-fail.zip

Slion commented 5 months ago

Logs from a successful pairing with Honor Magic V2 initiated from Interface: magic-v2-proper-pairing.zip

Slion commented 5 months ago

You'll need to check the logs for details. You can test hid_host on a desktop, which directly provides HCI log files, see port/libusb for using an USB Bluetooth on Mac/Linux.

Not sure what you mean by this. I usually develop on Windows but I have a Linux workstation I use mostly for building Android.

Slion commented 5 months ago

How can I set up the Pico W so that it sends a pairing request with numeric comparison? My hope is that this will behave more like the PC and should work.

Slion commented 5 months ago

How can I set up the Pico W so that it sends a pairing request with numeric comparison?

Using SSP_IO_CAPABILITY_NO_INPUT_NO_OUTPUT, I would get the simple pairing confirmation. Now, using SSP_IO_CAPABILITY_DISPLAY_YES_NO, I get the numeric confirmation, but otherwise it behaves the same, with no improvement. Pairing from the Tab S8 triggers numeric confirmation and the connection works, but pairing eventually fails. Pairing from the Pico W fails early with L2CAP_CONNECTION_RESPONSE_RESULT_REFUSED_SECURITY: no pairing prompt, no connection.

    gap_ssp_set_io_capability(SSP_IO_CAPABILITY_DISPLAY_YES_NO);
    gap_ssp_set_auto_accept(true);
    gap_secure_connections_enable(true);
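
Which prompt appears depends on the IO capabilities both sides announce. The following is a simplified model of the BR/EDR Secure Simple Pairing mapping in the Bluetooth Core specification. It assumes no OOB data and ignores whether MITM protection was requested. The enum and function names are made up for illustration and are not BTstack's API. The model shows why DISPLAY_YES_NO on both sides leads to numeric comparison, while NO_INPUT_NO_OUTPUT on either side falls back to Just Works (numeric comparison with automatic confirmation).

```c
#include <assert.h>

/* IO capability values as encoded on the HCI level; names are local to this sketch. */
typedef enum {
    IO_DISPLAY_ONLY       = 0x00,
    IO_DISPLAY_YES_NO     = 0x01,
    IO_KEYBOARD_ONLY      = 0x02,
    IO_NO_INPUT_NO_OUTPUT = 0x03
} io_cap_t;

typedef enum {
    MODEL_JUST_WORKS,         /* automatic confirmation on at least one side, unauthenticated */
    MODEL_NUMERIC_COMPARISON, /* both users compare and confirm the 6-digit value */
    MODEL_PASSKEY_ENTRY       /* one side displays or both enter a passkey */
} ssp_model_t;

/* Simplified mapping: no OOB data, MITM flag not taken into account. */
static ssp_model_t ssp_association_model(io_cap_t a, io_cap_t b) {
    assert(a <= IO_NO_INPUT_NO_OUTPUT && b <= IO_NO_INPUT_NO_OUTPUT);
    if (a == IO_NO_INPUT_NO_OUTPUT || b == IO_NO_INPUT_NO_OUTPUT)
        return MODEL_JUST_WORKS;
    if (a == IO_KEYBOARD_ONLY || b == IO_KEYBOARD_ONLY)
        return MODEL_PASSKEY_ENTRY;
    if (a == IO_DISPLAY_YES_NO && b == IO_DISPLAY_YES_NO)
        return MODEL_NUMERIC_COMPARISON;
    /* DisplayOnly combined with DisplayOnly or DisplayYesNo */
    return MODEL_JUST_WORKS;
}
```

For example, a DisplayYesNo tablet talking to a Pico W configured with SSP_IO_CAPABILITY_DISPLAY_YES_NO maps to MODEL_NUMERIC_COMPARISON. This only changes the prompt. It does not change the later key-derivation steps discussed further down.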

Slion commented 5 months ago

I'm not sure what the .pklg files add over the text files. Comparing the text files from the successful Magic V2 pairing and the failed Tab S8 one, without knowing much about the protocols involved, they look very similar. One thing stands out on the Tab S8 side, though: sm.c.4787: Unexpected PDU 1 in state 82.

No idea what that is, though, or whether it could be relevant somehow 😁

Slion commented 5 months ago

Yeah ok that's our issue here:

            log_info("Unexpected PDU %u in state %u", packet[0], sm_conn->sm_engine_state);
            sm_pdu_received_in_wrong_state(sm_conn);

And looking at that function's implementation, we indeed recognize a pairing error:

static inline void sm_pdu_received_in_wrong_state(sm_connection_t * sm_conn){
    sm_pairing_error(sm_conn, SM_REASON_UNSPECIFIED_REASON);
}

State 82 is SM_BR_EDR_W4_ENCRYPTION_COMPLETE. Not sure what PDU 1 is, though; it's basically packet[0], which is the sm_pdu_code.

Here is the last packet we received in the logs before that error: ACL <= 0B 20 0B 00 07 00 07 00 01 00 00 20 10 07 07
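
The hex dump can be decoded by hand. The sketch below assumes the dump is an HCI ACL packet without the H4 packet-type byte. It uses the layouts from the Bluetooth Core specification: a 4-byte ACL header, then a 4-byte L2CAP basic header, then the SMP PDU. Channel 0x0007 is the BR/EDR Security Manager channel, and SMP code 0x01 is a Pairing Request. So "Unexpected PDU 1" means the tablet sent a Pairing Request. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2CAP_CID_BR_EDR_SM     0x0007  /* BR/EDR Security Manager fixed channel */
#define SM_CODE_PAIRING_REQUEST 0x01

/* Hypothetical view of an ACL packet carrying an SMP PDU. */
typedef struct {
    uint16_t handle;        /* 12-bit connection handle */
    uint8_t  pb_flag;       /* packet boundary flag */
    uint16_t cid;           /* L2CAP channel ID */
    uint8_t  sm_code;       /* SMP PDU code, i.e. packet[0] as seen by sm.c */
    /* Filled in only for a Pairing Request: */
    uint8_t  io_capability;
    uint8_t  auth_req;      /* 0x20 = CT2 bit */
    uint8_t  max_key_size;
    uint8_t  init_key_dist;
    uint8_t  resp_key_dist;
} acl_smp_info_t;

static uint16_t read_le16(const uint8_t *p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is truncated or the lengths disagree. */
static int decode_acl_smp(const uint8_t *buf, size_t len, acl_smp_info_t *out) {
    assert(buf != NULL && out != NULL);
    if (len < 9) return -1;                 /* ACL hdr (4) + L2CAP hdr (4) + code (1) */
    uint16_t handle_flags = read_le16(&buf[0]);
    uint16_t acl_len      = read_le16(&buf[2]);
    uint16_t l2cap_len    = read_le16(&buf[4]);
    if ((size_t)acl_len + 4 != len || (size_t)l2cap_len + 4 != acl_len) return -1;

    out->handle  = handle_flags & 0x0FFF;
    out->pb_flag = (uint8_t)((handle_flags >> 12) & 0x03);
    out->cid     = read_le16(&buf[6]);
    out->sm_code = buf[8];
    if (out->sm_code == SM_CODE_PAIRING_REQUEST && l2cap_len >= 7) {
        out->io_capability = buf[9];
        /* buf[10] is the OOB data flag */
        out->auth_req      = buf[11];
        out->max_key_size  = buf[12];
        out->init_key_dist = buf[13];
        out->resp_key_dist = buf[14];
    }
    return 0;
}
```

Fed the 15 bytes from the log, this yields handle 0x000B, CID 0x0007, SMP code 0x01, AuthReq 0x20, a maximum key size of 16, and key distribution 0x07 for both sides.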

Slion commented 5 months ago

It turns out that SM_BR_EDR_W4_ENCRYPTION_COMPLETE is only handled when the config defines ENABLE_CROSS_TRANSPORT_KEY_DERIVATION.

So I added the following to my config:

#define ENABLE_CROSS_TRANSPORT_KEY_DERIVATION
#define ENABLE_LE_SECURE_CONNECTIONS

Sadly, that only took me one step further, to sm.c.4787: Unexpected PDU 1 in state 83, which is SM_BR_EDR_INITIATOR_W4_FIXED_CHANNEL_MASK and is never handled in that switch.

Slion commented 5 months ago

To be confirmed over time, but it looks like I found a workaround: if I turn off Secure Connections, pairing seems to succeed. gap_secure_connections_enable(false);

However, if the connection is not secure, that's a major concern when we are talking about keyboard input, so I need to understand exactly how insecure that is.

mringwal commented 5 months ago

Hi. Could you try the current version of BTstack on the develop branch? We've fixed something that sounds similar. If that doesn't fix your issue, please capture the full HCI trace incl. debug output and convert it into a .pklg file with tool/create_packet_log.py and post it here.


The HCI log is the ground truth, while the debug output can help to understand the internal state. It's best to have both, but if I had to choose, I would go with only the HCI trace, as the stack behaviour can be reconstructed from that, while the debug output only shows that something might have gone wrong.


With LE, you've got two security levels. LE Secure Connections with MITM protection, either by entering the passkey on the keyboard or by numeric comparison on two systems, is actually secure. Any other option is also secure if there's no attacker present during the initial pairing. If there is one present, it can either set up a man-in-the-middle attack (LE Secure Connections without MITM protection) or crack the link key directly (LE Legacy Pairing).
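
The three cases above can be summarized as a small lookup. This minimal sketch only restates the comment, and its names are hypothetical. Every method is fine when nobody is listening during the initial pairing; the methods differ in what an attacker present at that moment can do.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    LE_LEGACY_PAIRING,
    LE_SC_WITHOUT_MITM,  /* e.g. Just Works */
    LE_SC_WITH_MITM      /* passkey entry or numeric comparison */
} le_pairing_method_t;

/* Whether the keys stay safe if an attacker is present during the initial pairing. */
typedef struct {
    bool resists_eavesdropper;  /* passive attacker: only listens */
    bool resists_mitm;          /* active attacker: relays and modifies traffic */
} pairing_protection_t;

static pairing_protection_t protection_for(le_pairing_method_t method) {
    pairing_protection_t p = { false, false };
    switch (method) {
    case LE_SC_WITH_MITM:
        p.resists_eavesdropper = true;
        p.resists_mitm = true;
        break;
    case LE_SC_WITHOUT_MITM:
        p.resists_eavesdropper = true;  /* but a man-in-the-middle can be set up */
        break;
    case LE_LEGACY_PAIRING:
        /* the key can be cracked from a recorded pairing */
        break;
    default:
        assert(!"unknown pairing method");
    }
    return p;
}
```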

Slion commented 5 months ago

Could you try the current version of BTstack on the develop branch?

Can I just replace the one from the Pico SDK with this one?

Shall I enable the following configuration options:

#define ENABLE_CROSS_TRANSPORT_KEY_DERIVATION
#define ENABLE_LE_SECURE_CONNECTIONS

Slion commented 5 months ago

please capture the full HCI trace incl. debug output and convert it into a .pklg

I thought that's what I did above. Was the debug output missing? Not sure how to enable it.

mringwal commented 5 months ago

Can I just replace the one from the Pico SDK with this one?

Yes. You might need to fix the CMake list of build files however.

You can also test on a desktop system with a USB Bluetooth dongle instead, as the stack behaves more or less identically on all platforms.

Shall I enable the following configuration options:

Yes, please enable these.

mringwal commented 5 months ago

I thought that's what I did above. Was the debug output missing? Not sure how to enable it.

Almost. Your textual log lines look like this: 08:02:37:153 -> [00:00:29.695] CMD => 35 0C 05 01 0B 00 01 00

Could you remove the first timestamp? The conversion tool expects the line to start with [xx:xx:xx.xxx] ...
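
If the serial console adds its own timestamp, the lines can be cleaned before running tool/create_packet_log.py. This minimal sketch assumes the extra prefix always ends in "-> " right before the "[" of the BTstack timestamp, as in the line quoted above. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Return a pointer to the part of the line that starts with "[hh:mm:ss.mmm]".
 * Lines that already start with '[' or carry no such prefix are returned as-is. */
static const char *strip_monitor_prefix(const char *line) {
    assert(line != NULL);
    if (line[0] == '[') return line;
    const char *marker = strstr(line, "-> [");
    return marker ? marker + 3 : line;  /* skip "-> " and keep the '[' */
}
```

Any line-oriented filter does the same job. What matters is only that each line handed to the conversion tool begins with "[".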

In any case, for analysis I'm interested in the log from the current version of develop (if the issue is still present).

Slion commented 5 months ago

With LE, you've got two security levels. LE Secure Connections with MITM protection, either by entering the passkey on the keyboard or by numeric comparison on two systems, is actually secure. Any other option is also secure if there's no attacker present during the initial pairing. If there is one present, it can either set up a man-in-the-middle attack (LE Secure Connections without MITM protection) or crack the link key directly (LE Legacy Pairing).

This is Classic Bluetooth, though those Samsung devices seem to involve BLE somehow, maybe through that cross-transport key feature. It may or may not be related to this: https://github.com/bluez/bluez/issues/810

mringwal commented 5 months ago

Oh. Missed that. gap_secure_connections_enable(false) disables BR/EDR Secure Connections, which by itself isn't less secure (at least I haven't read otherwise). However, without BR/EDR Secure Connections, Cross Transport Key Derivation is not possible, which most likely avoids the bug you've run into. It should have been fixed in newer versions of the stack.
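
The HCI log shows whether the controller actually produced a Secure Connections link key, because the Link Key Notification event reports a key type. This minimal sketch assumes, as described above, that Cross-Transport Key Derivation from BR/EDR needs a link key generated with Secure Connections, which means one of the two P-256 key types. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* HCI link key types as reported in the Link Key Notification event. */
enum {
    LINK_KEY_COMBINATION             = 0x00,
    LINK_KEY_DEBUG_COMBINATION       = 0x03,
    LINK_KEY_UNAUTH_COMBINATION_P192 = 0x04,
    LINK_KEY_AUTH_COMBINATION_P192   = 0x05,
    LINK_KEY_CHANGED_COMBINATION     = 0x06,
    LINK_KEY_UNAUTH_COMBINATION_P256 = 0x07,  /* Secure Connections */
    LINK_KEY_AUTH_COMBINATION_P256   = 0x08   /* Secure Connections */
};

/* True if the BR/EDR link key is of a type CTKD can start from (assumption above). */
static bool ctkd_possible(uint8_t link_key_type) {
    assert(link_key_type <= LINK_KEY_AUTH_COMBINATION_P256);  /* higher values are reserved */
    return link_key_type == LINK_KEY_UNAUTH_COMBINATION_P256 ||
           link_key_type == LINK_KEY_AUTH_COMBINATION_P256;
}
```

With gap_secure_connections_enable(false), the resulting key is at best a P-192 type (0x04 or 0x05), so the sketch reports false. That matches the explanation that the problematic CTKD path is skipped.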

Slion commented 5 months ago

Could you remove the first timestamp?

I'll double check that next time around. It's just noise from the disconnect at the start of the session I think.

Can you confirm that all that's needed for your logs is the following, or do I need to enable the debug log somehow on top of the HCI logs?

target_compile_definitions(picow_bt_example_common INTERFACE
    #WANT_HCI_DUMP=1 # This enables btstack debug
    )

mringwal commented 5 months ago

Your log looks good besides the unexpected additional timestamp in each line. The line above should look like this: [00:00:29.695] CMD => 35 0C 05 01 0B 00 01 00

I don't work on the Pico W currently and don't know the details.

Slion commented 5 months ago

Oh. Missed that.

Sorry for spamming you with info on that issue, but I had to take a deep dive into your code to get to the bottom of it 🪠 Thankfully it looks like it's paying off 🥳

Slion commented 5 months ago

Your log looks good besides the unexpected additional timestamp in each line.

Oh, silly me, that's the timestamp from the VS Code serial monitor 😏

Slion commented 5 months ago

I can confirm that ENABLE_CROSS_TRANSPORT_KEY_DERIVATION and ENABLE_LE_SECURE_CONNECTIONS are not needed when using gap_secure_connections_enable(false);. Pairing from the Tab S8 just works simply by disabling Secure Connections, which apparently isn't less secure.

Slion commented 5 months ago

I've had another go at pairing another Tab S8 Ultra with another Pico W, using gap_secure_connections_enable(false); and without ENABLE_CROSS_TRANSPORT_KEY_DERIVATION and ENABLE_LE_SECURE_CONNECTIONS. It failed: the connection worked but pairing failed. So it looks like this still needs further investigation. Not sure when I'll get around to it.

mringwal commented 5 months ago

Please always upload HCI logs. It should work with gap_secure_connections_enable(true); and ENABLE_CROSS_TRANSPORT_KEY_DERIVATION and ENABLE_LE_SECURE_CONNECTIONS.

Slion commented 5 months ago

Please always upload HCI logs. It should work with gap_secure_connections_enable(true); and ENABLE_CROSS_TRANSPORT_KEY_DERIVATION and ENABLE_LE_SECURE_CONNECTIONS.

I would need to try the development branch too. I'm still using the btstack version from the Pico W SDK.

peterharperuk commented 5 months ago

The pico-sdk dev branch recently updated btstack if that's any help

Slion commented 5 months ago

The pico-sdk dev branch recently updated btstack if that's any help

Thanks for the heads up. Here is the commit. It looks like a dependency on some Bluedroid codec was added too. It might be easier for me to just wait for the next Pico SDK release.

Slion commented 5 months ago

I tried btstack v1.6.1 both with and without Secure Connections, ENABLE_CROSS_TRANSPORT_KEY_DERIVATION and ENABLE_LE_SECURE_CONNECTIONS, and I have the same issues. I'll see if I can test the develop branch.

Slion commented 5 months ago

The develop branch has the same pairing issues and it also crashes the board somehow soon after connecting.

Slion commented 5 months ago

Here are the logs from v1.6.1 failing to pair, I nuked the Pico W flash before that recording too: v1.6.1-tab-s8-pairing-fail.zip

I still get that Unexpected PDU 1 in state 83 same as mentioned above.

Also the Bluetooth stack on the tablet is somehow messed up after that failed pairing, discovery ain't working until I turn Bluetooth off and back on. One of my Pico W did manage to pair with that tablet at some point. Back then I thought it was because I turned off secure connections but I can't reproduce it now.

Slion commented 5 months ago

I tried disabling BLE to test if that improved our pairing somehow but it did not change anything.

Slion commented 5 months ago

The develop branch has the same pairing issues and it also crashes the board somehow soon after connecting.

I know why it crashes. I need to adjust for #602.

mringwal commented 5 months ago

Please share a HCI log that shows the "Unexpected PDU 1 in state xx" when using the develop branch.

Slion commented 5 months ago

There it is: 664b08a-tab-s8-pairing-fail.zip

mringwal commented 5 months ago

Thanks. First impression: the error indicates that the SM is in state SM_BR_EDR_INITIATOR_W4_FIXED_CHANNEL_MASK, which is wrong, as it is not "Initiator" in your log, it should be Responder. (And yes, receiving a pairing request as initiator would be wrong). Stay tuned...

mringwal commented 5 months ago

Ah... there's a Classic Role change before the connection is fully opened and BTstack stores the current role when it receives the Connection Complete event. That explains why BTstack assumes Initiator role. The question now is, what's correct here? Which side is the initiator for the SM pairing, if it is triggered after a Classic role change....
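
The Classic role is not fixed at Connection Complete. The controller reports later switches with the HCI Role Change event, event code 0x12. Its parameters are the status, the BD_ADDR and the new role, where 0x00 is Central and 0x01 is Peripheral. The sketch below shows one way to keep a stored role in sync with that event. conn_t and apply_role_change are illustrative names, not BTstack's. As discussed below, knowing the current role does not by itself settle which side should send the SM Pairing Request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HCI_EVENT_ROLE_CHANGE 0x12

typedef enum { ROLE_CENTRAL = 0x00, ROLE_PERIPHERAL = 0x01 } hci_role_t;

/* Hypothetical per-connection record. */
typedef struct {
    uint8_t    addr[6];  /* BD_ADDR as carried in HCI events (little-endian) */
    hci_role_t role;     /* role recorded at Connection Complete */
} conn_t;

/* evt: event code, parameter length, status, BD_ADDR[6], new role.
 * Returns 1 if the stored role was updated, 0 otherwise. */
static int apply_role_change(conn_t *conn, const uint8_t *evt, size_t len) {
    assert(conn != NULL && evt != NULL);
    if (len < 10 || evt[0] != HCI_EVENT_ROLE_CHANGE || evt[1] != 8) return 0;
    if (evt[2] != 0x00) return 0;                    /* role switch failed */
    for (int i = 0; i < 6; i++)
        if (evt[3 + i] != conn->addr[i]) return 0;   /* event for another connection */
    conn->role = (hci_role_t)evt[9];
    return 1;
}
```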

mringwal commented 5 months ago

Do you configure Android to become Peripheral in your Keyboard simulator? If yes, could you try once to stay Central and see if the pairing works as expected?

Slion commented 5 months ago

I'm using that BluetoothHidDevice API, which is fairly high level. I don't think I have control over the role, Peripheral or Central. I'll take a closer look, though. It's possibly worth noting that connection and pairing initiated from the Pico W never work with any Android device. That's a topic for another issue, though; see #612.

I'm thinking the role change is to be expected in this rather unusual scenario, where the device initiates the connection to the host.

mringwal commented 5 months ago

I'm not able to get an answer from the Bluetooth Core v5.4 spec about which side should send the SM Pairing Request after a Role Change. A quick test indicates that iOS16 expects the device in Central role to send it while Android 14 expects the device that initiated the connection to do so.

Could you test the `develop-sm-role-change` branch and post the HCI log? https://github.com/bluekitchen/btstack/commit/dc3e249654620a980c173e4a8370c163482deec6

In this commit, BTstack tries to switch into Peripheral Role when it receives a Pairing Request although it was in Central Role when the pairing has started.
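
The behaviour described for that commit can be modelled as a tiny state transition. This sketch uses made-up state names and is not the code in the commit. When a Pairing Request arrives while the local side is still an initiator waiting for the fixed channel mask, the model switches to the responder path instead of failing with "Unexpected PDU". It deliberately does nothing about the collision case raised next.

```c
#include <assert.h>
#include <stdint.h>

#define SM_CODE_PAIRING_REQUEST 0x01

/* Hypothetical, heavily reduced state set; sm.c has many more states. */
typedef enum {
    SM_STATE_INITIATOR_W4_FIXED_CHANNEL_MASK,    /* waiting for L2CAP Information Response */
    SM_STATE_RESPONDER_PAIRING_REQUEST_RECEIVED,
    SM_STATE_PAIRING_FAILED
} sm_state_t;

static sm_state_t sm_on_pdu(sm_state_t state, uint8_t pdu_code) {
    assert(state <= SM_STATE_PAIRING_FAILED);
    switch (state) {
    case SM_STATE_INITIATOR_W4_FIXED_CHANNEL_MASK:
        if (pdu_code == SM_CODE_PAIRING_REQUEST) {
            /* Remote started pairing first: continue as responder. */
            return SM_STATE_RESPONDER_PAIRING_REQUEST_RECEIVED;
        }
        break;
    default:
        break;
    }
    /* In this toy model, anything else counts as "Unexpected PDU in state". */
    return SM_STATE_PAIRING_FAILED;
}
```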

I'm not happy with this, as there's a clear race condition between both sides sending an SM Pairing Request. In your log, BTstack is still waiting for the result from the L2CAP Information Request when Android sends the SM Pairing Request, but there's no guarantee for that, and the question would be how a collision (both sides send an SM Pairing Request at the same time) should/could be handled.

Slion commented 5 months ago

Looks like it worked: dc3e24965-tab-s8-pairing-fail.zip

So far I did two tries, one without nuking the flash or recording the logs and it did not work. A second one with HCI logs after nuking the flash and, to my surprise, pairing was completed successfully.

Slion commented 5 months ago

Trying to reproduce that surprising success failed. I tried pairing again after unpairing from the Android tablet and that failed. I nuked the flash and tried again and it failed. Here are the logs from that failed attempt from clean flash:

dc3e24965-tab-s8-pairing-real-fail.zip

I'm not happy with this, as there's a clear race condition between both sides sending an SM Pairing Request.

Maybe that's indeed what's happening here, and that's why it sometimes works but mostly does not. Surely there is a way to sort it out, though. Pairing with a Windows PC is reliable, for instance. Is there a way to capture logs from pairing with a PC? Would that help? I had in mind to implement an HCI Bluetooth USB dongle with a Pico W that could easily be used to capture such logs. In theory it's fairly easy to do with TinyUSB; I'm not sure whether I can easily forward the HCI commands, either through BTstack or to the driver directly.

mringwal commented 5 months ago

Thanks for the logs. Both logs go over the initial issue, so that's good. One shows a complete pairing process while the other doesn't change state and reports pairing as failed. I'll retry with my Android 14 phone tomorrow to see if I can reproduce it.

Your Windows PC most likely does not support BR/EDR Secure Connections, as only a few newer USB Bluetooth dongles support it (but it would be visible in the HCI log). This might help to get HCI traces on Windows: https://learn.microsoft.com/en-us/windows-hardware/drivers/bluetooth/testing-btp-tools-btvs (Again, only if you're curious; I hope that your issue reproduces with my Pixel 7a.)

Slion commented 5 months ago

I'll retry with my Android 14 phone tomorrow to see if I can reproduce it.

So far I could only reproduce the issue with the Samsung tablets mentioned in the first post. Even Lineage OS worked fine, so I'm guessing it could be Samsung specific. I could test a Samsung phone, as I have not done that yet. You may not be able to reproduce it with your Pixel. You should be able to download the Interface app from the Play Store and use it for testing. It will disconnect after 5 minutes, I believe, unless you buy a subscription. If you want, I could also add you to the testing group so you can "buy" free fake subscriptions. I ought to release a new version, though, as I have made a few useful changes to the Bluetooth menu.

However the issue might be reproducible without the app simply by trying to pair from the Bluetooth settings.

Slion commented 5 months ago

One shows a complete pairing process while the other doesn't change state and reports pairing as failed.

Yes, this is consistent with what happened. The first one was a success even though the file is named "-fail". I did not realize it was a success until I was posting it. Apologies about the confusing naming.

The second one, named "-real-fail" was indeed a failed pairing.

Slion commented 5 months ago

I just tried with a Samsung Galaxy A22 phone and same issue. Connection is working but pairing failed. That's indeed very much a Samsung specific issue so far.

That brings me to my next point, about proper Bluetooth vocabulary. I found these definitions for pairing and bonding:

Pairing: The process of generating, distributing, and authenticating keys for encryption purposes. Bonding: The process of pairing followed by distribution of keys used to encrypt the link in future reconnections.

Assuming those definitions are valid for both Classic and Low Energy, I believe this issue is in fact about bonding failing but pairing still working. Is that correct?

Slion commented 5 months ago

I just tried with a Samsung Galaxy A22 phone and same issue. Connection is working but pairing failed. That's indeed very much a Samsung specific issue so far.

I tried that again and bonding worked this time. With the tablet, I could also get successful bonding again. So far, with your patch, I have a 25% to 50% chance of successful bonding.

mringwal commented 5 months ago

Your logs show that the initial step, Android sending a Pairing Request while we are still waiting to send one ourselves a bit later, is working now. But there's a different issue/bug that causes the failures. I can observe a similar behaviour on the Pixel 7 / Android 14, but with the patch, it worked 3 times out of 3.

Could you send me 3 more logs where pairing fails for you?

Terminology: I'm usually happy if people don't mix up connecting and pairing/bonding. In this case, the issue is with the last step of the LE pairing/bonding part, where the devices exchange their real Bluetooth addresses. The LE Long Term Keys are generated as part of the Cross-Transport Key Derivation, where both sides calculate the LE key from the Link Key generated in the Classic (BR/EDR) pairing. As this happens locally on each side, the LE key isn't actually transferred over the air...