espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
12.99k stars 7.12k forks source link

Bluetooth Class of Device should never be overridden by the stack once set by the user (IDFGH-12236) #13286

Open ftab opened 4 months ago

ftab commented 4 months ago

Answers checklist.

IDF version.

v5.2-dev-2305-g968ce2efc2

Espressif SoC revision.

ESP32-D0WD-V3

Operating System used.

Linux

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

None

Development Kit.

all

Power Supply used.

USB

What is the expected behavior?

Should not override the class of device I set with esp_bt_gap_set_cod

What is the actual behavior?

It overrides the major/minor class of device at bta_jv_enable, causing the connected device to sometimes see the device as unclassified, causing UI inconsistency

Steps to reproduce.

  1. Call esp_bt_gap_set_cod() to initialize CoD to 0x240414
  2. Call esp_spp_init() or esp_spp_enhanced_init()
  3. bta_jv_enable() overrides last two bytes of CoD to 0x1f00 (= 0x241f00)
  4. Call esp_bt_gap_set_cod() again to force CoD back to 0x240414
  5. Connecting device sometimes sees the erroneous class instead of the one I set

I added a trace to BTM_SetDeviceClass to see exactly when it's getting called and with what class of device, and using the SPP acceptor example with this trace enabled, I see that there is a period of time after esp_spp_init() is called where bta_jv_enable() is called which then calls utl_set_device_class and overrides the last two bytes of CoD. This is a known behavior since 2022. See https://github.com/espressif/esp-idf/issues/10113#issuecomment-1305161791

More Information.

The suggestion as seen in the linked issue of setting class of device at ESP_SPP_INIT_EVT still leaves a period of time where the class of device is wrong and could be picked up as such by the connecting device. Instead it should never be overridden once set by the user in esp_bt_gap_set_cod(). This includes in bta_av_api_register() where the user may have specified the class of device already before calling this code and it would be improper to override it to something else. It would be acceptable to set it if it was never set by the user, but only then.

xiongweichao commented 4 months ago

Hi @ftab ,

Please try the following method. Please let me know if the problem still exists.

esp_bt_gap_set_cod()
esp_bt_gap_set_scan_mode(ESP_BT_CONNECTABLE, ESP_BT_GENERAL_DISCOVERABLE);

Thanks

ftab commented 4 months ago

Hi @ftab ,

Please try the following method. Please let me know if the problem still exists.


esp_bt_gap_set_cod()

esp_bt_gap_set_scan_mode(ESP_BT_CONNECTABLE, ESP_BT_GENERAL_DISCOVERABLE);

Thanks

@xiongweichao yes, this is the pattern I followed. The class of device gets overridden to 0x241f00 after becoming discoverable, and the connecting device sometimes gets this class instead of the one I set before becoming discoverable. Even if I have a task to periodically reset the class of device back to what I want every second. Even if I set the class back at ESP_SPP_INIT_EVT. I had to remove the call to utl_set_device_class from bta_jv_enable in order to get around it. In addition, it takes a long time for the class to be discovered at all, no matter what it is. Maybe that's for a different issue

Observed with a Linux computer with a Bluetooth adapter, and bluetoothctl

Process:

[bluetooth]# scan on Discovery started ... [NEW] Device 80:64:6F:FB:D8:FE [my device name] ... [bluetooth]# pair 80:64:6F:FB:D8:FE ... [bluetooth]# connect 80:64:6F:FB:D8:FE ... [CHG] Device 80:64:6F:FB:D8:FE Class: 0x00241f00

xiongweichao commented 4 months ago

Steps to reproduce.

  1. Call esp_bt_gap_set_cod() to initialize CoD to 0x240414
  2. Call esp_spp_init() or esp_spp_enhanced_init()
  3. bta_jv_enable() overrides last two bytes of CoD to 0x1f00 (= 0x241f00)
  4. Call esp_bt_gap_set_cod() again to force CoD back to 0x240414
  5. Connecting device sometimes sees the erroneous class instead of the one I set

@ftab May I ask if this step can be reproduced? I tried this reproduction step and the problem is not reproduced.

ftab commented 4 months ago

@xiongweichao

Our customer is developing a Linux system to connect to our device. Their UI requires that the class of device be correct and stable. Basically, what's happening is BlueZ is inquiring during the scanning process or upon connect, and in the stock version of esp-idf v5.1.2, sometimes (during the window of time between when SPP messes with the class of device and when the application code resets it) sees the class of device as the one SPP/bta_jv_enable reset it to (0x241f00) instead of the one the user application set it to (0x240414), and thereby appearing incorrect in the UI and the application not knowing for sure how to deal with the device (is it a phone? is it a speaker? No clue now...). There are a couple of additional application behaviors that may contribute, such as my custom application resetting discoverable depending on the state of the connection, so I have mirrored this in my modification of the example.

I have checked out v5.1.2 of esp-idf for the purposes of this demo

Here is an example modification of bt_spp_acceptor that illustrates the issue when paired & scanning with a Linux PC.

diff --git a/examples/bluetooth/bluedroid/classic_bt/bt_spp_acceptor/main/main.c b/examples/bluetooth/bluedroid/classic_bt/bt_spp_acceptor/main/main.c
index ecf70a8dfa..36af0d5d48 100644
--- a/examples/bluetooth/bluedroid/classic_bt/bt_spp_acceptor/main/main.c
+++ b/examples/bluetooth/bluedroid/classic_bt/bt_spp_acceptor/main/main.c
@@ -29,6 +29,7 @@
 #define SPP_SHOW_DATA 0
 #define SPP_SHOW_SPEED 1
 #define SPP_SHOW_MODE SPP_SHOW_SPEED    /*Choose show mode: show data or speed*/
+#define BT_COD 0x240414

 static const esp_spp_mode_t esp_spp_mode = ESP_SPP_MODE_CB;
 static const bool esp_spp_enable_l2cap_ertm = true;
@@ -63,6 +64,28 @@ static void print_speed(void)
     time_old.tv_usec = time_new.tv_usec;
 }

+static void set_discoverable_connectable(bool on)
+{
+    if (on) {
+        esp_bt_gap_set_scan_mode(ESP_BT_CONNECTABLE, ESP_BT_GENERAL_DISCOVERABLE);
+    }
+    else {
+        esp_bt_gap_set_scan_mode(ESP_BT_NON_CONNECTABLE, ESP_BT_NON_DISCOVERABLE);
+    }
+}
+
+static void set_cod(void)
+{
+    esp_bt_cod_t cod;
+
+    cod.major = esp_bt_gap_get_cod_major_dev(BT_COD);
+    cod.minor = esp_bt_gap_get_cod_minor_dev(BT_COD);
+    cod.service = esp_bt_gap_get_cod_srvc(BT_COD);
+
+    ESP_LOGI(SPP_TAG, "Setting class of device");
+    ESP_ERROR_CHECK_WITHOUT_ABORT(esp_bt_gap_set_cod(cod, ESP_BT_INIT_COD));
+}
+
 static void esp_spp_cb(esp_spp_cb_event_t event, esp_spp_cb_param_t *param)
 {
     char bda_str[18] = {0};
@@ -81,17 +104,23 @@ static void esp_spp_cb(esp_spp_cb_event_t event, esp_spp_cb_param_t *param)
         break;
     case ESP_SPP_OPEN_EVT:
         ESP_LOGI(SPP_TAG, "ESP_SPP_OPEN_EVT");
+        set_discoverable_connectable(0);
         break;
     case ESP_SPP_CLOSE_EVT:
         ESP_LOGI(SPP_TAG, "ESP_SPP_CLOSE_EVT status:%d handle:%"PRIu32" close_by_remote:%d", param->close.status,
                  param->close.handle, param->close.async);
+        set_discoverable_connectable(1);
         break;
     case ESP_SPP_START_EVT:
         if (param->start.status == ESP_SPP_SUCCESS) {
             ESP_LOGI(SPP_TAG, "ESP_SPP_START_EVT handle:%"PRIu32" sec_id:%d scn:%d", param->start.handle, param->start.sec_id,
                      param->start.scn);
-            esp_bt_dev_set_device_name(EXAMPLE_DEVICE_NAME);
-            esp_bt_gap_set_scan_mode(ESP_BT_CONNECTABLE, ESP_BT_GENERAL_DISCOVERABLE);
+            set_discoverable_connectable(0);
+            vTaskDelay(pdMS_TO_TICKS(5000));
+            set_discoverable_connectable(1);
+            vTaskDelay(pdMS_TO_TICKS(5000));
+            set_cod();
+            vTaskDelay(pdMS_TO_TICKS(5000));
         } else {
             ESP_LOGE(SPP_TAG, "ESP_SPP_START_EVT status:%d", param->start.status);
         }
@@ -243,6 +272,10 @@ void app_main(void)
         return;
     }

+    set_cod();
+    esp_bt_dev_set_device_name(EXAMPLE_DEVICE_NAME);
+    set_discoverable_connectable(1);
+
     esp_spp_cfg_t bt_spp_cfg = {
         .mode = esp_spp_mode,
         .enable_l2cap_ertm = esp_spp_enable_l2cap_ertm,

If I immediately set_cod() at ESP_SPP_START_EVT before becoming discoverable, then in this example there is only a ~30ms window where the class is wrong, which makes it harder to make it happen (but not impossible). So you will notice this example exaggerates the window of time in which the class is wrong with delays. This is closer to my custom application's behavior. A lot of code is running at the same time during SPP init, and it causes sometimes a second or two where the class of device is set wrong, ultimately requiring me to use this patch to resolve the problem by removing setting class of device from bta_jv_enable.

Here is how I observe the issue in bluetoothctl with the above example, while in another window resetting the ESP32-DevKitC, or in another window, running an application that connects to the SPP from the Linux PC that is paired to it:

[bluetooth]# scan on
...
[NEW] Device A8:03:2A:EC:0E:D2 ESP_SPP_ACCEPTOR
...
[bluetooth]# info A8:03:2A:EC:0E:D2 
Device A8:03:2A:EC:0E:D2 (public)
    Name: ESP_SPP_ACCEPTOR
    Alias: ESP_SPP_ACCEPTOR
    Class: 0x00240414
    Icon: audio-card
    Paired: no
    Trusted: no
    Blocked: no
    Connected: no
    LegacyPairing: no
    RSSI: -52
...
[bluetooth]# pair A8:03:2A:EC:0E:D2 
Attempting to pair with A8:03:2A:EC:0E:D2
[CHG] Device A8:03:2A:EC:0E:D2 Connected: yes
Request confirmation
[agent] Confirm passkey 639239 (yes/no): yes
[CHG] Device A8:03:2A:EC:0E:D2 UUIDs: 00001101-0000-1000-8000-00805f9b34fb
[CHG] Device A8:03:2A:EC:0E:D2 ServicesResolved: yes
[CHG] Device A8:03:2A:EC:0E:D2 Paired: yes
Pairing successful
[CHG] Device A8:03:2A:EC:0E:D2 ServicesResolved: no
[CHG] Device A8:03:2A:EC:0E:D2 Connected: no
[CHG] Device A8:03:2A:EC:0E:D2 Connected: yes
[CHG] Device A8:03:2A:EC:0E:D2 Connected: no
...
[bluetooth]# info A8:03:2A:EC:0E:D2 
Device A8:03:2A:EC:0E:D2 (public)
    Name: ESP_SPP_ACCEPTOR
    Alias: ESP_SPP_ACCEPTOR
    Class: 0x00240414
    Icon: audio-card
    Paired: yes
    Trusted: no
    Blocked: no
    Connected: no
    LegacyPairing: no
    UUID: Serial Port               (00001101-0000-1000-8000-00805f9b34fb)
    RSSI: -56
...
[CHG] Device A8:03:2A:EC:0E:D2 Class: 0x00241f00
[CHG] Device A8:03:2A:EC:0E:D2 Icon is nil
...
[CHG] Device A8:03:2A:EC:0E:D2 Class: 0x00240414
[CHG] Device A8:03:2A:EC:0E:D2 Icon: audio-card
...
[bluetooth]# info A8:03:2A:EC:0E:D2 
Device A8:03:2A:EC:0E:D2 (public)
    Name: ESP_SPP_ACCEPTOR
    Alias: ESP_SPP_ACCEPTOR
    Class: 0x00240414
    Icon: audio-card
    Paired: yes
    Trusted: no
    Blocked: no
    Connected: no
    LegacyPairing: no
    UUID: Serial Port               (00001101-0000-1000-8000-00805f9b34fb)
    RSSI: -57
...
[CHG] Device A8:03:2A:EC:0E:D2 Class: 0x00241f00
[CHG] Device A8:03:2A:EC:0E:D2 Icon is nil
...
[CHG] Device A8:03:2A:EC:0E:D2 Class: 0x00240414
[CHG] Device A8:03:2A:EC:0E:D2 Icon: audio-card
...
[CHG] Device A8:03:2A:EC:0E:D2 Connected: yes
[CHG] Device A8:03:2A:EC:0E:D2 Class: 0x00241f00
[CHG] Device A8:03:2A:EC:0E:D2 Icon is nil
...
[CHG] Device A8:03:2A:EC:0E:D2 Connected: no
[CHG] Device A8:03:2A:EC:0E:D2 Class: 0x00240414
[CHG] Device A8:03:2A:EC:0E:D2 Icon: audio-card
[CHG] Device A8:03:2A:EC:0E:D2 Connected: yes
[CHG] Device A8:03:2A:EC:0E:D2 Class: 0x00241f00
[CHG] Device A8:03:2A:EC:0E:D2 Icon is nil
[CHG] Device A8:03:2A:EC:0E:D2 Connected: no

I have stripped RSSI and other devices that were visible for brevity of this log. A ... denotes some amount of time between the log entries.

Note specifically:

...
[CHG] Device A8:03:2A:EC:0E:D2 Class: 0x00241f00
[CHG] Device A8:03:2A:EC:0E:D2 Icon is nil
...
[CHG] Device A8:03:2A:EC:0E:D2 Class: 0x00240414
[CHG] Device A8:03:2A:EC:0E:D2 Icon: audio-card
...

This is the problem that was happening with our application before my patch. It needs to stay 240414 and never change to 241f00, as this causes the customer's application to display the incorrect icon for our device and to categorize it incorrectly.

Here is a program written for libbluetooth on Linux that can connect to the SPP and help demonstrate, as sometimes it is after the connection opens or closes that the class of device flips back and forth.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <bluetooth/bluetooth.h>
#include <bluetooth/rfcomm.h>

int s;

int main(int argc, char **argv) {
    struct sockaddr_rc addr = { 0 };
    int status;

    // Create socket
    s = socket(AF_BLUETOOTH, SOCK_STREAM, BTPROTO_RFCOMM);
    if (s < 0) {
        perror("Socket creation failed");
        exit(1);
    }

    // Set the connection parameters
    addr.rc_family = AF_BLUETOOTH;
    addr.rc_channel = (uint8_t)1;
    str2ba("A8:03:2A:EC:0E:D2", &addr.rc_bdaddr);

    // Connect to the remote device
    status = connect(s, (struct sockaddr *)&addr, sizeof(addr));
    if (status < 0) {
        perror("Connection failed");
        close(s);
        exit(1);
    }
    int ch = 0;
    printf("Connected to device\n");

    while(ch != 1){
        printf("\n\n Menu: \n");
        printf("1. Exit\n\n");

        printf("Enter the choice: ");

        scanf("%d", &ch);

        switch(ch)
        {
            case 1: break;
        }
        // Send data
    }
    // Close socket
    close(s);

    return 0;
}

My applied fix (and perhaps the ultimately suggested one) is to simply remove setting class to **1f00 in bta_jv_enable and probably also remove setting class in bta_av_api_register, so that the esp_bt_gap_set_cod call in the application is the single source of truth for the entire session of the application, nothing else can ever possibly change the class of device. A more advanced fix would be to keep track of if that was ever called and only override the class in those two functions if it was not explicitly called. But I didn't want to go through that much effort when I could simply comment it out and move on. I just don't know what implication that has on if the class of device is never set in the first place.

Additional complication: my application uses ESP-ADF bluetooth_service which internally sets discoverable mode when it starts. I can modify it, too, since I already have a fork of that repo as well, but it adds extra overhead to maintain my own special fork for something which updates so often as ESP-ADF & ESP-IDF. Ideally it should be fixed here so that nobody ever runs into this.

The less ideal workaround would be to initialize SPP first and delay initializing anything else until after SPP finishes, then set class of device, then start the bluetooth_service. That would cause my application to take longer before it is ready to be discovered/connected, which is already nearly uncomfortably long (I have internal users already saying it takes too long as it is, but, it's the best I can do with Bluetooth being Bluetooth).