espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0

BLE suddenly disconnected, Wi-Fi and BLE coexist case. (IDFGH-13900) #14742

Open Sky-Soo-Ap opened 1 month ago

Sky-Soo-Ap commented 1 month ago

Answers checklist.

IDF version.

v5.3.1

Espressif SoC revision.

ESP32-D0WDR2-V3 (revision v3.1)

Operating System used.

Windows

How did you build your project?

VS Code IDE

If you are using Windows, please specify command line type.

None

Development Kit.

Custom Board

Power Supply used.

USB

What is the expected behavior?

I expected BLE to keep connected.

What is the actual behavior?

BLE disconnects at random.

Steps to reproduce.

  1. Wi-Fi & BLE coexist;
  2. The ESP32 keeps receiving BLE notification data;
  3. The ESP32 publishes the notification data to MQTT over Wi-Fi; ...

Debug Logs.

I (12:21:53.206) azp_mqtt: iometer/agent/devices/notification/bb:a0:50:78:97:d5, drop packet count:0
I (2598588) BT_GATT: GATT_GetConnectionInfor conn_id=1
I (2598588) BT_GATT: GATT_GetConnectionInfor conn_id=2
I (2598588) BT_GATT: GATT_GetConnectionInfor conn_id=3
I (2598598) BT_GATT: GATT_GetConnectionInfor conn_id=4
I (2598598) BT_GATT: GATT_GetConnectionInfor conn_id=5
I (2598608) BT_GATT: GATT_GetConnectionInfor conn_id=6
I (2598618) BT_GATT: GATT_GetConnectionInfor conn_id=7
I (2598618) BT_GATT: GATT_GetConnectionInfor conn_id=8
I (12:21:53.431) azp_mqtt: iometer/agent/devices/notification/bb:a0:50:78:97:d5, drop packet count:0
W (2604658) BT_APPL: gattc_conn_cb: if=1 st=0 id=1 rsn=0x8
W (2604658) BT_APPL: gattc_conn_cb: if=2 st=0 id=2 rsn=0x8
W (2604658) BT_APPL: gattc_conn_cb: if=3 st=0 id=3 rsn=0x8
W (2604668) BT_APPL: gattc_conn_cb: if=4 st=0 id=4 rsn=0x8
W (2604668) BT_APPL: gattc_conn_cb: if=5 st=0 id=5 rsn=0x8
W (2604678) BT_APPL: gattc_conn_cb: if=6 st=0 id=6 rsn=0x8
W (2604678) BT_APPL: gattc_conn_cb: if=7 st=0 id=7 rsn=0x8
W (2604688) BT_APPL: gattc_conn_cb: if=8 st=0 id=8 rsn=0x8
W (2604698) BT_HCI: hcif disc complete: hdl 0x0, rsn 0x8
I (2604698) BT_L2CAP: L2CA_SetDesireRole() new:x1, disallow_switch:0
E (12:21:59.509) azp_gatt_client: device disconnected, MAC:bba0507897d5
I (12:21:59.515) azp_mqtt: publish topic:iometer/agent/devices/disconnected, payload:{
    "macAddress":   "bb:a0:50:78:97:d5"
}, msg_id=50742
I (12:21:59.894) azp_mqtt: MQTT_EVENT_PUBLISHED, msg_id=50742
I (12:38:48.785) azp_sntp: Notification of a time synchronization event

More Information.

Log attached, with the BLE log level set to verbose: 2024-10-17_debug.log

wmy-espressif commented 4 weeks ago

Hi @Sky-Soo-Ap , thanks for reporting the issue and providing the logs. I would like to collect the following information for further analysis.

  1. Can you provide the sdkconfig file of the project?
  2. How frequently does this issue reproduce, and how long does it take?
  3. I would like to confirm on the activities of Wi-Fi and BLE when the issue occurs:
    • Wi-Fi STA mode is in use and Soft AP mode is not enabled, right?
    • For BLE, there is only one connection, with no other activities such as scanning or advertising, right?
  4. Does this issue happen in the same scenario as this one: https://github.com/espressif/esp-idf/issues/14743
  5. How severe is the wireless interference? What is the distance from the remote BLE device?

Sky-Soo-Ap commented 4 weeks ago

Hi Wmy,

  1. Can you provide the sdkconfig file of the project? The sdkconfig file is attached: sdkconfig.txt
  2. How frequently does this issue reproduce, and how long does it take? A few hours, or 1 to 2 days.
  3. I would like to confirm on the activities of Wi-Fi and BLE when the issue occurs:

wmy-espressif commented 4 weeks ago

Hi @Sky-Soo-Ap thanks for the information.

For Question 5 on estimating wireless interference, it is enough to tell me roughly how many wireless devices (BLE devices, WLAN APs, etc.) using the 2.4 GHz band are in your test environment, and how far they are from the device under test.

Here is my initial analysis for this issue:

LOG1:

I (11:39:09.557) azp_gatt_client: ESP_GATTC_CONNECT_EVT conn_id = 0, link_role = 0, remote_bda = bb a0 50 78 97 d5, interval = 12, latency = 0, timeout = 600

LOG2:

W (2604698) BT_HCI: hcif disc complete: hdl 0x0, rsn 0x8

When a BLE connection coexists with Wi-Fi STA mode, some BLE connection events (15 ms interval) will be pre-empted by Wi-Fi activity. However, the supervision timeout is as long as 6 s, which gives the two devices up to 400 chances to communicate and restore synchronization. So I think occasional (not severe) interference is unlikely to break the connection.
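The arithmetic behind the "400 chances" figure can be reproduced with a small sketch. It assumes the usual BLE units (connection interval in 1.25 ms steps, supervision timeout in 10 ms steps). The function names are made up for illustration and are not part of ESP-IDF.

```c
#include <assert.h>
#include <stdint.h>

/* BLE units: the connection interval is counted in 1.25 ms steps and the
 * supervision timeout in 10 ms steps. Microseconds keep the math in integers. */
uint32_t conn_interval_us(uint16_t interval_units)
{
    return (uint32_t)interval_units * 1250u;
}

uint32_t supervision_timeout_us(uint16_t timeout_units)
{
    return (uint32_t)timeout_units * 10000u;
}

/* Number of connection events that fit into one supervision timeout
 * (hypothetical helper, for illustration only). */
uint32_t conn_events_per_timeout(uint16_t interval_units, uint16_t timeout_units)
{
    assert(interval_units > 0);
    return supervision_timeout_us(timeout_units) / conn_interval_us(interval_units);
}
```

With the values from LOG1 (`interval = 12`, `timeout = 600`), `conn_events_per_timeout(12, 600)` evaluates to 400, i.e. a 15 ms interval inside a 6 s timeout.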

However, we don't know yet whether the BLE supervision timeout is caused by the DUT (central) or the remote BLE device. So I suggest we continue by investigating the following factors:

  1. What is the remote BLE device? Is it also an ESP32? Does it enable other wireless activities other than BLE?
  2. Set the BLE sleep clock accuracy to 500ppm instead of 250ppm:
    CONFIG_BTDM_BLE_DEFAULT_SCA_250PPM=n
    CONFIG_BTDM_BLE_DEFAULT_SCA_500PPM=y

    This may make the peripheral use a larger Rx window and improve the robustness of the BLE link.

In addition, I suggest changing the following configuration options, which may improve wireless coexistence performance:

  1. CONFIG_FREERTOS_HZ=1000: raises the tick rate so that context switches happen more often, which reduces possible scheduling lag for the BLE lower-level task.
  2. Change Bluedroid-related log levels from VERBOSE to WARNING, to reduce CPU load.

Sky-Soo-Ap commented 4 weeks ago

Hi Wmy,

Thank you for your detailed reply. I will make the changes you mentioned and try to capture a log:

    • CONFIG_BTDM_BLE_DEFAULT_SCA_250PPM=n
    • CONFIG_BTDM_BLE_DEFAULT_SCA_500PPM=y
    • CONFIG_FREERTOS_HZ=1000
    • Change Bluedroid-related log levels from VERBOSE to WARNING.

  1. What is the remote BLE device? Is it also an ESP32? Does it enable other wireless activities other than BLE? It is not an ESP32, and it does not support any wireless activity other than BLE.

You wrote that we don't know yet whether the BLE supervision timeout is caused by the DUT (central) or the remote BLE device. Is there a way to add the log to determine whether a packet has been sent, received, or missed in each connection interval?

Sky-Soo-Ap commented 3 weeks ago

Hi Wmy,

Regarding the estimate of wireless interference, there are 2 other devices using the 2.4 GHz band, about 5 meters away. I’m attaching the log and sdkconfig files after applying those suggestions, except that “CONFIG_BTDM_BLE_DEFAULT_SCA_250PPM=y” is unchanged because only this option is available. 2024-10-21.log sdkconfig.txt

BTW, do you know how the ESP32, in the BLE client (master) role, listens for and responds to connection parameter update requests from a peripheral (slave)?

wmy-espressif commented 3 weeks ago

Hi @Sky-Soo-Ap

Is there a way to add the log to determine whether a packet has been sent, received, or missed in each connection interval?

I am afraid we don't have such lower-level statistics. If you have a sniffer you can check the air packets.

BTW, do you know how the ESP32, in the BLE client (master) role, listens for and responds to connection parameter update requests from a peripheral (slave)?

Usually the ESP32, as BLE central, will accept the connection update request from the peripheral.

I have checked the log file "2024-10-21.log". It shows a new phenomenon that did not occur in "2024-10-17_debug.log":

I (17:48:38.102) azp_gatt_client: update connection params status = 0, min_int = 12, max_int = 40, conn_int = 40, latency = 10, timeout = 600

This indicates that the connection parameters were updated: the connection interval increased to 50 ms (40 * 1.25 ms) and the slave latency was set to 10. This parameter set behaves quite differently from the original log, which used a 15 ms connection interval and 0 slave latency. The current parameters put BLE connection robustness at greater risk in wireless coexistence mode.
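To make the difference between the two parameter sets concrete, here is a minimal sketch in plain C. It assumes the standard BLE units (interval in 1.25 ms steps, timeout in 10 ms steps, latency in connection events) and the Bluetooth Core Specification rule that the supervision timeout must exceed (1 + latency) * interval * 2. The helper names are hypothetical and not part of ESP-IDF. The peripheral figure is a worst case that only applies if the remote device actually skips as many events as the latency allows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All arguments are in BLE units: interval in 1.25 ms steps,
 * timeout in 10 ms steps, latency in connection events. */

/* Connection events the central schedules within one supervision timeout. */
uint32_t central_events_per_timeout(uint16_t interval, uint16_t timeout)
{
    assert(interval > 0);
    return ((uint32_t)timeout * 10000u) / ((uint32_t)interval * 1250u);
}

/* Events the peripheral must listen to within one supervision timeout
 * if it skips as many events as the latency allows (worst case). */
uint32_t peripheral_events_per_timeout(uint16_t interval, uint16_t latency,
                                       uint16_t timeout)
{
    return central_events_per_timeout(interval, timeout) / ((uint32_t)latency + 1u);
}

/* Specification rule: timeout > (1 + latency) * interval * 2. */
bool timeout_is_valid(uint16_t interval, uint16_t latency, uint16_t timeout)
{
    uint64_t timeout_us = (uint64_t)timeout * 10000u;
    uint64_t minimum_us = ((uint64_t)latency + 1u) * interval * 1250u * 2u;
    return timeout_us > minimum_us;
}
```

With the original parameters (interval 12, latency 0, timeout 600) both counts are 400. With the updated set (interval 40, latency 10, timeout 600) the central still schedules 120 events, but a peripheral that uses its full latency only has to listen to 10 of them, so far fewer events are available to recover from pre-empted slots.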

My questions:

  1. In your previous tests, when BLE disconnected with reason 0x8, did a similar connection parameter update procedure happen?
  2. In your application, can you determine the BLE connection parameters and choose another parameter set or reject the slave request?
  3. What is your expected performance on BLE connection? If BLE disconnection is unavoidable and some reconnection mechanism must be performed, how much influence does it have on your product/solution?

Sky-Soo-Ap commented 3 weeks ago

Hi Wmy,

  1. In your previous testing, does a similar connection parameter update process occur when BLE is disconnected with reason 0x8? In the previous test, no connection parameters were updated. However, the peripherals will have preset values in the future; this is still being debugged.

  2. In your application, can you determine the BLE connection parameters and select another parameter set or deny the request from the device? In my application, BLE connection parameter requests from the device or MQTT subscription are not rejected.

  3. What performance can you expect from your BLE connection? If BLE disconnection is unavoidable and some reconnection mechanism must be implemented, how much impact does this have on your product/solution? The impact is huge. Because the ESP32 is the central (master), it is connected to more than one peripheral device. If one device's BLE connection drops and the reconnection runs into an exception, the central is stuck for 30 seconds until the attempt times out, and the other devices cannot update their data in real time. If the reconnection is blocked three times, the data cannot be updated in real time for one and a half minutes.

Also, how do I set the BLE connection timeout?

wmy-espressif commented 3 weeks ago

Hi @Sky-Soo-Ap ,

how do I set the BLE connection timeout?

The default BLE supervision timeout is 6 s when acting as central. It comes from an internal macro in Bluedroid, BTM_BLE_CONN_TIMEOUT_DEF. As a central device, the ESP32 can initiate a connection parameter update using the API esp_ble_gap_update_conn_params.

Sky-Soo-Ap commented 3 weeks ago

Hi Wmy, I mean when connecting to a BLE device with the API esp_ble_gattc_open. How do I set the timeout for this API?

wmy-espressif commented 3 weeks ago

I am afraid the API esp_ble_gattc_open does not provide an argument to set the connection timeout. The default connection timeout value is set inside the Bluedroid host, i.e. by the macro BTM_BLE_CONN_TIMEOUT_DEF.

An option is to use esp_ble_gap_update_conn_params to update the parameter after BLE connection is established.

Sky-Soo-Ap commented 3 weeks ago

Hi Wmy, the macro BTM_BLE_CONN_TIMEOUT_DEF is the default supervision timeout, with a value of 600 (6 seconds); it does not apply to the API esp_ble_gattc_open. Could you point out where the macro is defined? And can I change the macro value? Thanks.

wmy-espressif commented 3 weeks ago

Hi @Sky-Soo-Ap, the macro BTM_BLE_CONN_TIMEOUT_DEF is defined in esp-idf/components/bt/host/bluedroid/stack/include/stack/btm_ble_api.h. You can change it for testing and debugging. If you need it as an argument in our API, I can ask a colleague to evaluate this feature request.

Sky-Soo-Ap commented 3 weeks ago

Hi Wmy, you misunderstood my meaning. I’m looking for where the timeout of the API esp_ble_gattc_open is defined. This API seems to time out after 30 seconds.

wmy-espressif commented 3 weeks ago

Hi @Sky-Soo-Ap . I guess this is what you need: there is an sdkconfig parameter, CONFIG_BT_BLE_ESTAB_LINK_CONN_TOUT. You can modify it and rebuild the project.
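As a rough illustration of what changing this option buys, the sketch below estimates the worst-case stall described earlier in the thread. It assumes the option value is expressed in seconds and that failed connection attempts run one after another, each until the timeout. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case time (in seconds) the central spends blocked when every
 * connection attempt runs into the establish-link timeout.
 * Hypothetical helper, for illustration only. */
uint32_t worst_case_stall_s(uint32_t failed_attempts, uint32_t estab_timeout_s)
{
    assert(estab_timeout_s > 0);
    return failed_attempts * estab_timeout_s;
}
```

With the 30 s timeout observed in this thread, three failed attempts give `worst_case_stall_s(3, 30)`, i.e. 90 s. Lowering the timeout to, say, 10 s would cut the same case to 30 s, at the cost of giving slow peripherals less time to respond.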

Sky-Soo-Ap commented 3 weeks ago

Hi wmy-espressif, yes, that's what I want, thanks. By default it takes up to 30 seconds to establish a Bluetooth connection. Is there anything to consider when changing this value?