espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
12.93k stars 7.09k forks source link

ble_gattc_disc_all_svcs() fails on ESP32S3 with several peripherals (IDFGH-8664) #10109

Closed kojibuta closed 1 year ago

kojibuta commented 1 year ago

Answers checklist.

IDF version.

ESP-IDF v4.4.2-388-g755ce1077d

Operating System used.

macOS

How did you build your project?

Command line with idf.py

If you are using Windows, please specify command line type.

No response

Development Kit.

Custom board

Power Supply used.

USB

What is the expected behavior?

When using NimBLE (BLE central role) on ESP32-S3 it is not possible to discover services on some peripheral devices. The same code runs smoothly on ESP32 without errors.

After connection is established with a peripheral (event BLE_GAP_EVENT_CONNECT received with event->connect.status == 0) calling ble_gattc_disc_all_svcs should result in invokation of callback function ble_gatt_disc_svc_fn with error->status == 0.

What is the actual behavior?

When running the code on ESP32-S3, after connection is established with a peripheral, calling ble_gattc_disc_all_svcs results in immediate invokation of callback function ble_gatt_disc_svc_fn with error->status == 7.

Steps to reproduce.

  1. Flash central test_app to a ESP32-S3 board
  2. Power on the board
  3. Start monitoring the log
  4. Power on the peripheral
  5. Error appears on the log and service discovery fails
  6. Flash central test_app to a ESP32 board
  7. Power on the board
  8. Start monitoring the log
  9. Power on the peripheral
  10. Service discovery is successful and no error appears on the log

ESP32-S3 logs (SleepO2 device OK, B02T device KO) ESP32-S3 board + SleepO2 device: esp32-s3-peripheral-1.log ESP32 board + B02T device: esp32-s3-peripheral-2.log

ESP32 logs (SleepO2 and B02T device OK) ESP32 board + SleepO2 device: esp32-peripheral-1.log ESP32 board + B02T device: esp32-peripheral-2.log

Debug Logs.

--- idf_monitor on /dev/cu.usbserial-120 115200 ---
--- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H ---
ESP-ROM:esp32s3-20210327
Build:Mar 27 2021
rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT)
SPIWP:0xee
mode:DIO, clock div:1
load:0x3fce3808,len:0x1670
load:0x403c9700,len:0xbb8
load:0x403cc700,len:0x2e8c
entry 0x403c9954
I (24) boot: ESP-IDF v4.4.2-388-g755ce1077d 2nd stage bootloader
I (25) boot: compile time 18:46:00
I (25) boot: chip revision: 0
I (28) boot.esp32s3: Boot SPI Speed : 80MHz
I (32) boot.esp32s3: SPI Mode       : DIO
I (37) boot.esp32s3: SPI Flash Size : 2MB
I (42) boot: Enabling RNG early entropy source...
I (47) boot: Partition Table:
I (51) boot: ## Label            Usage          Type ST Offset   Length
I (58) boot:  0 nvs              WiFi data        01 02 00009000 00006000
I (66) boot:  1 phy_init         RF data          01 01 0000f000 00001000
I (73) boot:  2 factory          factory app      00 00 00010000 00100000
I (81) boot: End of partition table
I (85) esp_image: segment 0: paddr=00010020 vaddr=3c050020 size=0e780h ( 59264) map
I (104) esp_image: segment 1: paddr=0001e7a8 vaddr=3fc95b70 size=01870h (  6256) load
I (105) esp_image: segment 2: paddr=00020020 vaddr=42000020 size=479c0h (293312) map
I (163) esp_image: segment 3: paddr=000679e8 vaddr=3fc973e0 size=0213ch (  8508) load
I (165) esp_image: segment 4: paddr=00069b2c vaddr=40374000 size=11b64h ( 72548) load
I (184) esp_image: segment 5: paddr=0007b698 vaddr=50000000 size=00010h (    16) load
I (192) boot: Loaded app from partition at offset 0x10000
I (192) boot: Disabling RNG early entropy source...
I (204) cpu_start: Pro cpu up.
I (205) cpu_start: Starting app cpu, entry point is 0x403751d8
0x403751d8: call_start_cpu1 at /Users/roberto/esp/esp-idf/components/esp_system/port/cpu_start.c:148

I (0) cpu_start: App cpu up.
I (219) cpu_start: Pro cpu start user code
I (219) cpu_start: cpu freq: 160000000
I (219) cpu_start: Application information:
I (221) cpu_start: Project name:     test_app
I (226) cpu_start: App version:      driver-v1.1.6-b13-3-gbf6b4e6-di
I (233) cpu_start: Compile time:     Nov  4 2022 18:45:55
I (239) cpu_start: ELF file SHA256:  bc8b6e398803187d...
I (245) cpu_start: ESP-IDF:          v4.4.2-388-g755ce1077d
I (252) heap_init: Initializing. RAM available for dynamic allocation:
I (259) heap_init: At 3FC9B798 len 0004DF78 (311 KiB): D/IRAM
I (265) heap_init: At 3FCE9710 len 00005724 (21 KiB): STACK/DRAM
I (272) heap_init: At 3FCF0000 len 00008000 (32 KiB): DRAM
I (278) heap_init: At 600FE000 len 00002000 (8 KiB): RTCRAM
I (285) spi_flash: detected chip: gd
I (289) spi_flash: flash io: dio
W (293) spi_flash: Detected size(16384k) larger than the size in the binary image header(2048k). Using the size in the binary image header.
I (306) sleep: Configure to isolate all GPIO pins in sleep state
I (313) sleep: Enable automatic switching of GPIO sleep configuration
I (320) coexist: coexist rom version e7ae62f
I (325) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
W (407) BT_INIT: esp_bt_controller_mem_release not implemented, return OK
W (407) BT_INIT: esp_bt_controller_mem_release not implemented, return OK
I (407) BT_INIT: BT controller compile version [421c279]
I (417) phy_init: phy_version 503,13653eb,Jun  1 2022,17:47:08
I (457) system_api: Base MAC address is not set
I (457) system_api: read default base MAC address from EFUSE
I (457) BT_INIT: Bluetooth MAC: f4:12:fa:cb:81:b6

I (467) TEST_APP: test_app_host_task
I (477) NimBLE: GAP procedure initiated: stop advertising.

I (477) TEST_APP: test_app_on_sync
I (477) TEST_APP: test_app_scan: scanning...
I (487) NimBLE: GAP procedure initiated: discovery; 
I (487) NimBLE: own_addr_type=0 filter_policy=0 passive=0 limited=0 filter_duplicates=1 
I (497) NimBLE: duration=forever
I (507) NimBLE: 

I (547) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (547) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (557) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (567) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (577) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (577) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (587) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (807) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (807) TEST_APP: test_app_gap_event: name miaomiao
I (877) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (877) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (877) TEST_APP: test_app_gap_event: name WGX_iBeacon
I (897) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (897) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (897) TEST_APP: test_app_gap_event: name B6
I (1127) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (1127) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (1127) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (1127) TEST_APP: test_app_gap_event: name Holy-IOT
I (1137) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (1137) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (1147) TEST_APP: test_app_gap_event: name Jinou_Beacon
I (1937) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (1937) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (1937) TEST_APP: test_app_gap_event: name ThermoBeacon
I (2807) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (2807) TEST_APP: test_app_gap_event: name DuoEK 1429
I (2807) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (5137) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (5147) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (5147) TEST_APP: test_app_gap_event: name BPM-188
I (5147) NimBLE: GAP procedure initiated: connect; 
I (5157) NimBLE: peer_addr_type=0 peer_addr=
I (5157) NimBLE: a4:c1:38:84:01:cc
I (5167) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (5177) NimBLE: 

I (5177) TEST_APP: test_app_connect: connecting...
I (5237) TEST_APP: test_app_gap_event: 0 BLE_GAP_EVENT_CONNECT
I (5237) TEST_APP: test_app_gap_event: status=0
I (5237) TEST_APP: print_conn_desc: handle=1 our_ota_addr_type=0 our_ota_addr=f4:12:fa:cb:81:b6
I (5247) TEST_APP: print_conn_desc: our_id_addr_type=0 our_id_addr=f4:12:fa:cb:81:b6
I (5257) TEST_APP: print_conn_desc: peer_ota_addr_type=0 peer_ota_addr=a4:c1:38:84:01:cc
I (5267) TEST_APP: print_conn_desc: peer_id_addr_type=0 peer_id_addr=a4:c1:38:84:01:cc
I (5267) TEST_APP: print_conn_desc: conn_itvl=40 conn_latency=0 supervision_timeout=256 encrypted=0 authenticated=0 bonded=0
I (5287) NimBLE: GATT procedure initiated: discover all services

E (5567) TEST_APP: test_app_svc_disced: conn_handle=1 uuid= error_status=7
I (5567) NimBLE: GAP procedure initiated: connect; 
I (5567) NimBLE: peer_addr_type=0 peer_addr=
I (5577) NimBLE: a4:c1:38:84:01:cc
I (5577) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (5587) NimBLE: 

I (5627) TEST_APP: test_app_gap_event: 0 BLE_GAP_EVENT_CONNECT
I (5627) TEST_APP: test_app_gap_event: status=0
I (5627) TEST_APP: print_conn_desc: handle=1 our_ota_addr_type=0 our_ota_addr=f4:12:fa:cb:81:b6
I (5637) TEST_APP: print_conn_desc: our_id_addr_type=0 our_id_addr=f4:12:fa:cb:81:b6
I (5647) TEST_APP: print_conn_desc: peer_ota_addr_type=0 peer_ota_addr=a4:c1:38:84:01:cc
I (5657) TEST_APP: print_conn_desc: peer_id_addr_type=0 peer_id_addr=a4:c1:38:84:01:cc
I (5657) TEST_APP: print_conn_desc: conn_itvl=40 conn_latency=0 supervision_timeout=256 encrypted=0 authenticated=0 bonded=0
I (5677) NimBLE: GATT procedure initiated: discover all services

E (5967) TEST_APP: test_app_svc_disced: conn_handle=1 uuid= error_status=7
I (5967) NimBLE: GAP procedure initiated: connect; 
I (5967) NimBLE: peer_addr_type=0 peer_addr=
I (5977) NimBLE: a4:c1:38:84:01:cc
I (5977) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (5987) NimBLE: 

I (6017) TEST_APP: test_app_gap_event: 0 BLE_GAP_EVENT_CONNECT
I (6017) TEST_APP: test_app_gap_event: status=0
I (6017) TEST_APP: print_conn_desc: handle=1 our_ota_addr_type=0 our_ota_addr=f4:12:fa:cb:81:b6
I (6027) TEST_APP: print_conn_desc: our_id_addr_type=0 our_id_addr=f4:12:fa:cb:81:b6
I (6037) TEST_APP: print_conn_desc: peer_ota_addr_type=0 peer_ota_addr=a4:c1:38:84:01:cc
I (6047) TEST_APP: print_conn_desc: peer_id_addr_type=0 peer_id_addr=a4:c1:38:84:01:cc
I (6047) TEST_APP: print_conn_desc: conn_itvl=40 conn_latency=0 supervision_timeout=256 encrypted=0 authenticated=0 bonded=0
I (6067) NimBLE: GATT procedure initiated: discover all services

E (6367) TEST_APP: test_app_svc_disced: conn_handle=1 uuid= error_status=7
I (6367) NimBLE: GAP procedure initiated: connect; 
I (6367) NimBLE: peer_addr_type=0 peer_addr=
I (6377) NimBLE: a4:c1:38:84:01:cc
I (6377) NimBLE:  scan_itvl=16 scan_window=16 itvl_min=24 itvl_max=40 latency=0 supervision_timeout=256 min_ce_len=0 max_ce_len=0 own_addr_type=0
I (6387) NimBLE: 

I (6407) TEST_APP: test_app_gap_event: 0 BLE_GAP_EVENT_CONNECT
I (6407) TEST_APP: test_app_gap_event: status=0
I (6407) TEST_APP: print_conn_desc: handle=1 our_ota_addr_type=0 our_ota_addr=f4:12:fa:cb:81:b6
I (6417) TEST_APP: print_conn_desc: our_id_addr_type=0 our_id_addr=f4:12:fa:cb:81:b6
I (6427) TEST_APP: print_conn_desc: peer_ota_addr_type=0 peer_ota_addr=a4:c1:38:84:01:cc
I (6427) TEST_APP: print_conn_desc: peer_id_addr_type=0 peer_id_addr=a4:c1:38:84:01:cc
I (6437) TEST_APP: print_conn_desc: conn_itvl=40 conn_latency=0 supervision_timeout=256 encrypted=0 authenticated=0 bonded=0
I (6447) NimBLE: GATT procedure initiated: discover all services

E (6717) TEST_APP: test_app_svc_disced: conn_handle=1 uuid= error_status=7
I (6717) TEST_APP: test_app_gap_event: 1 BLE_GAP_EVENT_DISCONNECT

More Information.

Error always occur using some peripheral devices, but not with every device. Error occurs using ESP-IDF release/v4.4, release/v5.0 and latest versions.

Some peripheral causing the error are:

Some peripheral working without errors are:

kojibuta commented 1 year ago

Here is the complete project directory: test_app.zip

kojibuta commented 1 year ago

I created an identical app using bluedroid and the bahavior is exactly the same, the only difference is the error code that is 0x3B using Bluedroid while it is 7 using NimBLE.

SumeetSingh19 commented 1 year ago

Hi @kojibuta, Could you please share the "DEBUG" logs for the success and failed cases. To enable debug logs, do:

  1. idf,py menuconfig > Component config > Bluetooth > NimBLE options > NimBLE Host log verbosity > Set it to "Debug logs"
  2. idf.py menuconfig > Log Output > Default log verbosity > Set it to "Debug"

Save and quit the menuconfig, build the central, flash, and monitor and you should see log output starting with D e.g. "D (744) NimBLE: registered service 0x1800 with handle=1" This means that debug logs are enabled. You may need to re-enable them when building the application again.

SumeetSingh19 commented 1 year ago

Also, try connecting the peripherals you mentioned with third party BLE testing tools like nRF Connect (as central) and go for service discovery. Please share the logs for the same.

kojibuta commented 1 year ago

Hi @SumeetSingh19,

here are the debug logs running the same source code with two different devices (Blood Pressure Monitor B02T and Thermomether AOJ-20A). Log files 1 and 2 running on ESP32-S3-WROOM-1 (connection failure), log files 3 and 4 running on ESP32-WROOM-32D (successful connect/discovery).

  1. Log of ESP32-S3 using Wellue/Viatom Blood Pressure Monitor B02T, advertised name "BPM-188": esp32-s3-peripheral-debug-3.log

  2. Log of ESP32-S3 using Viatom Thermomether AOJ-20A, advertised name "AOJ-20A": esp32-s3-peripheral-debug-4.log

  3. Log of ESP32 using Wellue/Viatom Blood Pressure Monitor B02T, advertised name "BPM-188": esp32-peripheral-debug-3.log

  4. Log of ESP32 using Viatom Thermomether AOJ-20A, advertised name "AOJ-20A": esp32-peripheral-debug-4.log

Here are the logs using nRF-Connect on iOS (not really verbose BTW):

Please note that I had to slightly modify the code to start connection with device "AOJ-20A" because the function ble_hs_adv_parse_fields() is not parsing correctly the advertisement. The function return BLE_HS_EBADDATA because the advertisement is padded with 0x00 bytes up to the length of 25 bytes (i.e. 3 extra bytes):

I (3354) TEST_APP: test_app_gap_event: event_type BLE_HCI_ADV_RPT_EVTYPE_SCAN_RSP
I (3364) TEST_APP: 0x3fc9e9c5   08 09 41 4f 4a 2d 32 30  41 02 0a 00 09 ff 00 00  |..AOJ-20A.......|
I (3374) TEST_APP: 0x3fc9e9d5   00 00 00 00 00 00 00 00  00                       |.........|

I think the function should be more "forgiving" with zero-padded advertisements in order to support a wider range of physical devices. You can always think of padding zeros as records having a length of 0. The above advertisement you have 6 records (instead of 3) with lengths 8, 2, 9, 0, 0, 0.

Here is the modified main.c file: test_app.zip

kojibuta commented 1 year ago

Hi @SumeetSingh19,

Apparently the controller is sending a different HCI command to the host and this triggers the error.

image

I tried running the same app on Unexpected Maker TinyS3 development board with ESP32-S3FN8 MCU and I get the same results.

I would like to know if you plan to address this problem in the near future, otherwise I must go back to ESP32-WROOM-32E module for my device. I would like to use the new ESP32-S3-WROOM-1 because it has better overall performances but I need to support several BLE devices that do not work properly with ESP32-S3.

SumeetSingh19 commented 1 year ago

Hi @kojibuta, the controller doesn't send different HCI commands, actually, the first byte of the command is different because the connection handles in the two chips start at 0 on esp32 and 1 on esp32s3.

SumeetSingh19 commented 1 year ago

HI @kojibuta

i checked your logs and compared it with success logs . The failure scenario in your case is that even though connection packet has been sent, the actual connection is not happening. As seen in logs, we see a disconnect with reason code 0x3E. This basically points that the actual over the air connection with remote is failed.

This behaviour also points that this may be specific in the communication of S3 and the particular remote device. Am sure, S3 would be working with other devices , even in your tests ?

I wanted to check, if it is feasible for you to get a OTA log to check the packets ? Since from host side, we have already posted the command with valid parameters.

kojibuta commented 1 year ago

Hi @SumeetSingh19

I run into this issue while porting an existing firmware from ESP32 to ESP32-S3. The firmware has been working very well on a custom BLE/MQTT gateway I designed for several years. I am in the process of re-designing the custom gateway and I would like to upgrade to ESP32-S3. Unfortunately some of the BLE peripherals my customers are actively using do not work with ESP32-S3.

This behaviour also points that this may be specific in the communication of S3 and the particular remote device. Am sure, S3 would be working with other devices , even in your tests ?

The ESP32-S3 board and my firmware both work well with many devices by the way. But not with all of them. A subset of the devices I must support is not working.

Since all devices work well using the same firmware on ESP32 (instead of ESP32-S3) I thought it could be a BLE stack implementation bug specific to ESP32-S3.

I wanted to check, if it is feasible for you to get a OTA log to check the packets ? Since from host side, we have already posted the command with valid parameters.

I really would like to send you all the logs you need, but I don't know how/where to get a OTA log. Is it something I can enable on ESP32? Do I need some external BLE packet sniffer? If yes, can you point me to the correct one?

kojibuta commented 1 year ago

Hi @SumeetSingh19,

Since from host side, we have already posted the command with valid parameters.

Do you mean it can be a RF related problem? Could it be something related to the fact that ESP32 also supports legacy bluetooth but ESP32-S3 only supports BLE? Are the two CPU using a different BLE radio stack?

SumeetSingh19 commented 1 year ago

Do you mean it could be a RF related problem?

So, we need OTA logs to be sure of this, and hence I requested the logs.

Could it be something related to the fact that ESP32 also supports legacy Bluetooth but ESP32-S3 only supports BLE?

Well, I believe you are using NimBLE as a stack for both chips. So, eventually, it should not matter, as these are standalone BLE operations.

Are the two CPU using a different BLE radio stack?

The underlying CPU has Xtensa architecture support (one is LX6 while the other is LX7). So they are the same architecture.

kojibuta commented 1 year ago

Hi @SumeetSingh19,

So, we need OTA logs to be sure of this, and hence I requested the logs.

I don't know how to get the OTA logs. Can you please tell me how to do it?

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , Greetings of the day !!! I have gone through the logs and info on the JIRA, esp32s3 connection is failing because the reason of 0x3e connection failed to be established. The connection is failing with the 0x3e means a couple of factors might be involved so we need to check each and every scenario. The problem can be reproduced on any release so we can start debugging on release/v5.0, We need your help with the debug logs. I will provide you with the debug binary to find the root cause of the issue since you do not have an OTA. Can you please confirm that you are using controller lib 2b9445a6 and IDF release/v5.0? How many iterations need to run the reproduced issue? could you please let me know the peripheral device BLE version and feature support? Does adv stop while entering into the connection mode? Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp, I checked out release/5.0, compiled and run the test_app. In the logs I find the following lines:

I (290) cpu_start: ESP-IDF:          v5.0-beta1-824-ga8ef7570ca-dirt
I (524) coexist: coexist rom version e7ae62f
I (580) BT_INIT: BT controller compile version [76c24c9]
I (580) phy_init: phy_version 503,13653eb,Jun  1 2022,17:47:08

Apparently I am using controller lib 76c24c9 and IDF release/v5.0.

The issue always occurs immediately. Connection is never established, no matter how many times I try. Some peripheral models never work. Some (different) peripheral models always work.

The scan is canceled before initiating the connection. Here is the nimble API function call sequence:

ble_gap_disc_cancel()
ble_hs_id_infer_auto()
ble_gap_connect()

The device supports Bluetooth LE 4.0. No other specs are available. Here is the link to the supplier website https://www.viatomtech.com/aoj-20a-b.

The test_app initiates connection as soon as it finds a device with name "AOJ-20A". Here is a complete log of last test_app run. connect-failure.log

SumeetSingh19 commented 1 year ago

Hi @kojibuta , Thanks for the input, I have attached two files for you. The .patch file is the git diff patch that I need you to apply to your idf. The .a file is a controller lib file that I need you to put at esp-idf/components/bt/controller/lib_esp32c3_family/esp32s3 in place of the original file.

As discussed earlier could you please reproduce the issue of connection fail with debug controller lib and the provided patch to provide us the logs? Also since it is easily reproduce could you provide the logs with the controller lib and without the patch as well? Make sure that the version is release/v5.0.

0x3EDebugPrints.zip

kojibuta commented 1 year ago

Hi @SumeetSingh19,

Thanks for the input, I have attached two files for you.

I don't see the attachments.

SumeetSingh19 commented 1 year ago

@kojibuta , you should see a zip file now. It contains the two files.

kojibuta commented 1 year ago

Hi @SumeetSingh19,

sorry but I don't see any attchment. Here is the screenshot of my github page.

image

SumeetSingh19 commented 1 year ago

Hi @kojibuta , The link is in the original comment, but I'll reference it here anyway. 0x3EDebugPrints.zip

kojibuta commented 1 year ago

Hi @SumeetSingh19,

thank you, now I can see it. Don't know why but it was not visible. Going to do what you suggest ASAP.

Thank you.

kojibuta commented 1 year ago

Hi @SumeetSingh19,

I completed the tests you suggested.

Unfortunately the libbtdm_app.a file you sent seems to cause system failure on the ESP32S3:

E (729) BLE_INIT: controller init failed

I performed three tests:

  1. applying the patch only
  2. overwriting libbtdm_app.a file only
  3. with both modifications

Here are the log files of each of the three tests.

Patch only 2022_12_23_esp32s3-log-patch-only.txt

Controller lib only 2022_12_23_esp32s3-log-libbtdm_app-only.txt

Patch + controller lib 2022_12_23_esp32s3-log-libbtdm_app-and-patch.txt

I deleted and re-installed the SDK from scratch so to have a "clean" installation.

idf.py --version
ESP-IDF v5.1-dev-2509-gcfef24863f-dirty
kojibuta commented 1 year ago

Hi @SumeetSingh19 ,

I think the controller init failed depends on bt controller lib compile version mismatch.

The latest "release/v5.0" SDK version comes with 80abacd:

I (583) BT_INIT: BT controller compile version [80abacd]

The modified lib you sent instead is version 76c24c9 (as per my previous post).

Maybe you can point me to the correct commit hash to use the same source code as the modified library version you sent.

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , Understood, conflict with the controller lib version will provide you debug lib on the tip of 80abacd. Thanks, Satish

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , Could you please reproduce the issue using the below controller debug lib on the tip of release/v5.0?

BT controller compile version [ae171b2]

esp32s3_debug_bin.zip

Thanks, Satish

kojibuta commented 1 year ago

Hi @SumeetSingh19,

Please find attached the log file you requested using the modified controller lib, plus the log using the original lib for completeness.

Modified BT controller lib 2022_12_28_libbtdm_ae171b2_modified.log

I (25) boot: ESP-IDF v5.0-494-g490216a2ac-dirty 2nd stage bootloader
I (763) BT_INIT: BT controller compile version [ae171b2]

Un-modified (original) BT controller lib 2022_12_28_libbtdm_80abacd_original.log

I (25) boot: ESP-IDF v5.0-494-g490216a2ac-dirty 2nd stage bootloader
I (583) BT_INIT: BT controller compile version [80abacd]

Let me lnow if I can do something else to help.

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , Do you able to reproduce the issue only with controller lib without applying the patch given by @SumeetSingh19 ? Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

Here is the log without the patch (fresh clone from github) and modified BT controller lib. 2022_12_30_libbtdm_ae171b2_no_patch.log

The connection attempt is at the end of the log file, it starts after the following lines:

I (7263) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (7273) TEST_APP: test_app_gap_event: BLE_GAP_EVENT_DISC event_type BLE_HCI_ADV_RPT_EVTYPE_SCAN_RSP
I (7283) TEST_APP: 0x3fca1341   08 09 41 4f 4a 2d 32 30  41 02 0a 00 09 ff 00 00  |..AOJ-20A.......|
I (7283) TEST_APP: 0x3fca1351   00 00 00 00 00 00 00 00  00                       |.........|
I (7293) TEST_APP: test_app_gap_event: name "AOJ-20A"
SatishSolankeEsp commented 1 year ago

Hi @kojibuta , We have checked the debug log seems we do not receive the packet once the device is in connection mode. We have also checked that connecting a similar set of feature devices working fine. Our device is capable of adjusting the RX window to capture the tx packet of the peripheral device that still fails. We need OTA only to debug or find the root cause of an issue, not sure How will you do it? Peripheral device your using which firm (company ) Bluetooth chipset they have used in that? Do you have any other devices as well where you faced similar issues? Just want you to try it, Does that peripheral device will connect /esp-idf/examples/bluetooth/nimble/blecent example without having your code or none of the print? Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

is it possible that the problem is caused by connection params such as conn_intvl and conn_window?

We need OTA only to debug or find the root cause of an issue, not sure How will you do it?

I don't know how to capture the OTA logs. What procedure do you use to get it? I have some USB dongles with nRF 52x that can be used as sniffer. Is that an option?

Just want you to try it, Does that peripheral device will connect /esp-idf/examples/bluetooth/nimble/blecent example without having your code or none of the print?

The test_app I am using is actually the blecent example slightly modified. I only added an IF statement to open connection with devices having names containing strings "AOJ-20A" or "BPM-188" (two devices that fail) and "SleepO2 0751" (one device that works).

Peripheral device your using which firm (company ) Bluetooth chipset they have used in that?

Currently I have the problem with two devices from Viatom (a thermomether and a table pressure monitor). I don't know what chipset they use. Also I don't have the problem with different devices from the same Viatom supplier (e.g. oxigen monitor rings, other pressure monitor models, etc.).

I opened one device to inspect the PCB to find out the chipset. The device uses a BT2851-V1 module. I didn't find thw cipset used, on the component I can barely see the labels LSR826 CRF2045 but I can't find anything on the Internet. He are some pictures.

IMG-4145 IMG-4147

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , I don't know how to capture the OTA logs. What procedure do you use to get it? I have some USB dongles with nRF 52x that can be used as sniffers. Is that an option?

Yes, if the device will capture the Over the air packet between two devices.

is it possible that the problem is caused by connection params such as conn_intvl and conn_window?

is there any diff between the pass and fail scenario . as I see 40 is conn interval.

Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

I finally managed to capture some on the air logs using a nRF52840 dongle.

The ESP32 address is c0:49:ef:07:10:ba The ESP32-S3 address is f4:12:fa:cb:01:be The peripheral name is "AOJ-20A" The peripheral address is a4:c1:00:00:1a:0d

The behaviour is very similar for both ESP32 and ESP32-S3, the packet sequence is always: ..., ADV_IND, ADV_IND, SCAN_REQ, SCAN_RSP, ADV_IND, CONNECT_IND, Empty PDU, Empty PDU, ...

I can see some differences is in the first Empty PDU (master to slave) after the CONNECT_IND and what happens is that the peripheral does not respond to the packet sent by the ESP32-S3 (but it answers to the packet sent by the ESP32 instead).

[ESP32]
0000   fa 13 00 03 d8 c6 06 0a 0b 07 17 00 00 d4 62 12
0010   49 cd cc ca 25 01 00 57 3d 96

[ESP32-S3]
0000   04 13 00 03 6d ba 06 0a 0b 0d 2b 00 00 38 04 54
0010   30 cc ce c1 24 01 00 57 cd e3

Capture file with many (unsuccessful) connect attempts by ESP32-S3 module capture-esp32-s3-wroom-1.pcapng.zip

Capture file with one successful connect attempt by ESP32 module capture-esp32-wroom-32e.pcapng.zip

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

This is what the typical (unsuccessful) connection attempt looks like in wireshark image

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , The observation for the above SS is 0x27c1cacf is the Espressif device that is trying to connect to the peer device, we have requested correctly to connect the peer device but somehow peer has not received and advertising has also not stopped these two things conclude peer has an issue with his RX. Could you please send me the OTA file? I will check if something is missed by esp32s3. Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

I posted the OTA capture files in the previous message, please check.

I'm attaching files here once again for your conveniece.

Capture file with many (unsuccessful) connect attempts by ESP32-S3 module: capture-esp32-s3-wroom-1.pcapng.zip

Capture file with one successful connect attempt by ESP32 module: capture-esp32-wroom-32e.pcapng.zip

SatishSolankeEsp commented 1 year ago

HI @kojibuta , Thanks for the log and OTA which help us find the root cause of the issue and fix it.

Please try this fix by putting this controller lib of esp32s3 on the tip of release/v5.0. IDF TIP release/v5.0:

image

Controller lib HEAD TIP: (controller commit id 80abacdd)

image

Replace this attached lib in the below path:

/esp-idf/components/bt/controller/lib_esp32c3_family/esp32s3

ble_conn_fail_fix_esp32s3.zip

Let me know the result and share the OTA as well.

Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

Great news! Your fix is working perfectly. All my ESP32-S3 boards successfully connect to the peripherals and perform the service discovery.

Thank you very much for your support.

Please find below some logs and an OTA capture file recorded during some successful connections.

[log file] log_aoj-20a_success.txt

[log file debug level] log_aoj-20a_success_debug.txt

[wireshark OTA capture] 2023_02_03_connect_success.pcapng.zip

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , Great !!! Thanks a lot for the quick turnaround time. We will merge soon this fix on release/v5.0 meanwhile you can use the above lib. CC @Isl2017 Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

Apparently the FIX causes a regression. Sorry for the bad news.

Using the library with your fix solves connection problems with several devices (e.g. "AOJ-20A").

Unfortunately other devices (e.g. "DuoEK 1429"), that are working without the patch, with the fix cannot connect anymore.

What I can see in the debug logs is that the connection is setup correctly but the "exchange mtu" procedure fails. The connect is retried 3 times but MTU exchange always fail and the disconnect event is generated.

To better clarify:

Please find attaches some logs, hope they can help.

[ESP32-S3 - using lib fix - device "AOJ-20A" successful connect] 2023_02_06_esp32s3_lib_fix_aoj20a_success.txt

[ESP32-S3 - using lib fix - device "DuoEK 1429" connect error (log+air)] 2023_02_06_esp32s3_lib_fix_duoek_error.txt 2023_02_06_esp32s3_lib_fix_duoek_connect_error.pcapng.zip 2023_02_06_esp32s3_lib_fix_duoek_connect_error_2.pcapng.zip

[ESP32-S3 - stable/v5.0 - device "AOJ-20A" connect error] 2023_02_06_esp32s3_stable_v50_aoj20a_error.txt

[ESP32-S3 - stable/v5.0 - device "DuoEK 1429" successful connect] 2023_02_06_esp32s3_stable_v50_duoek_success.txt

I also found some errors with service discovery using the stable/v5.0 SDK version. After successful connection and MTU exchange, service discovery fails and causes ESP32-S3 to hang. Maybe this is all part of the same problem.

[ESP32-S3 - stable/v5.0 - device "DuoEK 1429" successful connect + service discovery error (log+air)] 2023_02_06_esp32s3_stable_v50_duoek_connect_discovery_error.txt 2023_02_06_esp32s3_stable_v50_duoek_discovery_error.pcapng.zip 2023_02_06_esp32s3_stable_v50_duoek_discovery_error_2.pcapng.zip

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , We are working on it and will tell you the procedure for how to use it with a new fix. Thanks, Satish

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , Please try the fix below for all your devices and let me know the result. S3_fix_conn_fail.zip Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

I already tested the lib S3_fix_conn_fail.zip with several devices supported by my IoT gateway app. It is working with all of them.

Devices tested so far:

I will make some more tests in the upcoming days and let you know the results.

Thank you very much for your support.

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

I also run the test_app (NimBLE + ESP32-S3) and the service discovery error with device Viatom DuoEK is still present.

nimble_error_log.txt

Using Bluedroid instead I get no error at all and all devices work correctly.

Apparently your fix is working perfectly with Bluedroid (all devices working) but not with NimBLE (some device still not working).

Let me know if I can help you sending more debug/OTA logs.

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , Can you please tell me how many devices still have the issue with nimble? share the OTA of Viatom DuoEK . Great !!! all the devices work with Bluedroid which was not the case earlier. Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

After extensive testing I can confirm that all the test devices I have are currently working with ESP32-S3 and Bluedroid.

I can also confirm that roughly a half of the same devices is not working as expected using NimBLE instead.

Please find attached some logs files both of working and not working devices (2+2). Each archive contains the debug log and the OTA log of a test session.

NOTE: the device E66 is a Fitness Bracelet using Nordic nRF52832 chip. It is a very common chip so I think it is important to find out why NimBLE is not working with this device.

[Successful tests] BPM188_success.zip O2Ring_success.zip

[Failed tests] DuoOK_fail.zip E66_fail.zip

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , I have checked a couple of time fail and successful OTA logs but there is no difference all the device works fine, and there is packet exchange going on. I've included for you the attached screenshot. MicrosoftTeams-image (12) MicrosoftTeams-image (11) MicrosoftTeams-image (10)

@SumeetSingh19 , The controller issue is solved now, please do check failure is because of nimble since Bluedroid works fine which was not the earlier case. Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

I tried compiling my firmware using version ESP-IDF v5.0.2 and ESP32-S3 does not work yet.

Do you think your patch to the bluetooth controller lib will be merged into the master branch any time?

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , Yes, controller lib is not yet merged, you can continue with the lib I shared. will get back to you on these. Thanks, Satish

SatishSolankeEsp commented 1 year ago

HI @kojibuta , I have checked internally, We are not Merging the MR in the controller codebase as a remote device issue but, provided you the library with a workaround by selecting the cs#1 algorithm.

As per Bluetooth specification, we have selected the correct algorithm. BLE4.0 ESP32 channel select algorithm #1 BLE5.0 ESP32S3 channel select algorithm #2

I have come up with a solution for it, the remote device fails to connect or the remote device supports ble4.2,4.0 for them you can disable the CONFIG_BT_BLE_50_FEATURES_SUPPORTED flag in the IDF menu config and re-try to connect. This way you can use any ESP-IDF version.

Please let me know if these procedures are useful for you.

Thanks, Satish

kojibuta commented 1 year ago

Hi @SatishSolankeEsp,

Unfortunately disabling CONFIG_BT_BLE_50_FEATURES_SUPPORTED does not work for me. I am currently compiling with "ESP-IDF v5.0.2". Below you can find the "Bluetooth -> Bluedroid options" settings for my project.

image

SatishSolankeEsp commented 1 year ago

Hi @kojibuta , Look like you are trying the thing on an older commit ID, could you please try the same test case on the latest IDFV5.0? image

Our fix is merged with the below commit id : image

Thanks, Satish