Here is the complete project directory: test_app.zip
I created an identical app using Bluedroid and the behavior is exactly the same. The only difference is the error code: 0x3B with Bluedroid versus 7 with NimBLE.
Hi @kojibuta, could you please share the "DEBUG" logs for the success and failure cases? To enable debug logs, do:
Save and quit menuconfig, then build, flash and monitor the central. You should see log output lines starting with "D", e.g. "D (744) NimBLE: registered service 0x1800 with handle=1", which means debug logs are enabled. You may need to re-enable them when building the application again.
Also, try connecting to the peripherals you mentioned with a third-party BLE testing tool like nRF Connect (as central) and perform service discovery. Please share those logs as well.
Hi @SumeetSingh19,
here are the debug logs from running the same source code with two different devices (Blood Pressure Monitor B02T and Thermometer AOJ-20A). The first two log files were captured on ESP32-S3-WROOM-1 (connection failure), the last two on ESP32-WROOM-32D (successful connection/discovery).
Log of ESP32-S3 using Wellue/Viatom Blood Pressure Monitor B02T, advertised name "BPM-188": esp32-s3-peripheral-debug-3.log
Log of ESP32-S3 using Viatom Thermometer AOJ-20A, advertised name "AOJ-20A": esp32-s3-peripheral-debug-4.log
Log of ESP32 using Wellue/Viatom Blood Pressure Monitor B02T, advertised name "BPM-188": esp32-peripheral-debug-3.log
Log of ESP32 using Viatom Thermometer AOJ-20A, advertised name "AOJ-20A": esp32-peripheral-debug-4.log
Here are the logs using nRF Connect on iOS (not very verbose, BTW):
Wellue/Viatom Blood Pressure Monitor B02T: iOS-nRF-Connect-peripheral-3.log
Viatom Thermometer AOJ-20A: iOS-nRF-Connect-peripheral-4.log
Please note that I had to slightly modify the code to start a connection with device "AOJ-20A" because the function ble_hs_adv_parse_fields() does not parse the advertisement correctly. The function returns BLE_HS_EBADDATA because the advertisement is padded with 0x00 bytes up to a length of 25 bytes (i.e. 3 extra bytes):
I (3354) TEST_APP: test_app_gap_event: event_type BLE_HCI_ADV_RPT_EVTYPE_SCAN_RSP
I (3364) TEST_APP: 0x3fc9e9c5 08 09 41 4f 4a 2d 32 30 41 02 0a 00 09 ff 00 00 |..AOJ-20A.......|
I (3374) TEST_APP: 0x3fc9e9d5 00 00 00 00 00 00 00 00 00 |.........|
I think the function should be more "forgiving" with zero-padded advertisements in order to support a wider range of physical devices. You can always think of the padding zeros as records with a length of 0. The advertisement above then contains 6 records (instead of 3) with lengths 8, 2, 9, 0, 0, 0.
Here is the modified main.c file: test_app.zip
Hi @SumeetSingh19,
Apparently the controller is sending a different HCI command to the host and this triggers the error.
I tried running the same app on Unexpected Maker TinyS3 development board with ESP32-S3FN8 MCU and I get the same results.
I would like to know if you plan to address this problem in the near future; otherwise I must go back to the ESP32-WROOM-32E module for my device. I would like to use the new ESP32-S3-WROOM-1 because it has better overall performance, but I need to support several BLE devices that do not work properly with ESP32-S3.
Hi @kojibuta, the controller doesn't send a different HCI command. The first byte of the command differs only because connection handles start at 0 on ESP32 and at 1 on ESP32-S3.
Hi @kojibuta
I checked your logs and compared them with the successful ones. In the failure case, even though the connection packet has been sent, the connection is never actually established. As the logs show, there is a disconnect with reason code 0x3E (Connection Failed to be Established). This indicates that the over-the-air connection with the remote device failed.
This behaviour also suggests that the problem may be specific to communication between the S3 and this particular remote device. I assume the S3 works with other devices, even in your tests?
Would it be feasible for you to capture an OTA (over-the-air) log so we can check the packets? From the host side, we have already posted the command with valid parameters.
Hi @SumeetSingh19
I ran into this issue while porting an existing firmware from ESP32 to ESP32-S3. The firmware has been running very well for several years on a custom BLE/MQTT gateway I designed. I am in the process of re-designing the custom gateway and I would like to upgrade to ESP32-S3. Unfortunately, some of the BLE peripherals my customers are actively using do not work with ESP32-S3.
This behaviour also points that this may be specific in the communication of S3 and the particular remote device. Am sure, S3 would be working with other devices , even in your tests ?
By the way, the ESP32-S3 board and my firmware work well with many devices, but not with all of them. A subset of the devices I must support is not working.
Since all devices work well using the same firmware on ESP32 (instead of ESP32-S3) I thought it could be a BLE stack implementation bug specific to ESP32-S3.
I wanted to check, if it is feasible for you to get a OTA log to check the packets ? Since from host side, we have already posted the command with valid parameters.
I really would like to send you all the logs you need, but I don't know how or where to get an OTA log. Is it something I can enable on the ESP32? Do I need an external BLE packet sniffer? If so, can you point me to the right one?
Hi @SumeetSingh19,
Since from host side, we have already posted the command with valid parameters.
Do you mean it could be an RF-related problem? Could it be related to the fact that ESP32 also supports legacy Bluetooth while ESP32-S3 only supports BLE? Do the two CPUs use a different BLE radio stack?
Do you mean it could be a RF related problem?
So, we need OTA logs to be sure of this, and hence I requested the logs.
Could it be something related to the fact that ESP32 also supports legacy Bluetooth but ESP32-S3 only supports BLE?
I believe you are using NimBLE as the stack for both chips, so it should not matter, as these are standalone BLE operations.
Are the two CPU using a different BLE radio stack?
Both CPUs are based on the Xtensa architecture (LX6 on ESP32, LX7 on ESP32-S3), so they share the same architecture.
Hi @SumeetSingh19,
So, we need OTA logs to be sure of this, and hence I requested the logs.
I don't know how to get the OTA logs. Can you please tell me how to do it?
Hi @kojibuta, greetings of the day! I have gone through the logs and the info on JIRA: the ESP32-S3 connection is failing with reason 0x3E (Connection Failed to be Established). A connection failing with 0x3E can have several causes, so we need to check each scenario. Since the problem can be reproduced on any release, we can start debugging on release/v5.0. We need your help with the debug logs; since you do not have an OTA capture, I will provide you with a debug binary to find the root cause of the issue. Can you please confirm that you are using controller lib 2b9445a6 and IDF release/v5.0? How many iterations does it take to reproduce the issue? Could you please let me know the peripheral device's BLE version and supported features? Does advertising stop when entering connection mode? Thanks, Satish
Hi @SatishSolankeEsp, I checked out release/v5.0, compiled and ran the test_app. In the logs I find the following lines:
I (290) cpu_start: ESP-IDF: v5.0-beta1-824-ga8ef7570ca-dirt
I (524) coexist: coexist rom version e7ae62f
I (580) BT_INIT: BT controller compile version [76c24c9]
I (580) phy_init: phy_version 503,13653eb,Jun 1 2022,17:47:08
Apparently I am using controller lib 76c24c9 and IDF release/v5.0.
The issue always occurs immediately. Connection is never established, no matter how many times I try. Some peripheral models never work. Some (different) peripheral models always work.
The scan is canceled before initiating the connection. Here is the NimBLE API function call sequence:
ble_gap_disc_cancel()
ble_hs_id_infer_auto()
ble_gap_connect()
The device supports Bluetooth LE 4.0. No other specs are available. Here is the link to the supplier website https://www.viatomtech.com/aoj-20a-b.
The test_app initiates a connection as soon as it finds a device named "AOJ-20A". Here is a complete log of the last test_app run: connect-failure.log
Hi @kojibuta, thanks for the input. I have attached two files for you. The .patch file is a git diff patch that I need you to apply to your IDF. The .a file is a controller lib file that I need you to put in esp-idf/components/bt/controller/lib_esp32c3_family/esp32s3 in place of the original file.
As discussed earlier, could you please reproduce the connection failure with the debug controller lib and the provided patch, and send us the logs? Also, since it is easy to reproduce, could you provide logs with the controller lib but without the patch as well? Make sure that the version is release/v5.0.
Hi @SumeetSingh19,
Thanks for the input, I have attached two files for you.
I don't see the attachments.
@kojibuta , you should see a zip file now. It contains the two files.
Hi @SumeetSingh19,
sorry, but I don't see any attachment. Here is a screenshot of my GitHub page.
Hi @kojibuta , The link is in the original comment, but I'll reference it here anyway. 0x3EDebugPrints.zip
Hi @SumeetSingh19,
thank you, now I can see it. I don't know why, but it was not visible before. I'll do what you suggest ASAP.
Thank you.
Hi @SumeetSingh19,
I completed the tests you suggested.
Unfortunately, the libbtdm_app.a file you sent seems to cause a system failure on the ESP32-S3:
E (729) BLE_INIT: controller init failed
I performed three tests:
Here are the log files of each of the three tests.
Patch only 2022_12_23_esp32s3-log-patch-only.txt
Controller lib only 2022_12_23_esp32s3-log-libbtdm_app-only.txt
Patch + controller lib 2022_12_23_esp32s3-log-libbtdm_app-and-patch.txt
I deleted and re-installed the SDK from scratch so as to have a "clean" installation.
idf.py --version
ESP-IDF v5.1-dev-2509-gcfef24863f-dirty
Hi @SumeetSingh19 ,
I think the controller init failure is caused by a BT controller lib compile version mismatch.
The latest "release/v5.0" SDK version comes with 80abacd:
I (583) BT_INIT: BT controller compile version [80abacd]
The modified lib you sent instead is version 76c24c9 (as per my previous post).
Maybe you can point me to the correct commit hash, so that I can use the same source code as the modified library version you sent.
Hi @kojibuta, understood, there is a conflict with the controller lib version. I will provide you a debug lib on the tip of 80abacd. Thanks, Satish
Hi @kojibuta , Could you please reproduce the issue using the below controller debug lib on the tip of release/v5.0?
BT controller compile version [ae171b2]
Thanks, Satish
Hi @SumeetSingh19,
Please find attached the log file you requested using the modified controller lib, plus the log using the original lib for completeness.
Modified BT controller lib 2022_12_28_libbtdm_ae171b2_modified.log
I (25) boot: ESP-IDF v5.0-494-g490216a2ac-dirty 2nd stage bootloader
I (763) BT_INIT: BT controller compile version [ae171b2]
Un-modified (original) BT controller lib 2022_12_28_libbtdm_80abacd_original.log
I (25) boot: ESP-IDF v5.0-494-g490216a2ac-dirty 2nd stage bootloader
I (583) BT_INIT: BT controller compile version [80abacd]
Let me know if I can do anything else to help.
Hi @kojibuta, are you able to reproduce the issue with only the controller lib, without applying the patch given by @SumeetSingh19? Thanks, Satish
Hi @SatishSolankeEsp,
Here is the log without the patch (fresh clone from GitHub), using the modified BT controller lib: 2022_12_30_libbtdm_ae171b2_no_patch.log
The connection attempt is at the end of the log file, it starts after the following lines:
I (7263) TEST_APP: test_app_gap_event: 7 BLE_GAP_EVENT_DISC
I (7273) TEST_APP: test_app_gap_event: BLE_GAP_EVENT_DISC event_type BLE_HCI_ADV_RPT_EVTYPE_SCAN_RSP
I (7283) TEST_APP: 0x3fca1341 08 09 41 4f 4a 2d 32 30 41 02 0a 00 09 ff 00 00 |..AOJ-20A.......|
I (7283) TEST_APP: 0x3fca1351 00 00 00 00 00 00 00 00 00 |.........|
I (7293) TEST_APP: test_app_gap_event: name "AOJ-20A"
Hi @kojibuta, we have checked the debug log: it seems we do not receive any packet once the device is in connection mode. We have also checked that devices with a similar feature set connect fine. Our device can adjust its RX window to capture the peripheral device's TX packet, but it still fails. We need an OTA capture to debug and find the root cause of the issue, but we are not sure how you could do it. Which company's Bluetooth chipset is used in the peripheral device you are using? Do you have any other devices where you faced similar issues? Also, please try this: does the peripheral device connect with the /esp-idf/examples/bluetooth/nimble/blecent example, without any of your code or prints? Thanks, Satish
Hi @SatishSolankeEsp,
Is it possible that the problem is caused by connection parameters such as conn_intvl and conn_window?
We need OTA only to debug or find the root cause of an issue, not sure How will you do it?
I don't know how to capture the OTA logs. What procedure do you use to get them? I have some USB dongles with nRF52x chips that can be used as sniffers. Is that an option?
Just want you to try it, Does that peripheral device will connect /esp-idf/examples/bluetooth/nimble/blecent example without having your code or none of the print?
The test_app I am using is actually the blecent example, slightly modified. I only added an if statement to open a connection with devices whose names contain the strings "AOJ-20A" or "BPM-188" (two devices that fail) and "SleepO2 0751" (one device that works).
Peripheral device your using which firm (company ) Bluetooth chipset they have used in that?
Currently I have the problem with two devices from Viatom (a thermometer and a blood pressure monitor). I don't know what chipset they use. However, I don't have the problem with other devices from the same supplier, Viatom (e.g. oxygen monitor rings, other pressure monitor models, etc.).
I opened one device to inspect the PCB and find out the chipset. The device uses a BT2851-V1 module. I couldn't identify the chipset: on the component I can barely make out the labels LSR826 CRF2045, but I can't find anything about them on the Internet. Here are some pictures.
Hi @kojibuta , I don't know how to capture the OTA logs. What procedure do you use to get it? I have some USB dongles with nRF 52x that can be used as sniffers. Is that an option?
Yes, if the device can capture the over-the-air packets between the two devices.
is it possible that the problem is caused by connection params such as conn_intvl and conn_window?
Is there any difference between the pass and fail scenarios? As far as I can see, the connection interval is 40.
Thanks, Satish
Hi @SatishSolankeEsp,
I finally managed to capture some over-the-air logs using an nRF52840 dongle.
The ESP32 address is c0:49:ef:07:10:ba.
The ESP32-S3 address is f4:12:fa:cb:01:be.
The peripheral name is "AOJ-20A".
The peripheral address is a4:c1:00:00:1a:0d.
The behaviour is very similar for both ESP32 and ESP32-S3, the packet sequence is always: ..., ADV_IND, ADV_IND, SCAN_REQ, SCAN_RSP, ADV_IND, CONNECT_IND, Empty PDU, Empty PDU, ...
I can see some differences in the first Empty PDU (master to slave) after the CONNECT_IND: the peripheral does not respond to the packet sent by the ESP32-S3, while it does respond to the packet sent by the ESP32.
[ESP32]
0000 fa 13 00 03 d8 c6 06 0a 0b 07 17 00 00 d4 62 12
0010 49 cd cc ca 25 01 00 57 3d 96
[ESP32-S3]
0000 04 13 00 03 6d ba 06 0a 0b 0d 2b 00 00 38 04 54
0010 30 cc ce c1 24 01 00 57 cd e3
Capture file with many (unsuccessful) connect attempts by ESP32-S3 module capture-esp32-s3-wroom-1.pcapng.zip
Capture file with one successful connect attempt by ESP32 module capture-esp32-wroom-32e.pcapng.zip
Hi @SatishSolankeEsp,
This is what a typical (unsuccessful) connection attempt looks like in Wireshark:
Hi @kojibuta, my observation from the above screenshot: 0x27c1cacf is the Espressif device trying to connect to the peer device. We sent the connection request correctly, but somehow the peer did not receive it, and its advertising did not stop either. These two things lead us to conclude that the peer has an issue with its RX. Could you please send me the OTA file? I will check whether something is missed by the ESP32-S3. Thanks, Satish
Hi @SatishSolankeEsp,
I posted the OTA capture files in the previous message, please check.
I'm attaching the files here once again for your convenience.
Capture file with many (unsuccessful) connect attempts by ESP32-S3 module: capture-esp32-s3-wroom-1.pcapng.zip
Capture file with one successful connect attempt by ESP32 module: capture-esp32-wroom-32e.pcapng.zip
Hi @kojibuta, thanks for the logs and the OTA capture, which helped us find the root cause of the issue and fix it.
Please try this fix by using this ESP32-S3 controller lib on the tip of release/v5.0. IDF TIP release/v5.0:
Controller lib HEAD TIP: (controller commit id 80abacdd)
Replace this attached lib in the below path:
/esp-idf/components/bt/controller/lib_esp32c3_family/esp32s3
Let me know the result and share the OTA as well.
Thanks, Satish
Hi @SatishSolankeEsp,
Great news! Your fix is working perfectly. All my ESP32-S3 boards successfully connect to the peripherals and perform the service discovery.
Thank you very much for your support.
Please find below some logs and an OTA capture file recorded during some successful connections.
[log file] log_aoj-20a_success.txt
[log file debug level] log_aoj-20a_success_debug.txt
[wireshark OTA capture] 2023_02_03_connect_success.pcapng.zip
Hi @kojibuta, great! Thanks a lot for the quick turnaround. We will merge this fix into release/v5.0 soon; meanwhile, you can use the above lib. CC @Isl2017 Thanks, Satish
Hi @SatishSolankeEsp,
Apparently the fix causes a regression. Sorry for the bad news.
Using the library with your fix solves connection problems with several devices (e.g. "AOJ-20A").
Unfortunately, other devices (e.g. "DuoEK 1429") that work without the patch can no longer connect with the fix.
What I can see in the debug logs is that the connection is set up correctly, but the "exchange MTU" procedure fails. The connection is retried 3 times, but the MTU exchange always fails and the disconnect event is generated.
To better clarify:
Please find attached some logs; I hope they help.
[ESP32-S3 - using lib fix - device "AOJ-20A" successful connect] 2023_02_06_esp32s3_lib_fix_aoj20a_success.txt
[ESP32-S3 - using lib fix - device "DuoEK 1429" connect error (log+air)] 2023_02_06_esp32s3_lib_fix_duoek_error.txt 2023_02_06_esp32s3_lib_fix_duoek_connect_error.pcapng.zip 2023_02_06_esp32s3_lib_fix_duoek_connect_error_2.pcapng.zip
[ESP32-S3 - stable/v5.0 - device "AOJ-20A" connect error] 2023_02_06_esp32s3_stable_v50_aoj20a_error.txt
[ESP32-S3 - stable/v5.0 - device "DuoEK 1429" successful connect] 2023_02_06_esp32s3_stable_v50_duoek_success.txt
I also found some errors with service discovery using the stable/v5.0 SDK version. After a successful connection and MTU exchange, service discovery fails and causes the ESP32-S3 to hang. Maybe this is all part of the same problem.
[ESP32-S3 - stable/v5.0 - device "DuoEK 1429" successful connect + service discovery error (log+air)] 2023_02_06_esp32s3_stable_v50_duoek_connect_discovery_error.txt 2023_02_06_esp32s3_stable_v50_duoek_discovery_error.pcapng.zip 2023_02_06_esp32s3_stable_v50_duoek_discovery_error_2.pcapng.zip
Hi @kojibuta, we are working on it and will tell you how to use the new fix. Thanks, Satish
Hi @kojibuta , Please try the fix below for all your devices and let me know the result. S3_fix_conn_fail.zip Thanks, Satish
Hi @SatishSolankeEsp,
I have already tested the lib S3_fix_conn_fail.zip with several devices supported by my IoT gateway app. It works with all of them.
Devices tested so far:
I will run some more tests in the upcoming days and let you know the results.
Thank you very much for your support.
Hi @SatishSolankeEsp,
I also ran the test_app (NimBLE + ESP32-S3), and the service discovery error with the Viatom DuoEK device is still present.
Using Bluedroid instead I get no error at all and all devices work correctly.
Apparently your fix is working perfectly with Bluedroid (all devices working) but not with NimBLE (some device still not working).
Let me know if I can help you sending more debug/OTA logs.
Hi @kojibuta, can you please tell me how many devices still have the issue with NimBLE, and share the OTA capture of the Viatom DuoEK? Great that all the devices work with Bluedroid, which was not the case earlier. Thanks, Satish
Hi @SatishSolankeEsp,
After extensive testing I can confirm that all the test devices I have are currently working with ESP32-S3 and Bluedroid.
I can also confirm that roughly half of the same devices do not work as expected with NimBLE.
Please find attached some log files for both working and non-working devices (2+2). Each archive contains the debug log and the OTA log of a test session.
NOTE: the device E66 is a fitness bracelet using the Nordic nRF52832 chip. It is a very common chip, so I think it is important to find out why NimBLE is not working with this device.
[Successful tests] BPM188_success.zip O2Ring_success.zip
[Failed tests] DuoOK_fail.zip E66_fail.zip
Hi @kojibuta, I have checked the failed and successful OTA logs a couple of times, but there is no difference: all the devices work fine and packets are being exchanged. I've attached a screenshot for you.
@SumeetSingh19, the controller issue is solved now. Please check whether the remaining failure is caused by NimBLE, since Bluedroid works fine, which was not the case earlier. Thanks, Satish
Hi @SatishSolankeEsp,
I tried compiling my firmware with ESP-IDF v5.0.2, and the ESP32-S3 still does not work.
Do you think your patch to the Bluetooth controller lib will be merged into the master branch any time soon?
Hi @kojibuta, yes, the controller lib fix is not merged yet; you can continue using the lib I shared. I will get back to you on this. Thanks, Satish
Hi @kojibuta, I have checked internally. We are not merging the MR into the controller codebase, since this is a remote device issue, but we provided you the library with a workaround that selects channel selection algorithm #1 (CSA #1).
As per the Bluetooth specification, we have selected the correct algorithm:
BLE 4.0 (ESP32): channel selection algorithm #1
BLE 5.0 (ESP32-S3): channel selection algorithm #2
I have come up with a solution: for remote devices that fail to connect, or that support BLE 4.2/4.0, you can disable the CONFIG_BT_BLE_50_FEATURES_SUPPORTED flag in the IDF menuconfig and retry the connection. This way you can use any ESP-IDF version.
Please let me know if these procedures are useful for you.
Thanks, Satish
Hi @SatishSolankeEsp,
Unfortunately disabling CONFIG_BT_BLE_50_FEATURES_SUPPORTED does not work for me. I am currently compiling with "ESP-IDF v5.0.2". Below you can find the "Bluetooth -> Bluedroid options" settings for my project.
Hi @kojibuta, it looks like you are testing on an older commit ID. Could you please try the same test case on the latest IDF v5.0?
Our fix was merged in the following commit:
Thanks, Satish
Answers checklist.
IDF version.
ESP-IDF v4.4.2-388-g755ce1077d
Operating System used.
macOS
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
No response
Development Kit.
Custom board
Power Supply used.
USB
What is the expected behavior?
When using NimBLE (BLE central role) on ESP32-S3 it is not possible to discover services on some peripheral devices. The same code runs smoothly on ESP32 without errors.
After a connection is established with a peripheral (event BLE_GAP_EVENT_CONNECT received with event->connect.status == 0), calling ble_gattc_disc_all_svcs should result in the callback function ble_gatt_disc_svc_fn being invoked with error->status == 0.
What is the actual behavior?
When running the code on ESP32-S3, after a connection is established with a peripheral, calling ble_gattc_disc_all_svcs results in an immediate invocation of the callback function ble_gatt_disc_svc_fn with error->status == 7.
Steps to reproduce.
ESP32-S3 logs (SleepO2 device OK, B02T device KO):
ESP32-S3 board + SleepO2 device: esp32-s3-peripheral-1.log
ESP32-S3 board + B02T device: esp32-s3-peripheral-2.log
ESP32 logs (SleepO2 and B02T devices OK):
ESP32 board + SleepO2 device: esp32-peripheral-1.log
ESP32 board + B02T device: esp32-peripheral-2.log
Debug Logs.
More Information.
The error always occurs with some peripheral devices, but not with every device. It occurs with ESP-IDF release/v4.4, release/v5.0 and the latest versions.
Some peripherals causing the error are:
Some peripherals working without errors are: