Closed eriksl closed 1 month ago
Hello @eriksl ,
Can you possibly give more information on how to reproduce the issue .
Setup BT GATT on ESP32-S3 Try to pair (with no pre-existing bonds) Wait long, over 5 seconds.
Component config → Log output → Default log verbosity --> Set to Debug Component config → Bluetooth → NimBLE Options → NimBLE Host log verbosity --> Set to Debug
I am running my own application, https://github.com/eriksl/esp32 but not all required information to set up Nimble is available in the documentation and I had to guess some things from the examples like blerph. I will make sure the repository on github is up to date at the end of the day, right now it's a bit outdated due to holidays and a major overhaul + IDF upgrade.
The ESP32 is the peripheral, the Linux host is the central. The central initiates the pairing. It uses whatever method the peripheral offers. To rule out an issue with Bluez I also tried pairing using Android and that gives exactly the same result. Please note, you mention nRF, but the issue is way before the point where an application can connect. It's really during pairing.
IOCAP of course hasn't changed. Actually none of the BT setup code has changed.
The remote is waiting for the S3, not the other way around! The S3 seems to be busy with "something" and finally gives up when the watchdog timer intervenes.
I don't think the debug log will yield anything useful, but the backtrace from the watchdog timer will be very informative I think. The intervention always takes place at the same point in the code, so it looks like the processor stalls there for some reason. Look like some function to handle hardware offload from within MBEDTLS.
I'll attach the relevant code on the ESP32-S3 code, the stacktrace and the config.
Relevant sources. bluetooth-src.zip
sdkconfig sdkconfig.zip
Error log (stderr and stdout combined)
E (54650) task_wdt: Task watchdog got triggered. The following tasks/users did not reset the watchdog in time: E (54650) task_wdt: - IDLE0 (CPU 0) E (54650) task_wdt: Tasks currently running: E (54650) task_wdt: CPU 0: nimble_host E (54650) task_wdt: CPU 1: IDLE1 E (56138) task_wdt: Aborting. E (56138) task_wdt: Print CPU 0 backtrace
Two backtraces, slightly different.
One
0x4209f8f6: mbedtls_mpi_core_sub at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/bignum_core.c:451 (discriminator 1)
0x42052889: mbedtls_mpi_sub_abs at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/bignum.c:1089
0x420528f3: add_sub_mpi at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/bignum.c:1121
0x42052956: mbedtls_mpi_sub_mpi at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/bignum.c:1154
0x42054375: mbedtls_mpi_sub_mod at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:1084
0x42054812: ecp_double_jac at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:1512
0x42055135: ecp_precompute_comb at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:1919
(inlined by) ecp_mul_comb at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:2354
(inlined by) ecp_mul_restartable_internal at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:2688
0x42055623: mbedtls_ecp_mul_restartable at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:2722
0x42054258: ecdh_compute_shared_restartable at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecdh.c:105
0x42054296: mbedtls_ecdh_compute_shared at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecdh.c:129
0x4201fef5: ble_sm_alg_gen_dhkey at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_sm_alg.c:578
0x42018926: ble_sm_sc_public_key_rx at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_sm_sc.c:636
0x42013865: ble_sm_rx at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_sm.c:2952
0x420149bb: ble_hs_hci_evt_acl_process at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_hs_hci_evt.c:1195
0x42014110: ble_hs_process_rx_data_queue at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_hs.c:248
0x4201411b: ble_hs_event_rx_data at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_hs.c:545
0x4209e9ab: npl_freertos_event_run at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/porting/npl/freertos/src/npl_os_freertos.c:451
0x4037a969: ble_npl_event_run at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:185
(inlined by) nimble_port_run at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/porting/nimble/src/nimble_port.c:308
0x4200dff9: nimble_port_task at /data/src/intern/esp/esp32/s3/develop/main/bt.c:94
Two
0x4209f86a: mbedtls_mpi_core_bitlen at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/bignum_core.c:65
0x420526a5: mbedtls_mpi_bitlen at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/bignum.c:474
0x420571fd: mbedtls_mpi_mul_mpi at /home/erik/.espressif/esp-idf/components/mbedtls/port/bignum/esp_bignum.c:490
0x420573cb: mbedtls_mpi_mul_int at /home/erik/.espressif/esp-idf/components/mbedtls/port/bignum/esp_bignum.c:572
0x42052c41: mbedtls_mpi_div_mpi at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/bignum.c:1475
0x42052cdb: mbedtls_mpi_mod_mpi at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/bignum.c:1543
0x420543c2: ecp_modp at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:1004
(inlined by) mbedtls_mpi_mul_mod at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:1062
0x420548ed: ecp_double_jac at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:1551
0x42055135: ecp_precompute_comb at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:1919
(inlined by) ecp_mul_comb at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:2354
(inlined by) ecp_mul_restartable_internal at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:2688
0x42055623: mbedtls_ecp_mul_restartable at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecp.c:2722
0x42054258: ecdh_compute_shared_restartable at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecdh.c:105
0x42054296: mbedtls_ecdh_compute_shared at /home/erik/.espressif/esp-idf/components/mbedtls/mbedtls/library/ecdh.c:129
0x4201fef5: ble_sm_alg_gen_dhkey at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_sm_alg.c:578
0x42018926: ble_sm_sc_public_key_rx at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_sm_sc.c:636
0x42013865: ble_sm_rx at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_sm.c:2952
0x420149bb: ble_hs_hci_evt_acl_process at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_hs_hci_evt.c:1195
0x42014110: ble_hs_process_rx_data_queue at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_hs.c:248
0x4201411b: ble_hs_event_rx_data at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/nimble/host/src/ble_hs.c:545
0x4209e9ab: npl_freertos_event_run at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/porting/npl/freertos/src/npl_os_freertos.c:451
0x4037a969: ble_npl_event_run at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/porting/npl/freertos/include/nimble/nimble_npl_os.h:185
(inlined by) nimble_port_run at /home/erik/.espressif/esp-idf/components/bt/host/nimble/nimble/porting/nimble/src/nimble_port.c:308
0x4200dff9: nimble_port_task at /data/src/intern/esp/esp32/s3/develop/main/bt.c:94
Hi @eriksl ,
the backtrace shows mbedtls functions which , i guess, are causing infinite loop and hence the issue.
Can you please share the way to compile the code against IDF SDK you have shared ( either the bluetooth-src.zip or even the github repo) . I tried few ways like making folder inside IDF or exporting paths, but it fails.
I took the init content from bt.c and replicated it in SDK sample bleprph example. I tested against phone and bluez and both worked fine. So, if you can share your compilation setup against IDF SDK to get the application compiled in its entirety , it would help us to follow the same steps and debug more by reproducing issue locally.
Changed example file for reference: main.txt
My github repository is now completely up to date and should build cleanly now.
https://github.com/eriksl/esp32
Steps to build and test:
That should be sufficient. If there are any difficulties, please let me know and I will fix it.
Build as is now, it should work. If you change the watchdog timer back to the default of 6 seconds, the watchdog timer will go off.
Thank you for looking into this!
Please note there is no endlessly looping code (either in my app or in mbedtls) because the pairing will succeed if given enough time (set watchdog timer to 10 seconds).
Update, please clone this repository again or pull it (although cloning is probably safest). I have managed to make it ESP32-sub-type and board-type independent without the symlinks kludge. It now looks like a completely normal project, with just two extra menuconfig entries. The board selection currently does not have any influence on the BT code, so you can choose any of the two you like.
I don't expect any building issues now.
Compilation is ok now . Thanks.
but the chip fails to boot.
--- esp-idf-monitor 1.4.0 on /dev/ttyUSB0 115200 --- --- Quit: Ctrl+] | Menu: Ctrl+T | Help: Ctrl+T followed by Ctrl+H --- ESP-ROM:esp32s3-20210327 Build:Mar 27 2021 rst:0x1 (POWERON),boot:0x8 (SPI_FAST_FLASH_BOOT) SPIWP:0xee mode:DIO, clock div:1 load:0x3fce2820,len:0x1988 load:0x403c8700,len:0x4 load:0x403c8704,len:0xda0 load:0x403cb700,len:0x3518 entry 0x403c8918
The menuconfig has many changes compared to default example sdkconfig. I will try to figure out which option is causing this and check more
Interesting. It will probably have to do with me using a Wemos Lolin S3-mini that has a S3-H4R2 (4 MB flash and 2 MB SPIRAM on-package, quad) and I have them "hard" configured in the sdkconfig, I am afraid it can't be done otherwise without issues. If you can get a similar setup somewhere, I guess it could work. Maybe you need to the chip type to autodetect.
If this is a problem I may be able to make a version that runs without SPIRAM.
Sorry for the inconvencience.
This may have to do with me having all MBEDTLS memory allocated from SPIRAM (quad, 80 MHz). Apparently the calculation being performed is kind of heavy, possibly heavy on RAM as well.
The routine is NOT stuck in a loop, it will finish eventually, but no sooner than after ten seconds...
This may have to do with me having all MBEDTLS memory allocated from SPIRAM
Ok. This can be possible. Do you observe issue if SPIRAM support is disabled and internal memory is used ?
I can test this partially. I have remove the bonding from the central and then there is no problem. I can't remove the bonding from the client though, I would need to invalidate all of the NVS and it would also destroy all of my configuration variables.
May I run into this I will test again.
Closing this issue currently. Feel free to reopen in case of any further updates.
Answers checklist.
IDF version.
v5.3
Espressif SoC revision.
ESP32-S3
Operating System used.
Linux
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
Irrelevant here
Power Supply used.
USB
What is the expected behavior?
I expect the pairing to complete, as it has done previously, using both Linux and Android host device.
What is the actual behavior?
Watchdog timer timeout during. The timeout does occur deep inside the IDF code, somewhere in mbedtls, it's not in my code.
Steps to reproduce.
Debug Logs.
No response
More Information.
When I increase the watchdog timer timeout to 10 seconds, the issue is gone. Well, it's worked around. I don't think a) pairing should take this long b) even if it takes so long, the watchdog timer is never fed