espressif / esp-nimble

A fork of NimBLE stack, for use with ESP32 and ESP-IDF
Apache License 2.0

Crashing on reboot 5.2.1 #66

Closed mmackh closed 3 months ago

mmackh commented 3 months ago

Hello,

I'm trying to track down a crash that happens randomly on launch somewhere in the BLE stack with the latest release of ESP-IDF (5.2.1) and an ESP32-WROOM-32E. Attached are the two crash reports. Which one causes the system to reboot, and when, is random. Sometimes it gets stuck on boot a few times and then goes through OK.

Previously I was on 5.1.2, no crashes. I'm also not sure if it is related to my UUID. What changed that could lead to this behaviour?

EXCVADDR: 0x84c2837a  LBEG    : 0x4000c2e0  LEND    : 0x4000c2f6  LCOUNT  : 0x00000000
Backtrace: 0x400e95ec:0x3ffbb570 0x400de3e2:0x3ffbb590 0x400dc596:0x3ffbb5d0 0x400da396:0x3ffbb610 0x400da4f3:0x3ffbbe50 0x401a43ce:0x3ffbbe70

0x400e95ec: ble_gatts_cpfd_is_sane at ble_gatts.c:751
 (inlined by) ble_gatts_count_resources at ble_gatts.c:3103
 (inlined by) ble_gatts_count_cfg at ble_gatts.c:3137
0x400de3e2: NimBLEService::start() at NimBLEService.cpp:235
0x400dc596: Sentionic::BluetoothLE::startHost(char const*, char const*, char const*, std::function) at BluetoothLE.cpp:103
0x400da396: boot() at main.cpp:124 (discriminator 1)
0x400da4f3: app_main at main.cpp:247
0x401a43ce: main_task at app_startup.c:208

abort() was called at PC 0x400dddad on core 0
Backtrace: 0x40081732:0x3ffbb470 0x4009290d:0x3ffbb490 0x400982e2:0x3ffbb4b0 0x400dddad:0x3ffbb520 0x400dc746:0x3ffbb570 0x400dd56a:0x3ffbb5b0 0x400dc5b1:0x3ffbb5d0 0x400da396:0x3ffbb610 0x400da4f3:0x3ffbbe50 0x401a43ce:0x3ffbbe70

0x40081732: panic_abort at panic.c:472
0x4009290d: esp_system_abort at esp_system_chip.c:93
0x400982e2: abort at abort.c:38
0x400dddad: NimBLEServer::start() at NimBLEServer.cpp:191
0x400dc746: NimBLEAdvertising::start(unsigned long, void (*)(NimBLEAdvertising*), NimBLEAddress*) at NimBLEAdvertising.cpp:419
0x400dd56a: NimBLEDevice::startAdvertising(unsigned long) at NimBLEDevice.cpp:181 (discriminator 1)
0x400dc5b1: Sentionic::BluetoothLE::startHost(char const*, char const*, char const*, std::function) at BluetoothLE.cpp:111
0x400da396: boot() at main.cpp:124 (discriminator 1)
0x400da4f3: app_main at main.cpp:247
0x401a43ce: main_task at app_startup.c:208

rahult-github commented 3 months ago

Hi @mmackh ,

Crash 1:

Is there any custom service being added in your code? If yes, I suggest checking the cpfd that is set when declaring the service.

https://github.com/espressif/esp-nimble/blob/c0dd77a8355dd82d0bc1b501e6d118a91eb14df1/nimble/host/src/ble_gatts.c#L745 This function checks the validity of the cpfd values. If any of the checks fails, the function returns 0, which triggers an assert.

Crash 2:

From the backtrace, it looks like control is still in the application code, so NimBLEServer.cpp line 191 can be checked for the cause of the crash.

mmackh commented 3 months ago

Thank you for the fast reply @rahult-github - the ble_gatt_cpfd is not being set. Is this a requirement?

As for crash number 2, it calls ble_gatts_start();

void NimBLEServer::start() {
    if(m_gattsStarted) {
        NIMBLE_LOGW(LOG_TAG, "Gatt server already started");
        return;
    }

    int rc = ble_gatts_start();
    if (rc != 0) {
        NIMBLE_LOGE(LOG_TAG, "ble_gatts_start; rc=%d, %s", rc,
                            NimBLEUtils::returnCodeToString(rc));
        abort();
    }

rahult-github commented 3 months ago

Hi @mmackh ,

> the ble_gatt_cpfd is not being set

OK, this is a little weird. That code is executed only if a cpfd is part of the service: https://github.com/espressif/esp-nimble/blob/28844aaa7fa5dd2587ea4fb297e0a71af8c5465c/nimble/host/src/ble_gatts.c#L3102 . Maybe some memory corruption is happening somehow.
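To illustrate why an unset cpfd should never reach the sanity check, here is a minimal sketch of a null-guarded walk over a characteristic list. `CpfdSketch`, `ChrDefSketch` and `count_cpfds` are hypothetical stand-ins written for this example, not the NimBLE definitions, and the "format 0 ends the array" and "null uuid ends the list" conventions are assumptions of the sketch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for illustration only; not the real NimBLE structs.
struct CpfdSketch {
    std::uint8_t format;  // assumption: 0 marks the end of the cpfd array
};

struct ChrDefSketch {
    const void* uuid;        // assumption: nullptr terminates the characteristic list
    const CpfdSketch* cpfd;  // optional; nullptr means no presentation format
};

// Counts presentation-format entries, dereferencing cpfd only when it is set.
std::size_t count_cpfds(const ChrDefSketch* chrs) {
    std::size_t total = 0;
    for (const ChrDefSketch* chr = chrs; chr->uuid != nullptr; ++chr) {
        if (chr->cpfd == nullptr) {
            continue;  // nothing to validate for this characteristic
        }
        for (const CpfdSketch* p = chr->cpfd; p->format != 0; ++p) {
            ++total;
        }
    }
    return total;
}
```

With a guard like this, a cpfd pointer that holds garbage instead of null is followed as if it were valid, which would be consistent with the wild EXCVADDR in the first crash.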

> As for crash number 2, it calls ble_gatts_start();

So your application is catching the return value of ble_gatts_start and invoking abort. Can you capture the return value of this function?

Please also share your sdkconfig; we will make further attempts to reproduce this on our end.

mmackh commented 3 months ago

> May be some memory corruption is happening , somehow.

You were exactly right. In the library esp-nimble-cpp the struct array was allocated without initialising its members:

pChr_a = new ble_gatt_chr_def[numChrs + 1];

I assigned the cpfd to NULL and it's finally stable again.
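For anyone hitting the same thing, here is a minimal sketch of an alternative to assigning each field by hand: value-initializing the array so that every member starts out zeroed. `ChrDefAllocSketch` and `alloc_chr_defs` are hypothetical names for this example and the field list is an assumption; this is not the patch that went into esp-nimble-cpp.

```cpp
#include <cstddef>

// Hypothetical stand-in for a characteristic definition; the field list
// is an assumption made for this sketch.
struct ChrDefAllocSketch {
    const void* uuid;
    void* arg;
    unsigned short flags;
    const void* cpfd;
};

// Allocates numChrs entries plus one zeroed terminator entry.
ChrDefAllocSketch* alloc_chr_defs(std::size_t numChrs) {
    // `new T[n]` default-initializes, leaving the members of a plain
    // struct indeterminate. The trailing `()` value-initializes instead,
    // so every pointer member (including cpfd) starts as nullptr.
    return new ChrDefAllocSketch[numChrs + 1]();
}
```

The caller still fills in the fields it needs and releases the array with `delete[]`. Any field it does not touch stays null rather than holding whatever was in the heap.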

Thank you so much!