espressif / esp-idf

Espressif IoT Development Framework. Official development framework for Espressif SoCs.
Apache License 2.0
13.51k stars 7.26k forks source link

ASSERT crash related to NVS when performing BLE Functions (IDFGH-1521) #3785

Closed 0neblock closed 5 years ago

0neblock commented 5 years ago

Brief

ASSERT in NVS library related to saving BLE Bonds assertion "end <= ENTRY_COUNT" failed: file "components/nvs_flash/src/nvs_page.cpp", line 437, function: esp_err_t nvs::Page::copyItems(nvs::Page&)

More Information is available, please see comment.

Environment

Key Value
Development Kit Custom Board
Module or chip used ESP32-WROOM-32U
IDF version dca5ed2263134024f13f9e58f4964ad1a409834c - tracking release/v3.3
Build System Make
Compiler version crosstool-ng-1.22.0-80-g6c4433a
Operating System macOS
Power Supply external 3.3V

Problem Description

I have an application acting as a BLE Master to connect to a peripheral using the Bluedroid Library. I am encrypting the connection after connecting, using esp_ble_set_encryption(remote_bda, ESP_BLE_SEC_ENCRYPT_MITM);

I am using OOB keys to authenticate.

Most of the time, there is no issue doing this and the connection succeeds and I can continue to read characteristics and write data no problem.

However, on occasion, Just after receiving the ESP_GAP_BLE_OOB_REQ_EVT GAP event, usually after I send the esp_ble_oob_req_reply reply, the nvs_flash component which is used by esp-idf to save the bonding data, has an assertion error and crashes.

I have tried multiple workarounds, such as clearing flash or making my nvs partition larger, none of which seem to help.

This happens no matter how many devices bonding information I already have stored.

The fact that it works sometimes and not other times leads me to believe there is a race condition/corruption going on in the NVS portion of it.

Please let me know what further data I can provide.

Debug Log

assertion "end <= ENTRY_COUNT" failed: file "components/nvs_flash/src/nvs_page.cpp", line 437, function: esp_err_t nvs::Page::copyItems(nvs::Page&)
abort() was called at PC 0x400d3f27 on core 0
0x400d3f27: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)

ELF file SHA256: 5b920c5f83d0cbf94e834415d4d41772dab53bf304a0ab01f369a4df6592b5fb

Backtrace: 0x4008cdcf:0x3ffeee10 0x4008d075:0x3ffeee30 0x400d3f27:0x3ffeee50 0x40116b8b:0x3ffeee80 0x4011768a:0x3ffeeec0 0x40115ca7:0x3ffeeee0 0x40116139:0x3ffeef60 0x4011598e:0x3ffeefd0 0x40145341:0x3ffef010 0x4012729e:0x3ffef060 0x40126629:0x3ffef080 0x4012684a:0x3ffef0b0 0x401458ca:0x3ffef0f0 0x40127412:0x3ffef1b0
0x4008cdcf: invoke_abort at components/esp32/panic.c:715

0x4008d075: abort at components/esp32/panic.c:715

0x400d3f27: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:63 (discriminator 8)

0x40116b8b: nvs::Page::copyItems(nvs::Page&) at components/nvs_flash/src/nvs_page.cpp:327

0x4011768a: nvs::PageManager::requestNewPage() at components/nvs_flash/src/nvs_pagemanager.cpp:236

0x40115ca7: nvs::Storage::writeMultiPageBlob(unsigned char, char const*, void const*, unsigned int, nvs::VerOffset) at components/nvs_flash/src/nvs_storage.cpp:31

0x40116139: nvs::Storage::writeItem(unsigned char, nvs::ItemType, char const*, void const*, unsigned int) at components/nvs_flash/src/nvs_storage.cpp:31

0x4011598e: nvs_set_blob at components/nvs_flash/src/nvs_api.cpp:555

0x40145341: config_save at components/bt/bluedroid/osi/config.c:436

0x4012729e: btc_config_flush at components/bt/bluedroid/btc/core/btc_config.c:329

0x40126629: _btc_storage_save at components/bt/bluedroid/btc/core/btc_ble_storage.c:602

0x4012684a: _btc_storage_add_ble_bonding_key at components/bt/bluedroid/btc/core/btc_ble_storage.c:602
 (inlined by) btc_storage_add_ble_bonding_key at components/bt/bluedroid/btc/core/btc_ble_storage.c:144

0x401458ca: btc_dm_save_ble_bonding_keys at components/bt/bluedroid/btc/core/btc_dm.c:205
 (inlined by) btc_dm_ble_auth_cmpl_evt at components/bt/bluedroid/btc/core/btc_dm.c:273
 (inlined by) btc_dm_sec_cb_handler at components/bt/bluedroid/btc/core/btc_dm.c:685

0x40127412: btc_task at components/bt/bluedroid/btc/core/btc_task.c:110

sdkconfig

Found Here: https://gist.github.com/0neblock/67e364421b0ea287c02d0b2f8acd8150

0neblock commented 5 years ago

I Have Just had this same crash, but during the Initialisation stage of the BLE Library, So I no longer believe it is related to the bonding functions.

The Crash happened during execution of init in the following sample code:

#define CLEAR_BONDS 0
static esp_err_t initialize_ble_hardware(){
    esp_err_t ret;

    ret = nvs_flash_init();
    if (ret == ESP_ERR_NVS_NO_FREE_PAGES || ret == ESP_ERR_NVS_NEW_VERSION_FOUND) {
        nvs_flash_erase();
        ret = nvs_flash_init();
    }
#ifdef CLEAR_BONDS
#if CLEAR_BONDS == 1
    nvs_flash_erase(); 
#endif
#endif
    if(ret != ESP_OK){
        ESP_LOGE(TAG, "%s nvs init failed, error code = %x\n", __func__, ret);
        return ret;
    }

    ret = esp_bt_controller_mem_release(ESP_BT_MODE_CLASSIC_BT);
    if(ret != ESP_OK){
        ESP_LOGE(TAG, "Failed to release Classic BT Resources,: %d", ret);
        return ret;
    }

    if(!btStart()){
        ESP_LOGE(TAG, "Failed to start BLE");
        return ESP_ERR_INVALID_STATE;
    }

    ret = esp_bluedroid_init();
    if (ret) {
        ESP_LOGE(TAG, "%s init bluetooth failed, error code = %x\n", __func__, ret);
        return ret;
    }

    ret = esp_bluedroid_enable();
    if (ret) {
        ESP_LOGE(TAG, "%s enable bluetooth failed, error code = %x\n", __func__, ret);
        return ret;
    }

    return ESP_OK;
}

static esp_err_t register_ble_callbacks(){
    esp_err_t ret = esp_ble_gap_register_callback(esp_gap_cb);
    if (ret){
        ESP_LOGE(TAG, "%s gap register failed, error code = %x\n", __func__, ret);
        return ret;
    }

    ret = esp_ble_gattc_register_callback(esp_gattc_cb);
    if(ret){
        ESP_LOGE(TAG, "%s gattc register failed, error code = %x\n", __func__, ret);
        return ret;
    }

    // register GATTC app profiles
    for(int i = 0; i < TOTAL_GATTC_PROFILES; i++){
        ret = esp_ble_gattc_app_register(i);
        if (ret){
            ESP_LOGE(TAG, "%s gattc app %d register failed, error code = %x\n", __func__, i, ret);
        }
    }
    ret = esp_ble_gatt_set_local_mtu(500);
    if (ret){
        ESP_LOGE(TAG, "set local  MTU failed, error code = %x", ret);
    }

    return ESP_OK;
}

void init(){
    esp_err_t hardware_init_error = initialize_ble_hardware();
    if(hardware_init_error != ESP_OK){
        ESP_LOGE(TAG, "Could not initialise BLE module: %d", hardware_init_error);
        return;
    }
    ESP_LOGI(TAG, "BLE Hardware initialised");

    esp_err_t callback_register_err = register_ble_callbacks();
    if(callback_register_err != ESP_OK){
        ESP_LOGE(TAG, "Could not register BLE callbacks: %s", esp_err_to_name(callback_register_err));
        return;
    }
    ESP_LOGI(TAG, "BLE Callbacks initialised");

    esp_err_t security_params_err = register_default_security_params();
    if(security_params_err){
        ESP_LOGE(TAG, "Could not register BLE security: %s", esp_err_to_name(security_params_err));
        return;
    }
    ESP_LOGI(TAG, "BLE Security initialised");

    master_state = BMS_MASTER_IDLE;

    esp_bd_addr_t local_addr;
    uint8_t type;

    esp_ble_gap_get_local_used_addr(local_addr, &type);

    ESP_LOGI(TAG, "Local Address Type: %d, Address:", type);
    ESP_LOGI(TAG, ESP_BD_ADDR_STR, ESP_BD_ADDR_HEX(local_addr));
}

Debug Log

I (8729) BTDM_INIT: BT controller compile version [84af018]
I (8729) system_api: Base MAC address is not set, read default base MAC address from BLK0 of EFUSE
W (8739) phy_init: failed to load RF calibration data (0x1102), falling back to full calibration
I (8919) phy: phy_version: 4100, 2a5dd04, Jan 23 2019, 21:00:07, 0, 2
I (9109) BMS: BLE Hardware initialised
I (9119) BMS: BLE Callbacks initialised
I (9119) BMS: BLE Security initialised
assertion "end <= ENTRY_COUNT" failed: file "components/nvs_flash/src/nvs_page.cpp", line 43I (9119) BMS: File size: 20
7, function: esp_err_t nvs::Page::copyItems(nvs::Page&)
abort() was called at PC 0x400d3f3b on core 0
0x400d3f3b: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:59 (discriminator 8)

ELF file SHA256: 93d73c936af3051824ecc93dc95d8b0c0320a2fd61b06d93e51006d0ca44b971

Backtrace: 0x4008cdcf:0x3ffeefb0 0x4008d075:0x3ffeefd0 0x400d3f3b:0x3ffeeff0 0x40116c03:0x3ffef020 0x40117702:0x3ffef060 0x40115d1f:0x3ffef080 0x401161b1:0x3ffef100 0x40115a06:0x3ffef170 0x401453b9:0x3ffef1b0 0x40127316:0x3ffef200 0x401266a1:0x3ffef220 0x401269e1:0x3ffef250 0x40145b34:0x3ffef270 0x4012748a:0x3ffef330
0x4008cdcf: invoke_abort at components/esp32/panic.c:715

0x4008d075: abort at components/esp32/panic.c:715

0x400d3f3b: __assert_func at /Users/ivan/e/newlib_xtensa-2.2.0-bin/newlib_xtensa-2.2.0/xtensa-esp32-elf/newlib/libc/stdlib/../../../.././newlib/libc/stdlib/assert.c:59 (discriminator 8)

0x40116c03: nvs::Page::copyItems(nvs::Page&) at components/nvs_flash/src/nvs_page.cpp:327

0x40117702: nvs::PageManager::requestNewPage() at components/nvs_flash/src/nvs_pagemanager.cpp:236

0x40115d1f: nvs::Storage::writeMultiPageBlob(unsigned char, char const*, void const*, unsigned int, nvs::VerOffset) at components/nvs_flash/src/nvs_storage.cpp:31

0x401161b1: nvs::Storage::writeItem(unsigned char, nvs::ItemType, char const*, void const*, unsigned int) at components/nvs_flash/src/nvs_storage.cpp:31

0x40115a06: intrusive_list_node<HandleEntry>::intrusive_list_node() at components/nvs_flash/src/nvs_api.cpp:555
 (inlined by) HandleEntry::HandleEntry() at components/nvs_flash/src/nvs_api.cpp:48
 (inlined by) nvs_set_blob at components/nvs_flash/src/nvs_api.cpp:426

0x401453b9: config_save at components/bt/bluedroid/osi/config.c:435

0x40127316: btc_config_flush at components/bt/bluedroid/btc/core/btc_config.c:329

0x401266a1: _btc_storage_save at components/bt/bluedroid/btc/core/btc_ble_storage.c:602

0x401269e1: _btc_storage_add_ble_local_key at components/bt/bluedroid/btc/core/btc_ble_storage.c:602
 (inlined by) btc_storage_add_ble_local_key at components/bt/bluedroid/btc/core/btc_ble_storage.c:308

0x40145b34: btc_dm_sec_cb_handler at components/bt/bluedroid/btc/core/btc_dm.c:784

0x4012748a: btc_task at components/bt/bluedroid/btc/core/btc_task.c:107
igrr commented 5 years ago

Can you please dump the contents of NVS partition and attach it to this issue? parttool.py can be used to read a partition.

https://github.com/espressif/esp-idf/blob/fdab15dc76a3464ed10ce03e9b9cf1a5cd2aa0d1/examples/storage/parttool/parttool_example.sh#L54

0neblock commented 5 years ago

Hi @igrr Thanks for the reply.

I haven't received a crash for a couple of days, I have a feeling it might be related to these lines:

ret = nvs_flash_init();
if (ret == ESP_ERR_NVS_NO_FREE_PAGES || ret == ESP_ERR_NVS_NEW_VERSION_FOUND) {
    nvs_flash_erase();
    ret = nvs_flash_init();
}

#ifdef CLEAR_BONDS
#if CLEAR_BONDS == 1
    nvs_flash_erase(); 
#endif
#endif

I had #define CLEAR_BONDS = 1, which caused nvs_flash_erase(); to be after nvs_flash_init(); Could this have been the cause of the ASSERT?

igrr commented 5 years ago

Yes, calling erase after init would most definitely be an issue.

0neblock commented 5 years ago

Thanks, this seems to have solved it