espressif / esp-nimble

A fork of NimBLE stack, for use with ESP32 and ESP-IDF
Apache License 2.0
76 stars 49 forks source link

ble_gattc.c crash dependent on client device #53

Closed ArdenKolodner closed 4 months ago

ArdenKolodner commented 1 year ago

As detailed in this issue, I have an ESP32S3 running NimBLE using the esp-nimble-cpp interface. When subscribing to notifications on a characteristic using my laptop (M1 MacBook Pro), the connection works without issue. When subscribing to notifications via the same app (LightBlue) on the same characteristic, using an iPhone (iOS 16.1), the ESP32 device crashes.

I've traced the error to ble_gattc.c in this library, line 4169, where ble_att_svr_read_handle is called without a value for the connection handle.

When replacing BLE_HS_CONN_HANDLE_NONE with conn_handle, the code works fine. However, this is an ad hoc solution, and I'd like to find a fix that doesn't involve editing an obscure file every time we update the library or install ESP-IDF on a new computer. Also, I can't guarantee that this doesn't cause problems somewhere down the line, but maybe the developers can confirm that this is safe?

Alternatively, perhaps this is an issue with the library that should be changed for everyone, as the author of esp-nimble-cpp suggested ("remove the assert in the BLE_GATT_ACCESS_OP_READ_CHR event handler and gracefully return instead" -h2zero).

Please see the issue linked above for specific context on the crash and how I traced the problem.

SumeetSingh19 commented 1 year ago

Hi @ArdenKolodner , I looked at your other issue, to debug this further, could you please answer these questions:

  1. Are you calling ble_gatts_notify() after subscribe? (I'm not talking about ble_gatts_notify_custom())
  2. In the access callback of the attribute that you are notifying on, have you put any asserts or if conditions on the conn_handle?

If you could also provide your application, it would be helpful in getting to the root of the issue.

ArdenKolodner commented 1 year ago

Hi @SumeetSingh19, thank you for your response. Unfortunately I cannot share the application code, as it is proprietary. However, I will do my best to answer what I can.

We are using esp-nimble-cpp, which does not use ble_gattc_notify, it only uses ble_gattc_notify_custom.

While we don't put any asserts or if-statements on conn_handle, it looks like esp-nimble-cpp has something similar. The following code (NimBLECharacteristic.cpp line 277 and 314): rc = ble_gap_conn_find(conn_handle, &desc); assert(rc == 0); runs inside its GAP event handler, whenever there is a read or write event on a characteristic. The assert isn't on conn_handle itself, but on the return value of ble_gap_conn_find, so if ble_gap_conn_find fails, the assert would fail. However, this assert doesn't ever seem to fail; the problem seems to occur in ble_gattc_notify_custom, when it calls ble_att_svr_read_handle(BLE_HS_CONN_HANDLE_NONE, [...]). My apologies, the error does happen within that assert statement. The call to ble_gap_conn_find is what is failing.

SumeetSingh19 commented 1 year ago

Hi @ArdenKolodner , The reason ble_att_svr_read_handle, as you mentioned, is called with BLE_HS_CONN_HANDLE_NONE, is because the server itself is performing a read operation on the attribute so that it can fill the notification value using the attribute value. This happens when ble_gattc_notify_custom is called without a value (3rd argument is NULL).

So if the remote did not request the read operation, the server cannot use the remote's connection handle to read the attribute's value, and hence the server uses BLE_HS_CONN_HANDLE_NONE.

This also explains why in the event handler for the read operation, the assert is failing: Because the handler receives a read request with BLE_HS_CONN_HANDLE_NONE as the connection handle and when it tries to find the connection descriptor using BLE_HS_CONN_HANDLE_NONE, naturally it will not get one. Note that this read operation is not supposed to be ignored.

What you should do is check if you are notifying without specifying a value. If you need any help in checking that let me know.

If you did indeed notify without a value, let me know and I can help fix the problem.

ArdenKolodner commented 1 year ago

Hi @SumeetSingh19, I apologize for the slow response but I have not had the chance to investigate for a while.

My code uses the esp-nimble-cpp library, and you can see its implementation of notifications here. Search for "void NimBLECharacteristic::notify(const uint8_t* value, size_t length, bool is_notification, uint16_t conn_handle) {" to see the implementation that I am calling. As I understand it, the value and length are copied into a memory buffer (os_mbuf), which is provided to the ble_gattc_notify_custom call, so there should be a value specified. But if you meant something different, then yes, please let me know how to check.

italocjs commented 4 months ago

Hi, did you ever managed to fix this? i'm having a similar issue:

assert failed: static int NimBLECharacteristic::handleGapEvent(uint16_t, uint16_t, ble_gatt_access_ctxt*, void*) NimBLECharacteristic.cpp:277 (rc == 0)

Backtrace: 0x40083a11:0x3ffe21b0 0x400960cd:0x3ffe21d0 0x4009bf75:0x3ffe21f0 0x401ab53c:0x3ffe2320 0x401a1f89:0x3ffe23a0 0x401a201d:0x3ffe23d0 0x4019df13:0x3ffe2410 0x4019e1ae:0x3ffe2450 0x401a1859:0x3ffe2470 0x401ab431:0x3ffe2490 0x401adba1:0x3ffe2520 0x401880a5:0x3ffe2540

  #0  0x40083a11:0x3ffe21b0 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:408
  #1  0x400960cd:0x3ffe21d0 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:137
  #2  0x4009bf75:0x3ffe21f0 in __assert_func at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/assert.c:85
  #3  0x401ab53c:0x3ffe2320 in NimBLECharacteristic::handleGapEvent(unsigned short, unsigned short, ble_gatt_access_ctxt*, void*) at lib/NimBLE-Arduino/src/NimBLECharacteristic.cpp:315 (discriminator 1)
  #4  0x401a1f89:0x3ffe23a0 in ble_gatts_val_access at lib/NimBLE-Arduino/src/nimble/nimble/host/src/ble_gatts.c:378
      (inlined by) ble_gatts_val_access at lib/NimBLE-Arduino/src/nimble/nimble/host/src/ble_gatts.c:343
  #5  0x401a201d:0x3ffe23d0 in ble_gatts_chr_val_access at lib/NimBLE-Arduino/src/nimble/nimble/host/src/ble_gatts.c:424
  #6  0x4019df13:0x3ffe2410 in ble_att_svr_read at lib/NimBLE-Arduino/src/nimble/nimble/host/src/ble_att_svr.c:402
  #7  0x4019e1ae:0x3ffe2450 in ble_att_svr_read_handle at lib/NimBLE-Arduino/src/nimble/nimble/host/src/ble_att_svr.c:477
  #8  0x401a1859:0x3ffe2470 in ble_gattc_notify_custom at lib/NimBLE-Arduino/src/nimble/nimble/host/src/ble_gattc.c:4174
  #9  0x401ab431:0x3ffe2490 in NimBLECharacteristic::notify(unsigned char const*, unsigned int, bool) at lib/NimBLE-Arduino/src/NimBLECharacteristic.cpp:511       
  #10 0x401adba1:0x3ffe2520 in NordicUARTService::send(char const*, bool) at lib/nimble_source/src/NuS.cpp:165 (discriminator 4)
  #11 0x401880a5:0x3ffe2540 in NimBLE_txTask(void*) at lib/nimble_nus/nimble_nus.cpp:299

ELF file SHA256: a06925091e97d397

Rebooting...
ets Jul 29 2019 12:21:46

rst:0xc (SW_CPU_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:1184
load:0x40078000,len:13232
load:0x40080400,len:3028
entry 0x400805e4

this is line 299
        if (NuPacket.send(packageData, false) == 0)
                {
                    ESP_LOGW("NimBLE TX", "Failed to send package data or no peer connected");
                }

Diving deeper, 

User

/**
 * @brief STATIC callback to handle events from the NimBLE stack.
 */
int NimBLECharacteristic::handleGapEvent(uint16_t conn_handle, uint16_t attr_handle,
                             struct ble_gatt_access_ctxt *ctxt,
                             void *arg)
{
    const ble_uuid_t *uuid;
    int rc;
    struct ble_gap_conn_desc desc;
    NimBLECharacteristic* pCharacteristic = (NimBLECharacteristic*)arg;

    NIMBLE_LOGD(LOG_TAG, "Characteristic %s %s event", pCharacteristic->getUUID().toString().c_str(),
                                    ctxt->op == BLE_GATT_ACCESS_OP_READ_CHR ? "Read" : "Write");

    uuid = ctxt->chr->uuid;
    if(ble_uuid_cmp(uuid, &pCharacteristic->getUUID().getNative()->u) == 0){
        switch(ctxt->op) {
            case BLE_GATT_ACCESS_OP_READ_CHR: {
                rc = ble_gap_conn_find(conn_handle, &desc);
                assert(rc == 0);

                 // If the packet header is only 8 bytes this is a follow up of a long read
                 // so we don't want to call the onRead() callback again.
                if(ctxt->om->om_pkthdr_len > 8 ||
                   pCharacteristic->m_value.size() <= (ble_att_mtu(desc.conn_handle) - 3)) {
                    pCharacteristic->m_pCallbacks->onRead(pCharacteristic);
                    pCharacteristic->m_pCallbacks->onRead(pCharacteristic, &desc);
                }

                ble_npl_hw_enter_critical();
                rc = os_mbuf_append(ctxt->om, pCharacteristic->m_value.data(), pCharacteristic->m_value.size());
                ble_npl_hw_exit_critical(0);
                return rc == 0 ? 0 : BLE_ATT_ERR_INSUFFICIENT_RES;
            }

            case BLE_GATT_ACCESS_OP_WRITE_CHR: {
                uint16_t att_max_len = pCharacteristic->m_value.max_size();

                if (ctxt->om->om_len > att_max_len) {
                    return BLE_ATT_ERR_INVALID_ATTR_VALUE_LEN;
                }

                uint8_t buf[att_max_len];
                size_t len = ctxt->om->om_len;
                memcpy(buf, ctxt->om->om_data,len);

                os_mbuf *next;
                next = SLIST_NEXT(ctxt->om, om_next);
                while(next != NULL){
                    if((len + next->om_len) > att_max_len) {
                        return BLE_ATT_ERR_INVALID_ATTR_VALUE_LEN;
                    }
                    memcpy(&buf[len], next->om_data, next->om_len);
                    len += next->om_len;
                    next = SLIST_NEXT(next, om_next);
                }
                rc = ble_gap_conn_find(conn_handle, &desc);
                assert(rc == 0);
                pCharacteristic->setValue(buf, len);
                pCharacteristic->m_pCallbacks->onWrite(pCharacteristic);
                pCharacteristic->m_pCallbacks->onWrite(pCharacteristic, &desc);
                return 0;
            }
            default:
                break;
        }
    }

    return BLE_ATT_ERR_UNLIKELY;
}

line 277 is                 assert(rc == 0);
Singh-Sumeet commented 4 months ago

Hi @ArdenKolodner and @italocjs , if you are using esp-nimble-cpp, inside NimBLECharacteristic.cpp on line 277 and 314, there is this line: rc = ble_gap_conn_find(conn_handle, &desc); assert(rc == 0);

This practice can be problematic, because NimBLECharacteristic::handleGapEvent may get events with connection handle equal to BLE_HS_CONN_HANDLE_NONE. Note that this is expected behaviour, it simply means that the server performed a read/write operation on itself.

A more robust way to write this code would be either: if(conn_handle != BLE_HS_CONN_HANDLE_NONE) { rc = ble_gap_conn_find(conn_handle, &desc); assert(rc == 0); } OR rc = ble_gap_conn_find(conn_handle, &desc); assert((conn_handle != BLE_HS_CONN_HANDLE_NONE) && (rc == 0));

Also, note that struct ble_gap_conn_desc desc; will stay garbage if BLE_HS_CONN_HANDLE_NONE is received. My suggestion is to replace it with NULL for that case.

If you want a patch for this, please give me the commit ID you are using, because I can see that the code has changed since last time we discussed this issue.

italocjs commented 4 months ago

Hi @ArdenKolodner and @italocjs , if you are using esp-nimble-cpp, inside NimBLECharacteristic.cpp on line 277 and 314, there is this line: rc = ble_gap_conn_find(conn_handle, &desc); assert(rc == 0);

This practice can be problematic, because NimBLECharacteristic::handleGapEvent may get events with connection handle equal to BLE_HS_CONN_HANDLE_NONE. Note that this is expected behaviour, it simply means that the server performed a read/write operation on itself.

A more robust way to write this code would be either: if(conn_handle != BLE_HS_CONN_HANDLE_NONE) { rc = ble_gap_conn_find(conn_handle, &desc); assert(rc == 0); } OR rc = ble_gap_conn_find(conn_handle, &desc); assert((conn_handle != BLE_HS_CONN_HANDLE_NONE) && (rc == 0));

Also, note that struct ble_gap_conn_desc desc; will stay garbage if BLE_HS_CONN_HANDLE_NONE is received. My suggestion is to replace it with NULL for that case.

If you want a patch for this, please give me the commit ID you are using, because I can see that the code has changed since last time we discussed this issue.

Thanks for the tip, i've implemented it here and seems to be working fine now!, i will test with more data and more times but already seems more stable, i'll make sure to report back here.

            case BLE_GATT_ACCESS_OP_READ_CHR: {

                if(conn_handle != BLE_HS_CONN_HANDLE_NONE) 
                {
                rc = ble_gap_conn_find(conn_handle, &desc);
                assert(rc == 0);
                }
                // rc = ble_gap_conn_find(conn_handle, &desc);
                // assert(rc == 0);

                 // If the packet header is only 8 bytes this is a follow up of a long read
                 // so we don't want to call the onRead() callback again.
                if(ctxt->om->om_pkthdr_len > 8 ||
                   pCharacteristic->m_value.size() <= (ble_att_mtu(desc.conn_handle) - 3)) {
                    pCharacteristic->m_pCallbacks->onRead(pCharacteristic);
                    pCharacteristic->m_pCallbacks->onRead(pCharacteristic, &desc);
                }

                ble_npl_hw_enter_critical();
                rc = os_mbuf_append(ctxt->om, pCharacteristic->m_value.data(), pCharacteristic->m_value.size());
                ble_npl_hw_exit_critical(0);
                return rc == 0 ? 0 : BLE_ATT_ERR_INSUFFICIENT_RES;
            }

A curious thing, before this change i had to use a 10ms delay within my code to avoid crashing because of this assert (but did not stop it), now even without any delay in my code is not crashing. (for context, i have a task that read a large log file in sdcard, packetize it based on the max_mtu then send it over nimble/nus).

ArdenKolodner commented 4 months ago

Thank you both! I'm no longer at the position that this project was for, and we never ended up using NimBLE due to another issue, but this is great to hear. I'll pass it along to the others. I'll leave the issue open until they can confirm if the solution works, or if they decide not to pursue it, I'll close it since the issue was solved for Italo.

ArdenKolodner commented 4 months ago

I can now confirm they've decided not to pursue it at this time. Thank you both once again!