arduino-libraries / ArduinoBLE

ArduinoBLE library for Arduino
GNU Lesser General Public License v2.1

Exception in HCIClass::poll #288

Open justinschoeman opened 1 year ago

justinschoeman commented 1 year ago

Hello,

I am seeing occasional exceptions as follows:

PC: 0x400d7d38: HCIClass::poll(unsigned long) at /home/justin/Arduino/libraries/ArduinoBLE/src/utility/HCI.cpp line 138
EXCVADDR: 0x0000000c

Decoding stack results
0x400d7d35: HCIClass::poll(unsigned long) at /home/justin/Arduino/libraries/ArduinoBLE/src/utility/HCI.cpp line 138
0x40116aeb: HCIClass::poll() at /home/justin/Arduino/libraries/ArduinoBLE/src/utility/HCI.cpp line 125
0x400d4ba5: ATTClass::connected(unsigned short) const at /home/justin/Arduino/libraries/ArduinoBLE/src/utility/ATT.cpp line 491
0x400d45d1: BLERemoteCharacteristic::valueUpdated() at /home/justin/Arduino/libraries/ArduinoBLE/src/remote/BLERemoteCharacteristic.cpp line 144
0x400d32bc: BLECharacteristic::valueUpdated() at /home/justin/Arduino/libraries/ArduinoBLE/src/BLECharacteristic.cpp line 328
0x400d2b32: bleUART::read(unsigned char*, int) at /home/justin/Arduino/BLE_Bat/bleUART.cpp line 134
0x400d295a: batVestwoods::run() at /home/justin/Arduino/BLE_Bat/batVestwoods.cpp line 40
0x400d1bd7: batBMSManager::run() at /home/justin/Arduino/BLE_Bat/batBMSManager.h line 23
0x400d1fe8: loop() at /home/justin/Arduino/BLE_Bat/BLE_Bat.ino line 83
0x400da771: loopTask(void*) at /home/justin/Arduino/hardware/espressif/esp32/cores/esp32/main.cpp line 50

This is the line:

 while (HCITransport.available()) {

The faulting address (EXCVADDR 0x0000000c) is just past zero, which looks like a call through a null pointer. So is the HCITransport vtable being overwritten from another thread? Is this even possible, or am I looking in completely the wrong place?

EDIT: Code is at:

https://github.com/justinschoeman/BLE_Bat

if this helps.

Thanks!

trylaarsdam commented 1 year ago

I'm having this issue as well on the Portenta H7. My BLE functions and handlers run in their own thread, but every call to the main BLE class, peripherals, central, and characteristics is guarded by a mutex, which as far as I know should make BLE.poll() safe (if I'm wrong, please correct me).
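A minimal host-side sketch of the locking pattern described above, assuming standard C++ `std::mutex` in place of the RTOS mutex used on the board (for example `rtos::Mutex` on Mbed). `FakeBle`, `GuardedBle`, and `poll_from_threads` are hypothetical stand-ins, not ArduinoBLE types: every call that touches the shared state, including `poll()`, takes the same lock, so at most one thread is inside the "stack" at a time.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the BLE objects: not thread-safe on its own.
struct FakeBle {
    long events = 0;
    void poll() { ++events; }                   // plays the role of BLE.poll()
    long eventCount() const { return events; }  // plays the role of a characteristic read
};

// Every call into FakeBle goes through the same mutex.
class GuardedBle {
public:
    void poll() {
        std::lock_guard<std::mutex> guard(mutex_);
        ble_.poll();
    }
    long eventCount() {
        std::lock_guard<std::mutex> guard(mutex_);
        return ble_.eventCount();
    }

private:
    std::mutex mutex_;
    FakeBle ble_;
};

// Hypothetical demo: several threads call poll() concurrently; the mutex
// serializes them, so no increment is lost.
long poll_from_threads(int threads, int iterations) {
    GuardedBle ble;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&ble, iterations] {
            for (int i = 0; i < iterations; ++i) ble.poll();
        });
    }
    for (auto& w : workers) w.join();
    return ble.eventCount();
}
```

A mutex like this only serializes code that goes through the wrapper; anything that touches the same state from outside it is not covered.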

Here's the mbed error log I get:

++ MbedOS Fault Handler ++

FaultType: HardFault

Context:
R0   : 2400A630
R1   : 2400A43C
R2   : 0000010D
R3   : F74FC4EE
R4   : 2400A310
R5   : 2400A630
R6   : 080A161E
R7   : 080A1646
R8   : 080A1633
R9   : 00000077
R10  : 00000000
R11  : 00000000
R12  : 08050765
SP   : 240395B8
LR   : 0804A40D
PC   : 0804A2A0
xPSR : 21000000
PSP  : 24039598
MSP  : 2407FF78
CPUID: 411FC271
HFSR : 40000000
MMFSR: 00000000
BFSR : 00000000
UFSR : 00000100
DFSR : 00000000
AFSR : 00000000
Mode : Thread
Priv : Privileged
Stack: PSP

-- MbedOS Fault Handler --

++ MbedOS Error Info ++
Error Status: 0x80FF013D Code: 317 Module: 255
Error Message: Fault exception
Location: 0x804A2A0
Error Value: 0x24019CB4
Current Thread: bleThread Id: 0x2400156C Entry: 0x8083191 StackSize: 0x1000 StackMem: 0x24038608 SP: 0x240395B8 
For more info, visit: https://mbed.com/s/error?error=0x80FF013D&osver=61600&core=0x411FC271&comp=2&ver=120200&tgt=PORTENTA_H7_M7
-- MbedOS Error Info --
Crash Info:
    Crash location = HCIClass::poll(unsigned long) [0x0804A2A0] (based on PC value)
    Caller location = HCIClass::handleAclDataPkt(unsigned char, unsigned char*) [0x0804A40D] (based on LR value)
    Stack Pointer at the time of crash = [240395B8]
    Target and Fault Info:
        Processor Arch: ARM-V7M or above
        Processor Variant: C27
        Forced exception, a fault with configurable priority has been escalated to HardFault
        Unaligned access error has occurred

trylaarsdam commented 1 year ago

It turns out (in my case) that if you have multiple threads running, even if only one thread accesses BLE, you need atomic-flag locks around the work done in the onDataReceived ISR. Otherwise the interrupt can fire in the middle of HCIClass::poll, and the data it is handling becomes unsafe.

Cancel that: I just got an identical crash after 2 hours with the locks in place.

For reference, the locks looked like this:

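#include "mbed.h"        // core_util_atomic_flag_* helpers (Mbed OS mbed_atomic.h)
#include <ArduinoBLE.h>

// Shared lock flag; starts cleared (unlocked).
static core_util_atomic_flag flag = CORE_UTIL_ATOMIC_FLAG_INIT;

void lock()
{
    // test_and_set returns the flag's previous value, so spin while it was already set.
    while (core_util_atomic_flag_test_and_set(&flag)) {
    }
}

void unlock()
{
    core_util_atomic_flag_clear(&flag);
}

void onDataReceived(BLEDevice device, BLECharacteristic characteristic)
{
    lock();
    uint16_t packetLength = characteristic.valueLength();
    const uint8_t* data = characteristic.value();

    for (int i = 0; i < packetLength; i++)
    {
        // read data
        uint8_t byte = data[i];
        (void)byte;  // placeholder: process the byte here
    }
    unlock();
}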
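For comparison, a minimal portable sketch of the same atomic-flag spinlock, assuming standard C++ `std::atomic_flag` instead of Mbed's `core_util_atomic_flag_*` helpers. The semantics match: `test_and_set` returns the flag's previous value, so a caller keeps spinning while it comes back `true`. `SpinLock` and `count_with_spinlock` are hypothetical names used only for illustration. This kind of lock only works between threads; on a single core, an interrupt handler that spins on a lock held by the thread it interrupted never returns.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical spinlock built on std::atomic_flag.
class SpinLock {
public:
    void lock() {
        // test_and_set returns the PREVIOUS value: keep spinning while
        // another thread already held the flag.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }
    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Hypothetical demo: several threads increment a plain (non-atomic) counter,
// each increment protected by the spinlock. Returns the final count.
long count_with_spinlock(int threads, int iterations) {
    SpinLock spin;
    long counter = 0;
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&spin, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                spin.lock();
                ++counter;  // critical section
                spin.unlock();
            }
        });
    }
    for (auto& w : workers) w.join();
    return counter;
}
```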

As for your error, it doesn't look like it involves HCIClass::handleAclDataPkt, which may point to a different root cause. I'm not sure, though.