arduino-libraries / ArduinoBLE

ArduinoBLE library for Arduino
GNU Lesser General Public License v2.1
303 stars 200 forks source link

ArduinoBLE fail to restart after BLE.end #158

Open paulvha opened 3 years ago

paulvha commented 3 years ago

If you have a solution that communicates with ArduinoBLE using Cordio, and you want to be able to set the solution to sleep for some time to save battery, ArduinoBLE will crash after sleep.

Root cause

Before going to sleep you have to call BLE.end() as the connection will disconnect when the lower level is set to deep_sleep and you need to reconnect on return. In HCICordioTransport.cpp HCI.end(), called from BLE.end(), will perform a bleLoopThread.terminate():

https://github.com/arduino-libraries/ArduinoBLE/blob/3b228dc90262751bdac97712a6c7bda95fcf3a5c/src/utility/HCICordioTransport.cpp#L200

This Bleloop is handling the wsf-timing. After sleep BLE.begin() is calling HCI.begin(), where it tries to (re)start bleLoopThread.start(bleLoop):

https://github.com/arduino-libraries/ArduinoBLE/blob/3b228dc90262751bdac97712a6c7bda95fcf3a5c/src/utility/HCICordioTransport.cpp#L189

BUT.... AS THIS WAS TERMINATED YOU CAN NOT RESTART. Search on the Internet and you will find more people struggling with similar aspects.

Solution

Do not terminate the thread, but put it to wait. Make the following changes in HCICordioTransport.cpp

  1. Add a global variable around line 70:

    volatile bool GoSleep = false;
  2. In bleloop() change: https://github.com/arduino-libraries/ArduinoBLE/blob/98ff550988912ffbaeb1d877970e9e05f1de0599/src/utility/HCICordioTransport.cpp#L123-L125 To:

    while (true) {
        if (GoSleep) {
            timer.stop();
            // Serial.print("SLEEP\n");
            rtos::ThisThread::flags_wait_any(0x02); // wait for signal
            // Serial.print("WAKE\n");
            timer.start();
        }
    
        last_update_us += (uint64_t) timer.read_high_resolution_us();
        timer.reset();
  3. In HCICordioTransportClass::begin() change: https://github.com/arduino-libraries/ArduinoBLE/blob/3b228dc90262751bdac97712a6c7bda95fcf3a5c/src/utility/HCICordioTransport.cpp#L115-L117 To:

    if (GoSleep) {
       bleLoopThread.flags_set(0x02);  // restart scheduler
       GoSleep = false;
    }
    else
     bleLoopThread.start(bleLoop);     // start WSF scheduler
  4. in HCICordioTransportClass::end() change https://github.com/arduino-libraries/ArduinoBLE/blob/3b228dc90262751bdac97712a6c7bda95fcf3a5c/src/utility/HCICordioTransport.cpp#L200 TO:

    //bleLoopThread.terminate();      // once terminated you can NOT restart
    GoSleep = true;                   // set WSF scheduler to sleep

Additional context

Additional reports

tgruetzm commented 1 year ago

Hey @paulvha, do you have any example repos of this working? I've tried implementing this and it works for about 25 cycles and that hard faults. I'm struggling to figure it out, but wondering if you still have a working solution?

paulvha commented 1 year ago

Yes... I know this issue (and the solution to apply) as it took 3 long days of hard debugging in the past weeks to figure it out.

The problem is actually NOT ArduinoBLE, but the root cause is in the Mbed / Apollo3 / BLE driver implementation. The Apollo3 is following a standard definition used by other BLE targets.

What happens?

Each time an ArduinoBLE.begin() is performed, a call is done from the HCICordioTransport.begin() to getDriver().initialize(). The latter is part of Mbed / Cordio. Here it will perform a number of initialization routines for the drivers and one of them is to connect WSF (Wireless Software Foundation) API with the Apollo3 BLE driver to have a synchronized / correctly ordered workflow between the different BLE driver routines. To make that connection a unique handle is requested and initialized with call-back routines. The problem is with that handle as there is only a limited amount defined to be available. The amount of handles is different between MBED releases. I have seen 10 or 16, defined as WSF_MAX_HANDLERS.

When calling BLE.end() from HCICordioTransport.end() the getDriver().terminate() is called. That will trigger a number of driver de-initialization routines to be called, BUT... there is NO WSF-API call to free / return / release the earlier provided handle. (!!).

Next time a BLE.start() is called that NEW / NEXT handle is requested and used. Once the defined number of handles runs out it "should" create an error message, but given the code structure, it does not trigger / disabled. The Apollo3 driver will assign a call-back routine to a memory location that is not assigned for handles and at a certain moment it will overwrite critical memory thus causing a hardware fault. The 25 cycles are what I have observed before impacting critical memory.

The solution

The solution is a change in the Apollo3 driver to ONLY ask for a new handle if it does not have one. The one it had requested first time around is retained during deep sleep if you retain the full 384 MB memory. (which you always should on V2.x.x. of Apollo3)

Once I figured it out it was not a difficult change and I applied it to a number of Apollo3 variants myself and tested it works. file:

2.2.1/cores/mbed-os/features/FEATURE_BLE/targets/TARGET_Ambiq_Micro/TARGET_Apollo3/AP3CordioHCITransportDriver.cpp

changed to:

  // if we start & end BLE more than 16 times (WSF_MAX_HANDLERS)
  // we can not start BLE again. There is NO call to undo/free
  // the handler as we terminate.
  wsfHandlerId_t handlerId = (wsfHandlerId_t) 0;

  void AP3CordioHCITransportDriver::initialize()
  {
      if (!handlerId)
        handlerId = WsfOsSetNextHandler(HciDrvHandler);
      }

      HciDrvHandlerInit(handlerId);
}

The only challenge is that you need to know how to create a new MBED pre-compiled library and how to use it. If you do not know that drop me a mail at paulvha@hotmail.com with the board you use. i will send you the pre-compiled library and installation instructions for the Apollo3.