ARMmbed / mbed-hal-nrf51822-mcu

mbed HAL port for nRF51822
9 stars 12 forks source link

Minar doesn't sleep correctly - draws 4 mA #46

Closed Timmmm closed 8 years ago

Timmmm commented 8 years ago

If I have this simple program:

void app_start(int argc, char *argv[])
{
}

Then when it runs it draws about 4 mA. However if I hack it like this:

void app_start(int argc, char *argv[])
{
    for (;;)
        __WFE();
}

Then the current draw goes way down to about 4 uA, which is what I'd expect.

I'm not using the softdevice (though it is merged into my hex), and this is with an nRF51822 without a 32 kHz crystal. I had a skim through the minar code but couldn't see anything obvious.

ciarmcom commented 8 years ago

ARM Internal Ref: IOTSFW-2007

0xc0170 commented 8 years ago

@Timmmm Please share yotta ls. It should not be 4mA. I recall there was an issue with serial which we did not address properly, which caused that 4mA, but was fixed in the mbed-drivers 0.12.x (here's an issue reported: https://github.com/ARMmbed/mbed-drivers/issues/140 )

Timmmm commented 8 years ago

Here is yotta ls:

button 0.0.0
\_ mbed-drivers 0.12.1
  |_ mbed-hal 1.2.2 yotta_modules\mbed-hal
  | \_ mbed-hal-nordic 2.0.0 yotta_modules\mbed-hal-nordic
  |   \_ mbed-hal-nrf51822-mcu 2.1.5 yotta_modules\mbed-hal-nrf51822-mcu
  |     |_ nrf51-sdk 2.2.1 yotta_modules\nrf51-sdk
  |     \_ mbed-hal-mkit 1.0.2 yotta_modules\mbed-hal-mkit
  |_ cmsis-core 1.1.2 yotta_modules\cmsis-core
  | \_ cmsis-core-nordic 1.0.1 yotta_modules\cmsis-core-nordic
  |   \_ cmsis-core-nrf51822 1.3.2 yotta_modules\cmsis-core-nrf51822
  |_ ualloc 1.0.3 yotta_modules\ualloc
  | \_ dlmalloc 1.0.0 yotta_modules\dlmalloc
  |_ minar 1.0.4 yotta_modules\minar
  | \_ minar-platform 1.0.0 yotta_modules\minar-platform
  |   \_ minar-platform-mbed 1.1.1 yotta_modules\minar-platform-mbed
  |_ core-util 1.2.0 yotta_modules\core-util
  \_ compiler-polyfill 1.2.1 yotta_modules\compiler-polyfill

Looks like it is a separate issue.

0xc0170 commented 8 years ago

cc @pan- @marcuschangarm

jaustin commented 8 years ago

@Timmmm I know @pan- is looking into this to fix the actual bug, but in the meantime, I've found that as long as you've got SoftDevice linked in, you can use ble.init(); to achieve acceptable power consumption when sleeping:

yt install ble

Then the following has sensible sleep behaviour from a power consumption PoV:

#include "mbed-drivers/mbed.h"
#include "ble/ble.h"

static void blinky(void) {
    static DigitalOut led(p23);
    led = !led;
}

void app_start(int, char**){
    BLE &ble = BLE::Instance();
    ble.init(); // normally you'd want to pass a callback for when init completes
    minar::Scheduler::postCallback(blinky).period(minar::milliseconds(500));
}

This is slightly more flexible than the workaround you've shown because you can let minar sleep whenever it wants and it'll still do the 'right thing'.

Timmmm commented 8 years ago

Ah that is a nicer solution. I'll give it a go.

emidttun commented 8 years ago

Regarding the suggested power saving code:

void app_start(int argc, char *argv[])
{
    for (;;)
        __WFE();
}

This is will for sure make the Softdevice hardfault, so with the softdevice it will not work. The correct way is to call sd_app_evt_wait(). I think this is what happens in solution suggested by @jaustin.

pan- commented 8 years ago

@emidttun @Timmmm does not use use the soft device, in this case, it should be ok to call WFE directly.

When minar does not have anything to do, it go to sleep: https://github.com/ARMmbed/mbed-hal-nrf51822-mcu/blob/master/source/sleep.c

As you can see, there is two possible branch for the sleep process:

Everything looks good in this code, except that if sd_softdevice_is_enabled is called, somehow events will be generated and WFE will "return" instantaneously. This means that if sd_softdevice_is_enabled is called, the device will never go to sleep.

The workaround proposed by @jaustin will work but it cost a lot of RAM since it will activate the soft device.

Another workaround can be:

class CriticalSectionLock {
public:
    CriticalSectionLock() {
        // get the state of exceptions for the CPU
        _PRIMASK_state = __get_PRIMASK();

        // if exceptions are not enabled, there is nothing more to do
        if(_PRIMASK_state == 1) {
            return;
        } else {
            __disable_irq();
        }
    }

    ~CriticalSectionLock() {
        __set_PRIMASK(_PRIMASK_state);
    }

private:
  uint32_t _PRIMASK_state;
};
Timmmm commented 8 years ago

@pan Should I be worried that CriticalSectionLock relies on the softdevice? Where is it used?

If sd_softdevice_is_enabled is buggy, and no-one here has access to the source, perhaps a simple and ugly hack would be to reimplement it at the mBed level, like this:

class nRF5xn 
{
    // ...
    static bool softdevice_used = false;
    // ...
};

ble_error_t nRF5xn::init(BLE::InstanceID_t instanceID, FunctionPointerWithContext<BLE::InitializationCompleteCallbackContext *> callback)
{
    // ...
    if (btle_init() != ERROR_NONE) {
        return BLE_ERROR_INTERNAL_STACK_FAILURE;
    }
    softdevice_used = true;
    // ...

// In sleep.c:
if (nRF5xn::_initialised && (__get_PRIMASK() == 0) && (sd_softdevice_is_enabled(&sd_enabled) == NRF_SUCCESS) && (sd_enabled == 1)) {
    // soft device is enabled, use the primitives from the soft device to go to sleep
    sd_app_evt_wait();

There is actually already a nRF5Xn::initialised but it isn't static.

jaustin commented 8 years ago

I wonder if the safe thing for us to do is to always initialise SoftDevice when we have an nRF51 target with SoftDevice linked in, and provide a target or configuration option to build without SD at all. In the case that there's no SD, we use conditional compilation to either stub out the calls into SD, or switch in implementations that don't call those functions (like your implementation above, @pan- )

@Timmmm can you elaborate a little on what you're doing? Would it be better for you to be able to build without SoftDevice at all? Or do you want to be able to init SoftDevice sometime later in your code?

pan- commented 8 years ago

@Timmmm

CriticalSectionLock usage and what it matters:

CriticalSectionLock is used by minar, the scheduler. The scheduler is basically an infinite loop which try to get a task and execute it at each iteration. If there is no task available, it goes to sleep.

Internally, the scheduler has a "list" of pending tasks, user can push task into this list by using the function postCallback. The Scheduler will pop one of those task at each iteration.

Because tasks can be posted into the scheduler from interrupt contexts, this list of tasks has to be protected against concurrent access, otherwise, there is a risk to corrupt this structure. So, to protect this data structure, the scheduler acquire a lock before it access (push or pop) its list of pending tasks. That's where CriticalSectionLock is used.

While modifying the CriticalSectionLock class will not solve the power consumption issue, it will help if you do not use the softdevice. Indeed, if the soft device is not enabled, the CriticalSectionLock does not work this can lead to corruptions in the scheduler internal data structure and a lot of strange behavior.

About re-implementation of sd_softdevice_is_enabled at mbed level:

This is a dirty hack and it is not reliable at all, if one of our device want to use directly the softdevice instead of BLE API while using other parts of mbed, it will not work well. We will work with Nordic to have a better understanding of the issue and hopefully find a long term solution.

Timmmm commented 8 years ago

@jaustin In this case I'm not using the soft-device at all (I'm doing custom radio stuff), so an option to have it not linked would be nice. I don't think that should be the solution though. Ideally it would be an option but this bug should be fixed too.

@pan- Ah I see. Does your modification of CriticalSectionLock work with the soft device enabled, as well as disabled? If so it should be used by everyone right? Good point about using the softdevice directly.

emidttun commented 8 years ago

I now understand the behaviour of the current sleep implementation and why it fails:

    // look if exceptions are enabled or not, if they are, it is possible to make an SVC call
    // and check if the soft device is running
    if ((__get_PRIMASK() == 0) 
        && (sd_softdevice_is_enabled(&sd_enabled) == NRF_SUCCESS) 
        && (sd_enabled == 1)) 
    {
        // soft device is enabled, use the primitives from the soft device to go to sleep
        sd_app_evt_wait();
    } else {
        // impossible to use soft device primitive, just wait for events
        __WFE();
    }

The reason starts when sd_softdevice_is_enabled is called. This function (and all other SoftDevice API functions) are executed as SVC calls, running in interrupt context. When the function call has returned, it actually returns from an interrupt and the WFE event flag is set. When entering the else clause the __WFE does not put the CPU to sleep, but simply clears the event flag.

A possible and simple solution is as follows:

    } else {
        __SEV(); // Set an event, in case __get_PRIMASK() != 0
        __WFE(); // Clear event flag
        __WFE(); // Sleep
    }

However, there is a problem if another "wake-up" source's event flag is cleared. Will that be an issue in mbed OS?

Timmmm commented 8 years ago

If I follow you correctly, __WFE() waits for an interrupt to be called, but sd_softdevice_is_enabled() executes as an interrupt, so sd_softdevice_is_enabled() returns false, but then __WFE() always returns immediately.

And the issue with your fix is if another event happens, before getting to the second __WFE() then we won't detect it. If so what about something like this:

__WFE();
if ((__get_PRIMASK() == 0) 
    && (sd_softdevice_is_enabled(&sd_enabled) == NRF_SUCCESS) 
    && (sd_enabled == 1)) 
{
    // Let the softdevice handle the event too.
    __SEV(); // I guess this line is not strictly necessary?
    sd_app_evt_wait();
} 

I don't really know enough about Cortex M exceptions and interrupts to know if this is right.

pan- commented 8 years ago

@Timmmm : I made two PR which fix those issue:

I hope it will be reviewed and accepted soon. This is a long term fix, not a short term workaround.