ARMmbed / mbed-os

Arm Mbed OS is a platform operating system designed for the internet of things
https://mbed.com

Ticker issues on the NRF52 platform, especially when using the EventQueue call_every timer function #4893

Closed: yogeshk19 closed this issue 6 years ago

yogeshk19 commented 7 years ago

Description

Toolchain: GCC_ARM

mbed OS versions: 5.4.4, 5.5

Expected behavior

EventQueue's call_every method should invoke the callback at the interval passed to the call_every API. For example, my application reads a sensor every 30 seconds and transmits the data to any Bluetooth central device connected to it.

Actual behavior

In mbed OS 5.4.4 the callback takes a very long time to be invoked, and the delay is fairly random: sometimes it takes several minutes, sometimes only a few seconds.

In mbed OS 5.5 the application simply hits a stack underflow error.

In mbed OS 5.3.3, however, call_every works as expected. A fix made for issue #3857 was incorporated into mbed OS 5.4.4 and was supposed to address the ticker issue; however, it seems to have made things worse.
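To put numbers on the irregular callback timing described above, one option is to record a timestamp in each callback (for example from a running mbed Timer via read_ms(); how the timestamps are collected is an assumption here) and compare consecutive gaps with the period passed to call_every. The helper below is a minimal host-side sketch; its name and the tolerance parameter are hypothetical, not part of mbed OS.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Counts the gaps between consecutive callback timestamps (in ms) that
// deviate from the expected period by more than tolerance_ms.
// Hypothetical diagnostic helper, not an mbed OS API.
std::size_t count_off_schedule(const std::vector<uint32_t>& stamps_ms,
                               uint32_t period_ms, uint32_t tolerance_ms) {
    std::size_t off = 0;
    for (std::size_t i = 1; i < stamps_ms.size(); ++i) {
        // Unsigned subtraction also copes with one wrap of a 32-bit ms counter.
        uint32_t gap = stamps_ms[i] - stamps_ms[i - 1];
        uint32_t diff = gap > period_ms ? gap - period_ms : period_ms - gap;
        if (diff > tolerance_ms) {
            ++off;
        }
    }
    return off;
}
```

For example, timestamps of 0, 30000 and 200000 ms with a 30000 ms period and a 50 ms tolerance yield one off-schedule gap.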

Steps to reproduce

1) Take any of the example mbed BLE peripheral applications, add the following code so that the device operates in low power mode, and modify the application so that a sensor is read via the EventQueue call_every method.

#include <limits.h>
#include <mbed.h>

// import the time duration between two ticks (in us).
extern const uint32_t os_clockrate;

//initialize an event queue loop.
static EventQueue eventQueue(16*32);

void dummy_cb() { }

void os_idle_demon (void) {
    // use int rather than timestamp_t because units are not coherent 
    // between Timer and Timeout ...
    const int max_us_sleep = (INT_MAX / os_clockrate) * os_clockrate; 
    Timer stopwatch;      // keep track of the time asleep
    Timeout alarm_clock;  // wakes the MCU if no other interrupt does so first

    // never returns: the RTOS preempts this thread when there is something to do,
    // either before os_suspend actually suspends the system (when not in an SVC)
    // or immediately after os_resume
    while (true) {
        // suspend the system 
        uint32_t ticks_to_sleep = os_suspend();
        uint32_t elapsed_ticks = 0;

        if (ticks_to_sleep) { 
            // widen before multiplying so the 32-bit product cannot overflow
            uint64_t us_to_sleep = (uint64_t) ticks_to_sleep * os_clockrate;

            if (us_to_sleep > (uint32_t) max_us_sleep) {
                us_to_sleep = max_us_sleep;
            }

            // start the stopwatch and set up alarm_clock to wake the MCU after us_to_sleep us
            stopwatch.start();
            alarm_clock.attach_us(dummy_cb, us_to_sleep);

            // go to sleep, most of the work is done by the softdevice 
            sleep();

            // after sleep, unknown wake-up source: can be alarm_clock or another IRQ
            int us_asleep = stopwatch.read_us();

            // stopwatch and alarm_clock cleanup
            stopwatch.stop();
            stopwatch.reset();
            alarm_clock.detach();

            // translate us asleep into ticks 
            elapsed_ticks = us_asleep / os_clockrate;
        }

        // resume the system 
        os_resume(elapsed_ticks);
    }
}
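The tick/microsecond arithmetic in the idle loop can be factored out and checked on a host machine. This sketch assumes os_clockrate is the tick period in microseconds, as the comment in the code above states, and uses 1000 us only as an example value; the helper names are hypothetical. Multiplying two uint32_t values yields a 32-bit product even when the result is assigned to a uint64_t, so the widening cast has to come before the multiplication.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Largest sleep (in us) that is a whole number of ticks and still fits in an
// int, mirroring max_us_sleep in the idle loop.
uint32_t max_sleep_us(uint32_t us_per_tick) {
    assert(us_per_tick != 0);
    return (static_cast<uint32_t>(INT_MAX) / us_per_tick) * us_per_tick;
}

// Ticks -> microseconds, clamped to max_us. The cast to uint64_t happens
// before the multiplication, so the product cannot wrap at 32 bits.
uint32_t ticks_to_us(uint32_t ticks, uint32_t us_per_tick, uint32_t max_us) {
    uint64_t us = static_cast<uint64_t>(ticks) * us_per_tick;
    return us > max_us ? max_us : static_cast<uint32_t>(us);
}

// Microseconds actually slept -> whole ticks. Integer division truncates, so
// a partial tick is never reported to the RTOS as elapsed.
uint32_t us_to_ticks(uint32_t us, uint32_t us_per_tick) {
    assert(us_per_tick != 0);
    return us / us_per_tick;
}
```

With a 1000 us tick, a request of 5,000,000 ticks is clamped to 2,147,483,000 us; without the early cast the 32-bit product would silently wrap to 705,032,704 us.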

Please let me know if I can add any more details that would help get us unblocked.

Thanks, Yogesh

0xc0170 commented 7 years ago

cc @pan- @anangl @nvlsianpu

0xc0170 commented 7 years ago

In mbed OS 5.5 the application simply hits a stack underflow error.

What is the specific version? Does the latest release, 5.5.4, also have this problem? Or the latest master? Which thread is getting this error, and what is its stack size? Have you tried increasing the size?

Some observations:

yogeshk19 commented 7 years ago

Thanks, Martin, for getting back to this thread.

The mbed OS version is 5.5.2. I have not yet tried the latest official mbed OS release. The thread getting this error is, as far as I can tell, the one that runs when the timer set up with EventQueue's call_every method fires; it is currently configured to run every minute.

As for the bullets you mentioned, that was the code corresponding to the 5.3.3 and 5.4.4 mbed OS builds. For 5.5.2 I updated the code based on Vincent's comments, because the code I posted in this thread doesn't compile with the latest OS. That said, I didn't use rtos_attach_idle_hook in 5.5.2, and I am guessing that with the changes in the latest mbed OS I should be using the method you suggested. I will give that a shot and see whether the application still runs into the stack underflow error.

At this point, based on our latest testing, 5.3.3 is the only version that seems to work, and it draws noticeably less current than any of the other OS versions I am building against.

Do you have any potential dates or release dates as to when tickless support will be in place? We are using mbed OS because it made our application development much easier and let us focus on the application instead of on how to achieve low power usage, since these devices are intended to operate with low power consumption. If we can't lower the current draw in the near future, we are considering switching to using the nRF52 APIs that come out of the box directly, which I have been told work well for low current draw scenarios.

Thanks again for taking the time to look into this issue; I hope we can get it resolved ASAP.

Thanks, Yogesh

0xc0170 commented 7 years ago

Do you have any potential dates or release dates as to when tickless support will be in place?

Have a look at the sleep API proposals that are currently open, and wait for the other PR that should also come soon.

In your case (NRF52), however, sleep and deepsleep share the same implementation, so these two PRs will not make much difference for now. Tickless support should come soon, as this work is one of the steps required to make it work.

@pan- any pointers for the problem above?

pan- commented 7 years ago

@yogeshk19

The mbed OS version is 5.5.2.

You should try a version later than 5.5.2 (at least 5.5.3): #4736 fixed a critical issue that caused a stack underflow error.

yogeshk19 commented 7 years ago

@0xc0170 Thanks, Martin, for getting back to me. Tickless mode and the other PRs are important for reducing the current draw on the NRF52, and so far we haven't had good results with mbed OS. Hopefully those changes are treated as a priority; otherwise it wouldn't make sense to build applications intended to operate in low power mode on mbed OS.

@pan- Thanks, Vincent, I will give that a shot. Is there any distinct advantage to using later versions of mbed OS if the power consumption/current draw does not improve in the latest builds? Our current application runs a timer every 30 seconds to transmit data over Bluetooth. Even though the application is idle during those 30 seconds, there is a spike in the current draw every few ms. I am not sure whether the application actually goes to sleep during the 30-second interval when it is doing nothing.

Thanks, Yogesh

pan- commented 7 years ago

@yogeshk19 I've made some power consumption measurements with mbed OS 5.5.4. The program was the LED example, compiled with the release profile. Without tickless mode, the current consumption ranges between 22 and 24 uA.

With tickless mode enabled in the same application, the gain in power consumption is, surprisingly, not noticeable.

However, enabling the DCDC converter has a large impact (NRF_POWER->DCDCEN = 1;). In tickless mode the consumption drops to 15 uA, while in regular mode it drops to 16 uA.

So I ran one last test without any workload. In that case there is no difference between tickless mode and regular mode: the consumption is around 1.9 uA, which is in line with the product specification (1.9 μA at 3 V in ON mode, no RAM retention, wake on RTC).

My equipment might not be precise enough to accurately measure the gain from tickless mode, but even without it the numbers are quite good. Be sure to enable the DCDC converter if it is available on your design (it is on the NRF52_DK).

Our current application runs a timer every 30 seconds to transmit data over Bluetooth. Even though the application is idle during those 30 seconds, there is a spike in the current draw every few ms.

The regulators on the boards produce spikes like this one:

[Image: spike]

In this picture the average current consumption is 1.9 uA.
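A trace full of spikes can still average 1.9 uA because the average is total charge divided by time: for a single burst per period T, I_avg = (I_active * t_active + I_sleep * (T - t_active)) / T. A minimal sketch of that arithmetic, assuming one rectangular burst per period; the function name and the example numbers are illustrative, not measurements from this thread.

```cpp
#include <cassert>

// Average current (uA) over one period of a duty-cycled load: a burst of
// active_ua lasting active_ms, then sleep_ua for the rest of period_ms.
// Assumes a single rectangular burst per period (a simplification).
double average_current_ua(double active_ua, double active_ms,
                          double sleep_ua, double period_ms) {
    assert(period_ms > 0.0 && active_ms >= 0.0 && active_ms <= period_ms);
    double charge = active_ua * active_ms + sleep_ua * (period_ms - active_ms);
    return charge / period_ms;
}
```

For instance, a 1 mA burst lasting 1 ms once per second on top of a 2 uA floor averages to just under 3 uA, which is why short regulator spikes barely move the average.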

yogeshk19 commented 7 years ago

@pan- Vincent, we made some power consumption measurements with mbed OS 5.5.4. As I explained earlier in the thread, the application firmware does just two things:

1) Using the EventQueue call_every function, the BLE peripheral device transmits the sensor data every 30 seconds when connected to a central device. In our case we are testing with the iPhone LightBlue app just to speed up our testing. We do this every 30 seconds to keep the BLE connection intact.

2) Every minute (for testing purposes; eventually every 15 minutes) we read the sensor data, which involves powering the sensor via a GPIO pin and reading the analog voltage via another GPIO pin.

After we have transmitted the sensor data there is technically no operation occurring, and the custom code you gave us to put the BLE peripheral device to sleep does seem to kick in, but the device doesn't stay asleep for 30 seconds. I am not sure why, since the call_every callback shouldn't be invoked before the interval has elapsed, and my iPhone app shows the data being transmitted every 30 seconds.

As a result, our average current draw is about 71 uA in connected mode and 12 uA in advertising mode. Our current draw in either mode cannot exceed 12 uA.

Please see the picture below in connected mode.

[Image: current draw in connected mode]

Based on the picture above, why are we seeing a spike in current draw every 30 ms? The DC-DC converter is enabled in our case, which explains the current peaks shown in your picture and in the one I attached, but it doesn't explain the current draw while the BLE device is supposed to be asleep for 30 seconds.

How can I ensure the BLE device sleeps for the entire 30 seconds and doesn't wake up until the next transmission? That would potentially reduce our average current draw.

Thanks, Yogesh

yogeshk19 commented 7 years ago

@pan- Vincent, if you would like to use our test firmware to confirm our current measurements, I can share it too.

Thanks, Yogesh

pan- commented 7 years ago

@yogeshk19 Hi Yogesh,

Thanks for such a detailed response; that helps a lot.

Based on the EventQueue call_every function, the BLE peripheral device transmits the sensor data every 30 seconds when connected to a central device. In our case we are testing with the iPhone LightBlue app just to speed up our testing. We do this every 30 seconds to keep the BLE connection intact.

I think there is a big misunderstanding of how BLE works. Three parameters define a connection: the connection interval, the slave latency and the supervision timeout.

It is important to understand that even when there is no higher-level data to transmit, the master keeps sending packets to the slave at every connection interval (possibly empty PDUs), and the slave may skip at most slave-latency consecutive connection events.

As an example, here is what happens with a slave latency of 2 when there is no data to transmit: the master continues to send packets at every connection interval, and the slave can skip two connection events.

[Image: connection events with a slave latency of 2]

So no, you do not have to transmit data every 30 seconds to keep the connection alive. That is already handled for you by the Bluetooth protocol.
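The slave-latency rule above translates into simple arithmetic for the peripheral's radio duty cycle. A minimal sketch; it ignores retransmissions and any extra events the peripheral attends because it has data queued, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case spacing (ms) between the idle peripheral's radio wake-ups: with
// slave latency L it may skip up to L consecutive connection events, so it
// must listen at least once every (L + 1) connection intervals.
double max_wake_spacing_ms(double conn_interval_ms, uint16_t slave_latency) {
    assert(conn_interval_ms > 0.0);
    return conn_interval_ms * (slave_latency + 1);
}

// Approximate number of connection events an idle peripheral attends in a
// window of window_ms.
double wakeups_in_window(double conn_interval_ms, uint16_t slave_latency,
                         double window_ms) {
    return window_ms / max_wake_spacing_ms(conn_interval_ms, slave_latency);
}
```

With the iOS defaults reported later in this thread (30 ms interval, slave latency 0) the peripheral wakes about 1000 times per 30 seconds; with, say, a 300 ms interval and a slave latency of 4 (illustrative values) it would wake about 20 times.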

Could you post the connection parameters used by the connection? The last time I tried, the default connection parameters used by iOS devices were a connection interval of 30 ms and a slave latency of 0. That would explain the spike every 30 ms (the slave has to listen, then transmit).

Once the connection is established you can also request new connection parameters to be used by calling the function Gap::updateConnectionParams. Formulae used to define connection parameters acceptable by Apple devices can be found here.
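Before requesting new parameters with Gap::updateConnectionParams, it can help to check them against the Bluetooth Core Specification limits: a connection interval between 7.5 ms and 4 s in 1.25 ms steps, a slave latency of at most 499, a supervision timeout between 100 ms and 32 s in 10 ms steps, and timeout > (1 + latency) * interval * 2. Apple's formulae add further, stricter constraints that are not reproduced here. The ConnParams type and the helper names below are hypothetical, not the mbed BLE API.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical type for this sketch, using Bluetooth LE link-layer units.
struct ConnParams {
    uint16_t interval_units;  // 1.25 ms steps, 6..3200 (7.5 ms .. 4 s)
    uint16_t slave_latency;   // 0..499 skipped connection events
    uint16_t timeout_units;   // 10 ms steps, 10..3200 (100 ms .. 32 s)
};

// Unit conversions (truncate if ms is not an exact multiple of the step).
constexpr uint16_t ms_to_interval_units(double ms) { return static_cast<uint16_t>(ms / 1.25); }
constexpr uint16_t ms_to_timeout_units(double ms) { return static_cast<uint16_t>(ms / 10.0); }

// Range checks plus the rule timeout > (1 + latency) * interval * 2,
// evaluated in integer microseconds to avoid rounding surprises.
bool is_valid(const ConnParams& p) {
    if (p.interval_units < 6 || p.interval_units > 3200) return false;
    if (p.slave_latency > 499) return false;
    if (p.timeout_units < 10 || p.timeout_units > 3200) return false;
    uint64_t interval_us = p.interval_units * 1250ULL;
    uint64_t timeout_us = p.timeout_units * 10000ULL;
    return timeout_us > (1ULL + p.slave_latency) * interval_us * 2ULL;
}
```

For example, 30 ms / latency 0 / 7.2 s passes, while 30 ms / latency 4 / 100 ms fails because the timeout would expire before the slave is required to answer.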

I would suggest that you increase the connection interval and increase the slave latency (unfairly limited on Apple devices ...).

yogeshk19 commented 7 years ago

@pan- Hi Vincent, thanks a lot for taking the time to explain the BLE connection parameters and how best to use them. You are right: the default iOS connection interval (both min and max) is 30 ms, the slave latency is 0 and the supervision timeout is 7.2 seconds. I have updated the BLE connection parameters based on the Apple formulae at the link you gave, and I will run power measurements against these changes.

Just to let you know, our end design will have two NRF52-based BLE devices communicating over BLE: one will be the peripheral and the other the central. Currently we are measuring the power consumption of the peripheral device, which reads the sensor data. I am assuming that in that scenario the NRF52 central device, when connecting to the peripheral, can set more reasonable connection parameters (connection interval, slave latency and supervision timeout)?

Also, one question about Tx power: is it set to the maximum by default when BLE is initialized?

Thanks, Yogesh

pan- commented 7 years ago

@yogeshk19 Hi Yogesh

You are right: the default iOS connection interval (both min and max) is 30 ms, the slave latency is 0 and the supervision timeout is 7.2 seconds.

Just a quick explanation about connection interval min and max. Those two parameters are used to initiate the connection: they inform the Bluetooth stack or controller that initiates the connection that the connection interval shall be in the range [connection interval min, connection interval max]. With that information, the stack or controller chooses the connection interval it will use for the connection, selecting one that does not disrupt concurrent operations such as other connections already in progress.

In the connection callback, connection interval min is always equal to connection interval max. The connection interval is a fixed value because the slave and the master need a precise rendezvous point to communicate with each other. As indicated earlier, these parameters can be renegotiated during the connection.

I am assuming that in that scenario the NRF52 central device, when connecting to the peripheral, can set more reasonable connection parameters (connection interval, slave latency and supervision timeout)?

Yes, in that situation you are in control, and you can specify the connection parameters in the central before the connection with the peripheral is established. However, it is not always a good idea to start with slow connection parameters, because before doing anything useful the GATT client has to discover the layout of the GATT server, and there is no reason to slow that process down. My advice would be to run discovery quickly, then renegotiate the connection parameters once discovery has finished.
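The "discover fast, then slow down" advice can be expressed as a per-phase choice of parameters. A minimal sketch; LinkPhase, RequestedParams and params_for are hypothetical names, and the numbers are placeholders chosen only to respect timeout > (1 + latency) * interval * 2, not tuned recommendations. On the peripheral, the Steady values could be the ones requested through Gap::updateConnectionParams once discovery has completed.

```cpp
#include <cassert>
#include <cstdint>

enum class LinkPhase { Discovery, Steady };  // hypothetical connection phases

struct RequestedParams {  // hypothetical; all times in milliseconds
    double min_interval_ms;
    double max_interval_ms;
    uint16_t slave_latency;
    double supervision_timeout_ms;
};

// Fast parameters while the GATT client discovers the server, slow ones
// afterwards. Placeholder values (multiples of 1.25 ms for the intervals).
RequestedParams params_for(LinkPhase phase) {
    if (phase == LinkPhase::Discovery) {
        return {7.5, 30.0, 0, 4000.0};
    }
    return {400.0, 500.0, 4, 6000.0};
}
```

Keeping the choice in one function makes it easy to switch phases from whatever callback signals the end of service discovery.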

Another point: connection parameters are more or less specific to the peripheral application, so it makes sense for that device to initiate the negotiation of new connection parameters; that way it can preserve power with any central that connects to it.

One last thing, is the NRF52 central on battery too in your scenario ?

Also, one question about Tx power: is it set to the maximum by default when BLE is initialized?

By default the TxPower is set to 0 dBm IIRC. However, that's something you can tune based on the RSSI received in advertising packets if the devices are not moving around.
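One way to do that tuning: estimate the path loss from the peer's known transmit power and the RSSI of its advertising packets, then pick the lowest supported level that keeps the link above the receiver sensitivity plus a fading margin. A minimal sketch assuming a symmetric radio channel and a peer that advertises at a known power; the level table is the set of values the nRF52832 accepts with the S132 SoftDevice, while the sensitivity (-96 dBm, a commonly quoted 1 Mbps figure for the nRF52832) and the margin are assumptions. Applying the chosen level through the BLE stack is not shown.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// TX power levels (dBm) accepted by the nRF52832 with the S132 SoftDevice.
constexpr std::array<int8_t, 9> kTxLevels = {-40, -20, -16, -12, -8, -4, 0, 3, 4};

// Picks the lowest TX power expected to reach the peer.
// Assumes a symmetric channel: path loss = peer_tx_dbm - rssi_dbm.
// sensitivity_dbm and margin_db are caller-supplied assumptions.
int8_t pick_tx_power(int peer_tx_dbm, int rssi_dbm, int sensitivity_dbm, int margin_db) {
    int path_loss = peer_tx_dbm - rssi_dbm;
    int needed = sensitivity_dbm + margin_db + path_loss;
    for (int8_t level : kTxLevels) {  // table is sorted ascending
        if (level >= needed) return level;
    }
    return kTxLevels.back();  // best effort: maximum power
}
```

With a peer advertising at 0 dBm, a 20 dB margin and the assumed -96 dBm sensitivity, an RSSI of -60 dBm selects -16 dBm, while -90 dBm falls back to the +4 dBm maximum.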

nvlsianpu commented 6 years ago

@yogeshk19 Is this issue still valid?

yogeshk19 commented 6 years ago

@nvlsianpu Based on the suggestions from @pan- (Vincent), we were able to reach the desired power consumption level, so you can close this thread. However, is there an upcoming build that implements the sleep changes without us having to add custom code?

Thanks, Yogesh

pan- commented 6 years ago

@yogeshk19 With mbed-os-5.6, the tickless mode is enabled by default on NRF52832 based targets; NRF52840 targets may be enabled for 5.7.

screamerbg commented 6 years ago

cc @nvlsianpu

nvlsianpu commented 6 years ago

@yogeshk19 Can you close this issue if it is no longer valid?