Closed tbondar closed 7 years ago
@tbondar Thanks for the detailed bug report.
After some time the LED stops blinking. The BLE interface still works, i.e. I can connect to the device and read characteristics correctly. When there is some BLE activity the LED blinks once or twice. It seems that the EventQueue stops calling the blinkCallback() function.
Did you observe the same behavior if the RTOS is enabled ?
I couldn't reproduce the problem with RTOS enabled (i.e. the default .mbedignore
file). I keep my test running just in case, but it seems this problem is related to when low power mode is enabled one way or another.
@nvlsianpu could you look into this ?
@pan- @nvlsianpu Just wondering if there's a chance to get a fix for this soon or I'd better look for some workaround. Thanks.
I started with looking into. I encountered that this if was not fullfiled. After code had gone through "that if" it weren't generated periodical event requested by eventQueue.call_every method.
@tbondar could you confirm that it behaves similar on your desk? You can for example use printf for alternate condition of "this if".
@nvlsianpu (Previous comment updated after further tests)
You are probably right. I made the following changes to equeue_mbed.cpp
and let the application running. When it breaks, LED3 stops blinking and LED4 does not start blinking. LED 4 toggles occasionally when certain BLE events happen or the button is pressed. Apparently, variable ms
is negative or zero.
--- a/events/equeue/equeue_mbed.cpp
+++ b/events/equeue/equeue_mbed.cpp
@@ -128,12 +128,17 @@ static void equeue_sema_timeout(equeue_sema_t *s) {
*s = -1;
}
+DigitalOut led3(LED3);
+DigitalOut led4(LED4);
+
bool equeue_sema_wait(equeue_sema_t *s, int ms) {
int signal = 0;
Timeout timeout;
if (ms > 0) {
timeout.attach_us(callback(equeue_sema_timeout, s), ms*1000);
- }
+ led3 = !led3;
+ led4 = 1;
+ } else { led4 = !led4; }
core_util_critical_section_enter();
while (!*s) {
We encountered the same. Great! It looks like some code had been executing to long and therefore a event time to setup had been overdue and then expected new event didn't been requested anymore.
It's interesting that it happens after exactly the same time (~ 8200 seconds) while the code is advertising only. I don't need to stress it in any way to reproduce it. Another comment is that a very similar thing happens with RTOS enabled, please see my comments at #3509 and here.
@pan- Why the creating evet-queue requests tick timer to start? here
Why the creating evet-queue requests tick timer to start? here
@nvlsianpu Do you think it should be started only when the tick timer is actually in use; when their is at least one event in the event queue to execute in N ticks ?
I think it is a commodity solution, the equeue_tick
function is used by all event queues the system can create and is not tied to any instance so it is easier to think about it if the tick is always running.
@geky Am I correct ?
Do you think it should be started only when the tick timer is actually in use; when their is at least one event in the event queue to execute in N ticks ?
I think it could be kept as is for now - this is very rare tick and RTC consume a few uA. But it could be a case for ultra low power devices (like a flowmeter).
If the time is close to the period which was requested by creating of the evnt-queue it is possible the following code execution:
It looks like sometimes after that sequence, the expected periodic event (e.g. of 50 ms) is never requested again.
I discovered that RTC CC event were always being requested periodically: 1st by the ticker form RTC handler and very soon 2nd by the event dispatcher - until described sequence.
CC set twice CC set twice CC set once
1 2 1 2 1
__|__|______________________________________|__|_______......______________|_.___________
I will back to investigation on monday.
Ah, yeah, the event queue assumes the equeue_tick
is always running.
The queue keeps track of the last tick a dispatch occured on. This is used to decide if a cancelled event is still in the queue (here). We also don't assume when the tick starts, so choosing 0 may happen in the near future.
We could add something that decides this tick lazily as @pan- suggests, but currently the queue simply loads from the timer during initialization.
Could it be that equeue_sema_wait
is getting called with a wait of zero? This can happen if it takes enough time between dispatching events and going back to wait (here). It looks like the non-rtos equeue_sema_wait
may not handle zero correctly (here).
bool equeue_sema_wait(equeue_sema_t *s, int ms) {
int signal = 0;
Timeout timeout;
- if (ms > 0) {
+ if (ms >= 0) {
timeout.attach_us(callback(equeue_sema_timeout, s), ms*1000);
}
core_util_critical_section_enter();
while (!*s) {
sleep();
core_util_critical_section_exit();
core_util_critical_section_enter();
}
I discovered that RTC CC event were always being requested periodically: 1st by the ticker form RTC handler and very soon 2nd by the event dispatcher - until described sequence.
One thing I have noticed in the debugger is that every time the LED stop blinking, events are still scheduled in the queue, the low power timer is still running but the value of the CC register is set to 0. I doubt this is the issue but the documentation is not very clear about this situation:
COMPARE occurs when a CC register is N and the COUNTER value transitions from N-1 to N.
I discovered that RTC CC event were always being requested periodically: 1st by the ticker form RTC handler and very soon 2nd by the event dispatcher - until described sequence.
There is one things to notice in the RTC documentation:
• If the COUNTER is N, writing N or N+1 to a CC register may not trigger a COMPARE event
It seems that the implementation is protected against this case: (here). But in reality, the code makes two wrong assumptions:
It would be interesting to protect this code by a critical section and use uint32_t closest_safe_compare = common_rtc_32bit_ticks_get() + 3;
as the minimal value the CC register can accept (stopping the counter is too expensive).
@tbondar Could you try with this patch:
--- a/targets/TARGET_NORDIC/TARGET_NRF5/us_ticker.c
+++ b/targets/TARGET_NORDIC/TARGET_NRF5/us_ticker.c
@@ -190,13 +190,16 @@ void common_rtc_set_interrupt(uint32_t us_timestamp, uint32_t cc_channel,
// difference between the compare value to be set and the current counter
// value is 2 ticks. This guarantees that the compare trigger is properly
// setup before the compare condition occurs.
- uint32_t closest_safe_compare = common_rtc_32bit_ticks_get() + 2;
+
+ __disable_irq();
+ uint32_t closest_safe_compare = common_rtc_32bit_ticks_get() + 3;
if ((int)(compare_value - closest_safe_compare) <= 0) {
compare_value = closest_safe_compare;
}
nrf_rtc_cc_set(COMMON_RTC_INSTANCE, cc_channel, RTC_WRAP(compare_value));
nrf_rtc_event_enable(COMMON_RTC_INSTANCE, int_mask);
+ __enable_irq();
}
//------------------------------------------------------------------------------
@nvlsianpu Can it be the only source of error ?
@pan- It doesn't look as bad CC set. I have a picture with signals:
My diagnose is; Event queue service didn't recognize event as its-own if event timeout was set in ticker handler.
@pan- I'm afraid the patch above didn't fix the problem.
It looks like change proposed by @geky (https://github.com/ARMmbed/mbed-os/issues/3857#issuecomment-287166802) patches the bug. @tbondar could you confirm or contradict. My board didn't hang since more than 12 hours.
@nvlsianpu OK, I'm starting a test right now. @pan- @geky @nvlsianpu Guys, do you all agree that this is probably a reasonable solution to the problem? We plan to use this code in a device that's going out to our customer and we don't have much time left to do long term tests. Thanks for all your work so far!
I created a pr for review (#3991), hopefully this is the fix
This patch makes the non-rtos and rtos versions of equeue_sema behave consistently. The rtos version has the main support of mbed OS, so there should be fewer issues outside of the equeue_sema layer.
@nvlsianpu Unfortunately my board still stopped blinking after 8200 seconds. What is the blinking period that you use? Please note, that I changed the original 500 ms to 50 ms, as stated in the original bug report.
One more interesting addition is that sometimes the blinking restarts spontaneously after some time. I will try to collect more information about this.
I contradict as well - https://github.com/ARMmbed/mbed-os/pull/3991 didn't fix this issue (but it still fixes other issue :D). I reproduced issue witch changes as well.
I prepared patch https://github.com/ARMmbed/mbed-os/pull/4016. I'm waiting now for verification using board on my desk. cc @tbondar , @pan-
I confirm that PR 4016 mentioned above fixes this issue.
@nvlsianpu I can confirm too: it works. Many thanks for getting this fixed!
Has this fix made into the main branch, if not what branch of the mbed-os is this fix available?
Thanks, Yogesh
@nvlsianpu Thanks for taking the time to respond. Do I have to pull the entire the mbed-os or can I just get the changed files that effect the above issue. I am currently using the 5.2.3 version of mbed-os.
Also Is there a timeline as to when this PR would be completed and added to specific mbed-os branch? We are trying to figure out our beta testing dates and just was hoping to get an idea as to when this fix would be made into an official mbed-os 5 branch
Thanks, Yogesh
Do I have to pull the entire the mbed-os or can I just get the changed files that effect the above issue.
It is one file fix - you can just copy us_ticker.c
I can't to respond to second question. It's fix so it should be merged to the master ASAP.
@nvlsianpu - Thanks for getting back to me. I will just use that file for now. How would I know if this has been committed to one of the official branches. Just check the PR status?
Thanks, Yogesh
@yogeshk19 you can check the PR status. I've just accepted the PR content it should be a matter of few days now.
I must add that there is also a bug affecting every time related primitive (Timer, Ticker, Timeout, ...) if the timeout requested exceeds 2147 seconds. I've made a PR for it: (#4094), it is still in progress but I have good hopes that it will be integrated soon.
Shall we close this issue?
@nvlsianpu I suggest to keep it open till v5.4.4 is released. If it's not reproducible with that then I think we can close it. (For our production code we moved from mbed to Nordic SDK, so this problem doesn't have direct effect on our product any more.)
So this change hasn't been merged into 5.4.3 that was released a week ago? If not when is 5.4.4 going to be out, so these change is merged into an official release?
Thanks, Yogesh
Patch release are published every 2 weeks. So It will happened in a week.
The fix should be available in a release now right?
This should be closed. @tbondar @0xc0170
Hey all, I've run into an problem that is possibility related to this issue. I'm using an mbed implementation from RedBear on the nRF52 module. Code here Using their BLE peripheral example combined with a ticker example, I'm seeing the ticker fail almost immediately after startup. I've updated the us_ticker.c file to reflect the changes in the 5.4.4 release, with no luck. Without the softdevice running, the ticker works perfectly. Any guidance or suggestions are welcome.
Keep up the great work!
@lpercifield Can you raise a new issue against mbed-os detailing your set up , tool versions etc , mbed-os versions etc and we can take another look. Thanks
Description
I'm trying to get the BLE_Button example working with low power enabled. Based on the answer to this question I compiled the example code with RTOS disabled (by adding
.mbedignore
as suggested). Power consumption went down as expected but the LED stopped blinking after a couple of hours of running. This makes the BLE unusable on nRF52 for any practical purpose because either it has too high power consumption or stops working after a few hours.Bug
Target NRF52_DK
Toolchain: GCC_ARM
Toolchain version: gcc-arm-none-eabi-4_9-2015q3
mbed-cli version: 0.9.10
mbed-os sha: 5faf4b2 Merge pull request #3837 from ARMmbed/release-candidate
Expected behavior BLE_Button example should keep working with low power enabled.
Actual behavior After some time the LED stops blinking. The BLE interface still works, i.e. I can connect to the device and read characteristics correctly. When there is some BLE activity the LED blinks once or twice. It seems that the
EventQueue
stops calling theblinkCallback()
function.This happens after a couple of hours of running. It seems that the exact time required depends on the on the event dispatch frequency:
If
n
is 50, LED stops blinking after 8200 seconds, ifn
is 20, it stops after about 6670 seconds.Steps to reproduce Download the official BLE_Button example, disable RTOS by adding the following
.mbedignore
to the project and compile the code.Additional Comments I have been trying to make BLE examples work with low power for a few days now using various methods. I tried using
waitForEvent()
and Tickers with a very similar result. It seems that some low level code interfacing the nRF52 timer might be buggy in some way.