CookiePLMonster commented 9 months ago

Describe the bug.

Intro

The following example app I created to familiarize myself with the Flipper Zero ends up breaking the input processing. 90% of the time it is not possible to terminate this app gracefully with the Back button, and Flipper remains unresponsive until rebooted (with Left+Back): https://gist.github.com/CookiePLMonster/a95e7974b287cbc11714e234a477a871

While this app doesn't follow the conventional structure of a Flipper app, it is (to the best of my knowledge) valid and attempts to keep a steady ~60FPS refresh rate on the rendered checkerboard content. This is done by ensuring that the View Dispatcher's tick function executes 60 times per second, and updates the View in order to trigger a Draw callback on it consistently.

The issue is that with this setup, the app appears to consistently starve input processing, as proven by the exit_app log message not appearing at all under the broken scenario. As outlined in the code, any attempts to add extra logging in the app shift timing just enough for the bug not to manifest.

Breakdown

Together with @kbembedded, we've troubleshooted this issue on the Flipper Zero Discord to the best of our abilities and came to the following conclusions:

Under the broken scenario, the firmware gets stuck waiting for the input timer to stop. The while loop in this place never exits, despite being entirely correct usage-wise: https://github.com/flipperdevices/flipperzero-firmware/blob/027ea9ea36da137144548295c016d99255af53c3/applications/services/input/input.c#L115-L117
Raising the priority of times via a furi_timer_set_thread_priority(FuriTimerThreadPriorityElevated); call mitigates this issue.
Yielding from the app's tick callback also mitigates the issue.
Replacing the View Dispatcher tick callback with a dedicated timer also mitigates the issue, unless the timer interval is set too low. In my tests, setting it to 1 tick resulted in an identical deadlock issue.

Based on the following, we have a reasonable suspicion that the issue relates to the thread starvation. The timer thread runs at a priority 2, unless boosted by the API call, which has been proven to help. The main thread that executes the View Dispatcher's tick callback, as well as the GUI thread, both run at a priority 16. Therefore, it's quite possible that those threads never give the timer thread a chance to execute under those broken conditions, preventing the stop request from being acknowledged and thus leading to a deadlock.

What makes sense?

Those observations support the theory of this issue being caused by improper thread priorities:

Increasing the timer priority "helps".
Adding furi_delay_tick(1) in the tick callback "helps".

What does not make sense?

These observations go against this theory:

Why exactly the timer with interval 1 reintroduces the issue? If timers are not executed "fairly" and in order, perhaps this one timer ends up taking all the CPU time for itself, not giving any other timers the chance to process?
Adding furi_delay_tick(0) (aka yield) in the tick callback also "helps". This is counter-intuitive, as yielding should only relinquish time to threads of equal or higher priority, yet the low-priority timer thread is given the chance to execute when my normal priority thread yields.

Suggested actions

These issues seem to be two separate issues:

Imbalance of the default thread priorities vs the timer thread priority, making it easy to starve the main thread.
An unchecked ability for the applications to request a timer with a very low interval, starving all the other timers for their execution time.

Based on the following, these changes might make the most sense:

Rebalance the default thread priorities. Inputs should have priority over the app actions to make it possible to leave the application regardless of its CPU load. This might entail either moving the input timer away to its own, high-priority input thread, or making the default timer thread priority on par or higher than the default main thread priority. We were not 100% sure what is the best approach in this scenario, which is why this is an issue and not a PR.
Introduce a lower bound on the timer interval. furi_timer_start and furi_timer_restart already include an assertion on the upper bound of the timer interval, but a lower bound should also be added, like:
```
if (ticks < 10) ticks = 10;
```
A similar approach is taken e.g. by Windows:

If uElapse is less than USER_TIMER_MINIMUM (0x0000000A), the timeout is set to USER_TIMER_MINIMUM. If uElapse is greater than USER_TIMER_MAXIMUM (0x7FFFFFFF), the timeout is set to USER_TIMER_MAXIMUM.

Reproduction

Compile and deploy the attached sample program, and boot it.
Attempt to terminate it by pressing the Back button. The app should remain open, and Flipper needs to be rebooted.
If the app terminated gracefully, try again. It usually doesn't take more than 3 times to lock up.

Target

No response

Logs

No response

Anything else?

No response

skotopes commented 9 months ago

Those are fundamental problems with furi timers(actually FreeRTOS one).

What you seeing is a consequences of having realtime operation system reaching point of breaking its hard realtime promises(which happens under 100% CPU utilization). Generally speaking anything that utilizes 100% of cpu for a long time will cause timer thread to not be scheduled.

There is a reason why timer thread running at lowest priority: we don't want to overload and deadlock system with events if we don't have CPU resources left to process them. I've been thinking a lot on how we can solve this in non-invasive way and arrived to conclusion that we can't make it with current FreeRTOS design. It is however planned to be fixed in next furi os generation.

So at this point we treat it as a feature and you should avoid using all CPU resources.

CookiePLMonster commented 9 months ago

This all makes sense, I suppose the immediate issue at hand (input dropping) is unsolvable without delegating a new high-priority thread to process those instead of the timer callback - but that takes resources to keep another thread alive.

Any changes to this also feel somewhat risky, who knows how many existing apps unknowingly rely on the timers being lower priority than the normal threads. I suppose for the sake of my own app, either increasing the timer priority via a furi call or, even better, strategically yielding execution in points I know I can spare some CPU time, is optimal.

skotopes commented 9 months ago

System itself relies on timer thread having lowest possible priority. Can you show me your code.?

CookiePLMonster commented 9 months ago

System itself relies on timer thread having lowest possible priority. Can you show me your code.?

From my post:

The following example app I created to familiarize myself with the Flipper Zero ends up breaking the input processing. 90% of the time it is not possible to terminate this app gracefully with the Back button, and Flipper remains unresponsive until rebooted (with Left+Back): https://gist.github.com/CookiePLMonster/a95e7974b287cbc11714e234a477a871

kbembedded commented 9 months ago

System itself relies on timer thread having lowest possible priority.

I do have to say this is an interesting design choice. While FreeRTOS soft timers arn't the best, having them at a low priority can easily starve the whole system (as seen here).

I will admit I don't know the full scope of the OS design choices and intentions. But in my experiences, timers, especially when used as part of the UI, are one of the most important threads. When the Flipper has well over a dozen threads spawned at the same priority, keeping only timers and idle below the rest of the system is a quick way to starve the whole system.

If timers truly are meant to be very low priority in the Flipper (much of the use looks to be for animation handling), then I would recommend considering moving the input system to something other than timers to handle debounce and press length as that is the main problem here. In this example, if input processing still functioned and the tick_event timers were stalled, it would not manifest as an unresponsive system and rather would have starvation elsewhere (likely show a lower draw rate) but would still be responsive to inputs.

skotopes commented 9 months ago

@kbembedded looks like my first answer was a little bit too abstract

TL;DR: General recommendation: give OS some time(2+ ticks) if you do long running CPU intensive tasks. Everything else must be event driven.

First of all it's not we who originally set this priority. This priority is literally default by FreeRTOS and highly not recommended to be changed. Especially without understanding of consequences.

Now, there are 2 sets of problems:

Event spawning and consumption problem. In order to keep system stable events must be consumed faster than spawned. In best case scenario slow consumption will cause wait on MQ send operation in event emitter thread. In the worst case there will be deadlock.
CPU time starvation in RTOS. FreeRTOS (with timer thread and dynamic allocations) is a hard realtime OS. It means that that starving on CPU and using some specific APIs will bring system to bifurcation point where it's not going to behave as you expect.

Current design and FreeRTOS kernel configuration are trying to keep balance between all those factors. We do have ideas how to make it even better, but that is requires changes not only to our code but also to FreeRTOS itself and they planned in next major update of furi.

@CookiePLMonster to be honest, tick timer is a totally cursed thing. Let me remind you that whole system is based around events. You don't need to refresh screen by yourself(also physical refresh rate is less than 20fps). And if you do want to do it with timer, then ticks from timer thread is the worst available option.

CookiePLMonster commented 9 months ago

And if you do want to do it with timer, then ticks from timer thread is the worst available option.

What is the better option then? Most games I've seen use exactly that to run their game logic loops at a predefined rate.

skotopes commented 9 months ago

@CookiePLMonster what kind of game are you developing? Fixed frame rate in most cases that I saw is not needed, and pure event driven approach is preferred.

CookiePLMonster commented 9 months ago

In practice, fixed tickrate is needed in any cases where you have animations, passing time or any kind of AI performing actions without user's inputs. Driving the game logic through event works only in simple cases.

skotopes commented 9 months ago

@CookiePLMonster I'd say that is how old(designed 25 years ago) game engines were working. However since we are running on embedded platform it actually will be easier to implement.

There is a direct rendering API that may be more useful for your needs.

CookiePLMonster commented 9 months ago

I'd say that is how old(designed 25 years ago) game engines were working.

Yes and no, even in Unreal Engine 5 you still have a notion of a tick so it largely depends on the way the game is designed from the ground up.

There is a direct rendering API that may be more useful for your needs.

Oh, is there? I've been relying largely on rasterizing my arbitrary buffer and then copying it via canvas_draw_xbm from the draw callback until now.

skotopes commented 9 months ago

@CookiePLMonster applications/debug/direct_draw

CookiePLMonster commented 9 months ago

@CookiePLMonster applications/debug/direct_draw

That's great! Almost perfect for this use case, the only thing missing for perfection would be the way to access the buffer directly to draw to it directly (via a bit band alias), or to memcpy data to it. From what I've seen, canvas_draw_xbm still does per-bit drawing through u8g2.

skotopes commented 9 months ago

We are going to get rid of u8g2 eventually, so avoid using u8g2 directly.

CookiePLMonster commented 9 months ago

We are going to get rid of u8g2 eventually, so avoid using u8g2 directly.

Yeah, based on my test it wasn't an option either, as

u8g2 doesn't seem to have a memcpy-like function
A hacky attempt to get the buffer pointer from u8g2 directly and write to it failed - not exactly sure why, perhaps this memory isn't contiguous or I was missing some call to mark the buffer dirty.

skotopes commented 9 months ago

What exactly do you need? Like original task that you are planning to solve? We can add additional APIs to canvas.

CookiePLMonster commented 9 months ago

What exactly do you need? Like original task that you are planning to solve?

The canvas angle has strayed far from the original purpose of this specific issue (as it's not about timing), but that's fine.

The current approach I used (and I know it's been used by more projects in the past, so it's not just me) is to have two 16x64 "back buffers" explicitly allocated and manually rasterized by the application, then copied to the canvas on demand: https://github.com/CookiePLMonster/flipper-mode7-demo/blob/main/mode7_demo.cpp#L113-L116

I suspect this to be a bottleneck - having the buffer rasterized on the main thread and then copied this way brought the FPS down from 60 to 45 with the draw_xbm alone. There seem to be two ways to help with this pattern:

Expose the buffer directly. From what I understand, canvas_get_buffer() does this but I don't know if I could then just memcpy my "back buffer" into this pointer without having to go through u8g2. I've seen that the draw_xbm still draws bit by bit without even using the bit band alias, so it's likely much more wasteful than it could have been. Writes to the bit band alias are slower than normal writes (for obvious reasons), but the gains from removing all the explicit bit manipulations would definitely make up for that.
Allow to "hand over" the screen buffer to the canvas. This would be the most efficient from the application side, but may not be viable due to Flipper's drawing model that is "draw whenever you receive the draw callback, no questions asked". However, maybe this approach would have worked fine with Direct Draw.

EDIT: Regarding 1. - I will test it by exposing canvas_get_buffer() to public headers locally and recompile the firmware with that, as I have only tried accessing this pointer by reading the canvas structure manually in my app.

kbembedded commented 9 months ago

First of all it's not we who originally set this priority. This priority is literally default by FreeRTOS and highly not recommended to be changed. Especially without understanding of consequences.

While 2 is the default priority, I've never seen recommendations to not change it, can you link me resources on that?

FreeRTOS timers have their callbacks running in the timer thread context. Which, when every other thread is running at a higher priority will cause priority inversion issues. This is what I believe is being exposed in @CookiePLMonster's code. Also, while I've never independently tested it over the years, FreeRTOS does claim that no CPU time is consumed when there are no timers expired. There is no additional overhead of having the timer thread at a high priority when there are no timer commands nor expired timers pending servicing.

The reason I say the issue is priority inversion is that by running @CookiePLMonster's sample with the timer priority either matching the rest of the threads, or using furi_timer_set_thread_priority() to set the timer thread at the highest priority, suddenly input processing is handled because the timer associated with the input handling thread can run. Meanwhile the in-app FPS counter dropping from an average of 62 to 59 while mashing on the face buttons and returns to an average of 62 when input events stop.

In my opinion, this is what should be happening in the event of CPU starvation on the Flipper; everything runs with some delay*. The code isn't causing a deadlock, it's preventing the input thread from doing anything because the input thread is now locked to the same priority as the timer.

(*) I know this isn't always ideal, especially in some of the time sensitive applications of the Flipper. See further down.

Now, there are 2 sets of problems:

Event spawning and consumption problem. In order to keep system stable events must be consumed faster than spawned. In best case scenario slow consumption will cause wait on MQ send operation in event emitter thread. In the worst case there will be deadlock.

CPU time starvation in RTOS. FreeRTOS (with timer thread and dynamic allocations) is a hard realtime OS. It means that that starving on CPU and using some specific APIs will bring system to bifurcation point where it's not going to behave as you expect.

In my testing, I've found that timer events are not spawning, nor being consumed; the timer thread is just not getting CPU time. As very few timer commands are issued in normal operation, there is never a wait on its mqueue in this situation. It would actually be ideal in this case if another thread was being blocked waiting on sending a timer command to its mqueue, as that would potentially let the timer thread actually run and process timers.

This could be argued as a starvation issue, but I'd argue the resources using timers and how timers are treated is the problem, again leading to priority inversion, moreso than spawn/consumption or starvation.

In just grepping through the code for furi_timer_alloc(), of the 40 hits, all but about 5 of them are for amazingly low priority use. Animations, popup timeouts, PIN timeouts, connection timeouts, mouse click timeouts, etc. One of the 5 that I mentioned is the input system however.

In a system like the Flipper, human input and output is extremely important most of the time. Exceptions are for when the Flipper is doing one of its many, but short, timing critical tasks. Otherwise, when navigating the menus, interacting with applications, etc., the Input and GUI systems are the top priority. And the input system is relying on the timer which was relegated to being nearly the lowest priority thread.

I strongly encourage one of a few outcomes: 1) Timers are elevated in priority by default to at least match the priority of every other thread. Even when used with low-priority operations like animations or timeouts, when timers are unused they consume next to no CPU time. If this truly is a problem, then threads that need that one extra rarely used time slice can slam the timer thread to a much lower priority. 2) The input system stops using timers and instead implements its own time tracking for short, long, repeat events. This would be a non-trivial change, unfortunately. 3) When the input system is actively processing and waiting on timers, have the input system elevate the timer priority. There are some potential ramifications with this, however. For example, if there are timing critical sections of code running, input processing would throw the timer thread around and potentially disrupt that critical operation. I think 1) is a more valid solution for this reason.

@CookiePLMonster to be honest, tick timer is a totally cursed thing. Let me remind you that whole system is based around events. You don't need to refresh screen by yourself(also physical refresh rate is less than 20fps). And if you do want to do it with timer, then ticks from timer thread is the worst available option.

I'll finish this by saying; yes, most of this is moot simply because of the physical limitations of the LCD and there may otherwise be very few CPU intense operations that would expose this. However, being able to accidentally gum up the whole system because inputs get locked out is not ideal in a system that has a lot of focus on its UI.

I'm happy to continue to discuss, but if you are not interested in any further arguments then I won't continue to press the issue. If there is any specific data you are interested in, I'd be happy to gather it.

skotopes commented 9 months ago

I think it will be easier if you do the experiment: set timer thread priority to the same value as default thread priority and do full set of tests(fw + catalog apps). Then we can continue.

kbembedded commented 9 months ago

I think we are both well aware that: 1) There is no way I can reasonably test every FW + catalog app given the sheer number of apps and what they support (NFC/LFRFID alone have such a broad scope). 2) Testing functionality doesn't really prove the absence of any issues.

I've instrumented runtime stats in FreeRTOS for the Flipper. Assuming your concern with changing the timer thread priority is run time CPU consumption (to the point of negatively affecting apps), I think a good starting test would be comparing relative thread runtime under as many apps+scenarios as I can reasonably test with the current timer priority and an elevated one.

If you are in agreement with that, then I will start gathering data on that. If not, then seeing as I cannot reasonably test every single application under every condition, we can just leave the timer as it is.

P.S. After spending last and this evening looking, I was unable to find any literature recommending against increasing the timer thread priority in FreeRTOS, or any RTOS. That is not something I've heard before and I would still love it if you linked me to any resources on that as I want to know more about the topic.

P.P.S. If you think enabling runtime stats as a part of the ps CLI output would be useful as a build-time option, let me know and I can put together a PR for that. The implementation that I have now is quite hacky as the FURI API doesn't really lend itself to the steps needed to do the runtime accounting, but with some time I can make a cleaner implementation.

skotopes commented 9 months ago

@kbembedded Runtime stats are not going to give anything meaningful. Also this work is already complete in BLE refactoring PR that is coming in next release.

kbembedded commented 9 months ago

Sounds good. Cheers.

CookiePLMonster commented 8 months ago

This issue is probably good to go by now:

The bug with timers @kbembedded has noticed is now fixed.
The core problem with the issue I filled is the app not giving the OS any time to process its events. While that was semi-clear from the beginning, now I see that my confusion regarding this issue was partly due to the FPS counter lying to me. Even though my code may have ticked every 16ms, the draw calls were running at only around 20FPS, meaning 2/3 of the work done by my tick was discarded. This has falsely led me to believe there should be more CPU time to spare than there was in reality.

TL;DR - it's fine for apps like this to be intensive, but they need to leave a few ticks to the OS once in a while.

flipperdevices / flipperzero-firmware

Furi: Apps can easily starve the timer thread, breaking input processing #3380

Describe the bug.

Intro

Breakdown

What makes sense?

What does not make sense?

Suggested actions

Reproduction

Target

Logs

Anything else?