An idle application with no visible windows still uses noticeable CPU time

nullst commented 3 years ago

Describe the bug:

I am writing a desktop GUI application that would usually be running in the background, with a minimized window that is opened from time to time. I've discovered that any fyne application, including fyne_demo or even a simple window with a single label in it, puts a consistent CPU load on the system even if the application window is not visible and no work is being done. It's not a large amount of CPU usage, but over time it adds up to a battery drain, making even the most basic fyne application the 8th largest energy consumer on my laptop if left running for 20 hours (assuming that a large portion of those 20 hours is spent in the web browser). That's more demanding than, e.g., the Telegram chat client, even though that program is actually being used actively for some portion of those 20 hours, not just sitting in the background.

On my computer, the background load (of any fyne application that does no work in the background) is 3-4% CPU usage, as reported by "Activity Monitor" of my OS. This can be confirmed with standard golang CPU profiling: it reports 1.27s of CPU time during 30.03s of running in the background:

As one can see from this not very informative graphic, a small amount of work (~0.17s out of 1.27s CPU usage) is being done in glfw.PollEvents as invoked in runGL, but the majority of time is spent by golang's scheduling mechanism (runtime.schedule and its friends). I don't know what exactly this scheduling mechanism is working on. My impression is that channel communication, for example performing select over several channels, may be part of this.

Ticker tuning helps, but not completely

Fyne's runGL from internal/driver/glfw/loop.go is an event loop which, in particular, polls GLFW events 60 times a second, triggered by the messages on a channel created by time.Ticker. This frequency is constant, regardless of whether the application window is visible or not. Even though profiling does not indicate that runGL is a significant CPU consumer, it is possible that some part of runtime.schedule CPU usage is the overhead for the select statement reading from several channels in runGL loop, since that select is performed 60 times per second. This is in fact the case.

As an experiment, I reduced the ticker frequency (by editing loop.go) to a rather extreme "1 event per second". This reduced background CPU load by a factor of two. Only 0.63s of CPU time were used during 30s of running in the background:

Thus I would suggest some dynamic tuning of the ticker frequency as an enhancement of runGL: if the application window is not visible, maybe run the ticker only 10 times per second, or something like that. This would significantly improve the energy usage for fyne applications running in the background.

However, a consistent 1-2% CPU load is still not the best experience for a background application that does absolutely nothing. This is worse than the majority of background applications my computer is currently running. I'm not sure what can be done to reduce the scheduler load since I don't know what causes it. I haven't checked whether the channel funcQueue in runGL wakes often -- if it does, maybe the same select loop in runGL is responsible for the entire background load.

Another option that I didn't try is to update Go: maybe new versions of golang have a more efficient runtime scheduler. But probably something can be done on the fyne level to reduce the load anyway.

To Reproduce:

Steps to reproduce the behaviour:

Run any fyne application, e.g., fyne_demo.
Keep it in the background.
Check your CPU load with whatever process monitor you have.

Device (please complete the following information):

OS: MacOS
Version: 10.12 Sierra
Go version: go1.15.3 darwin/amd64
Fyne version: 6e90820ea9ca4df836db5a99384c51073415571e

dweymouth commented 1 year ago

Just tried a timeout of 1/60 second - it keeps animations smooth but actually seems to use even a bit more background CPU than the status quo. I think the issue is that having the glfw loop wake up 60 times a second on a continuous basis, whether it's through a polling loop or a short timeout, is just a lot of busy-work on that thread.

Bluebugs commented 1 year ago

So basically the problem is that we need to only run our render loop when there is something to be done. Any solution that doesn't solve that problem will lead to the same issue repeating.

Trying to think of a proper solution, I just realized that the ticker for the animation runner is not synchronized with the render loop, which makes my first idea of using it not work and means additional work is needed to make that happen. From the code, I think we could actually somehow make driver/glfw/loop.go:runGL() call eventTick.Stop() when driver/common/canvas.go:CheckDirtyAndClear() switches the state of c.dirty, and have eventTick.Start() when we get a state switch on SetDirty(). The main challenge is that the atomic is per canvas/window, while the eventTick is for all windows. I am guessing we would need an atomic counter in loop.go to make those decisions. This should let the eventTick run only when there is something to render.

andydotxyz commented 1 year ago

Two things to consider here - 1) we already don't paint if there is nothing to paint, so it should not be doing anything (bear in mind a blinking cursor is constantly asking to repaint). 2) Getting user input events must continue to either poll or event request even if nothing is drawn, as it is those events that trigger the change that makes the draw updates.

tenox7 commented 1 year ago

Please consider the use case of a desktop clock, CPU/network/etc monitor, music player, etc. It runs in the background (but visible) and needs to refresh on its own once a second or so.

dweymouth commented 1 year ago

Did some more testing around - interestingly, switching from glfw.PollEvents to glfw.WaitEventsTimeout does not block button push animations from occurring on Ubuntu (standard OS distribution, AMD Ryzen CPU and Radeon graphics), or on Windows 11 (tested on Virtualbox VM on said Ubuntu host). Seemingly, it is only on Mac OS that the background animations are unable to occur if the GLFW loop is asleep. Switching from PollEvents to WaitEventsTimeout with a long timeout on Windows dropped background CPU use to basically 0.0%. It does result in the app window not drawing until the GLFW thread wakes up, but once the window is present, background rendering seems entirely normal.

All this suggests that on Windows and Linux, switching from PollEvents to WaitEvents after the window is visible would reduce background CPU with no impact on the app after that (obv. more testing would be necessary!), but for Mac OS, switching back to PollEvents at 60fps would be necessary whenever rendering needed to happen/was in progress, after which the loop could go back to using WaitEvents. Whenever rendering was needed (e.g. some widget's Refresh was called), posting an empty GLFW event would be needed to allow the thread to wake up and switch from waiting to polling until the render refresh was done.

Edit: Actually the button animations being blocked by the GLFW loop does not seem to occur on my old x86 Macbook, only on my M1 Macbook.

andydotxyz commented 1 year ago

Seemingly, it is only on Mac OS that the background animations are unable to occur if the GLFW loop is asleep.

This is correct, Darwin has a different threading model for screen refresh.

Edit: Actually the button animations being blocked by the GLFW loop does not seem to occur on my old x86 Macbook, only on my M1 Macbook.

To work around an M1 bug with GLFW we are not using multiple threads on that CPU - a single event and draw thread is used, hence the difference.

To see the blockage happen on an Intel Mac, try resizing the window. That will freeze the window event queue, so draw stops unless you are ticking OpenGL on a timer. This is a documented issue with macOS, and their tools have workarounds internally too.

Jacalz commented 1 year ago

FYI: The fsnotify dependency was updated for the upcoming v2.4.0 release. You should see less CPU usage.

Jacalz commented 1 year ago

I did a quick test where I just replaced glfw.PollEvents() with glfw.WaitEvents() (at loop_desktop.go#39) with no other changes and it almost works perfectly with reduced CPU usage. Animations and drawing on Linux work fine, even when the application is in the background, as they happen on another thread (I suppose that there will be trouble for M1/M2 laptops).

The downside is that it causes the main thread to hang when no events occur (not moving the mouse over it, not resizing etc.) meaning that any window events that occur in the function queue have to wait until the next time it receives events from glfw. For example, pressing ctrl+c in the terminal doesn't close the window until the window is focused again and I assume that the same applies for developer code trying to close the window when it isn't focused. If there is some sensible way to fix that problem (running functions on the main thread while we are waiting for events, waiting for events on a different thread, etc.), we could get a hugely more efficient runloop.

Examples

A simple log printout:

Before (always runs)

Screencast from 2023-08-19 14-29-01.webm

After (only runs when events are posted)

Screencast from 2023-08-19 14-29-57.webm

dweymouth commented 1 year ago

@Jacalz This is awesome! This is pretty much the last bug that is keeping me building Supersonic against a fork of Fyne (I've slowed down the main loop to 10 fps when no user input is occurring to reduce, but not eliminate, the background CPU problem).

You could look into glfwPostEmptyEvent as a way to wake up the main thread when needed. For M1 macs I feel we'd need a way to switch temporarily back to polling whenever an animation is running since the animations also share the main thread. But otherwise just waking up the main thread when user code calls a Refresh, or we receive a Ctrl+C or other signal, just might work!

Seems like it might be tight to get it in 2.4.0 but maybe we can figure out a way to solve this in 2.4.1
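To illustrate the wake-up idea without pulling in GLFW, here is a small model in plain Go. Everything in it is a stand-in: the wake channel plays the role of the GLFW event queue, postEmptyEvent imitates glfw.PostEmptyEvent, and waitAndDrain imitates a loop iteration that blocks in glfw.WaitEvents and then runs the queued functions. It is only a sketch of the ordering (enqueue first, wake second), not the change from any actual branch.

```go
package main

import (
	"fmt"
	"time"
)

// mainLoop (hypothetical) models a main thread that sleeps until woken.
type mainLoop struct {
	wake  chan struct{} // stand-in for the GLFW event queue
	funcs chan func()   // work queued for the main thread
}

func newMainLoop() *mainLoop {
	return &mainLoop{
		wake:  make(chan struct{}, 1),
		funcs: make(chan func(), 16),
	}
}

// postEmptyEvent imitates glfw.PostEmptyEvent: it never blocks, and
// repeated calls before the loop wakes collapse into one wake-up.
func (l *mainLoop) postEmptyEvent() {
	select {
	case l.wake <- struct{}{}:
	default: // a wake-up is already pending
	}
}

// runOnMain queues f first and only then wakes the loop, so the loop
// can never wake up and miss the function.
func (l *mainLoop) runOnMain(f func()) {
	l.funcs <- f
	l.postEmptyEvent()
}

// waitAndDrain blocks (like glfw.WaitEvents) until woken, then runs
// every queued function and returns how many were run.
func (l *mainLoop) waitAndDrain() int {
	<-l.wake
	n := 0
	for {
		select {
		case f := <-l.funcs:
			f()
			n++
		default:
			return n
		}
	}
}

func main() {
	loop := newMainLoop()

	// Another goroutine (e.g. a signal handler or user code calling
	// Refresh) asks for work on the main thread a little later.
	go func() {
		time.Sleep(50 * time.Millisecond)
		loop.runOnMain(func() { fmt.Println("ran on main thread") })
	}()

	// The main thread uses no CPU while blocked here.
	n := loop.waitAndDrain()
	fmt.Println("functions run:", n)
}
```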

Jacalz commented 1 year ago

I'm glad you liked it @dweymouth. Are you sure about animations needing the main thread? The videos above are using the animation tab with the checkbox moving using an animation.

dweymouth commented 1 year ago

Only on M1/M2 Macs does animation need the main thread, unless it's changed recently (I don't think so). I remember that the only thing I'm losing in Supersonic with the 10fps main thread is button tap animation smoothness on Apple silicon Macs.

Jacalz commented 1 year ago

Ah, I see. I thought you were talking about non M1/M2 Macs given that the drawing happens on the main thread (as far as I know) and that seems like a bigger problem. Anyway, I put together a messy POC (I know that it doesn't compile for WASM, the CTRL+C support is glued on, etc.) in https://github.com/Jacalz/fyne/commit/6b2a23721a1a2f7c4536b601069b380a85254600 using your glfw.PostEmptyEvent() and it seems to work quite well here on Linux :)

Jacalz commented 1 year ago

You might get that running on M1/M2 by adding a glfw.PostEmptyEvent() to runOnDraw() like I did for runOnMain() but I can't guarantee anything as I don't have access to the hardware ;)

andydotxyz commented 1 year ago

You should - we have an M1 device on the cloud - DM if you don't have login details.

Jacalz commented 1 year ago

Absolutely. I know about the M1 device but mostly meant that I hadn't tested it on the hardware. I'll send you a DM as I realize that I haven't used it before. I'm still not sure if my approach is a good one so we'll have to see what comes from it :)

Jacalz commented 1 year ago

The good news is that my proof of concept runs just about as well on M1 (without modifications) as it does on my Linux box. The bad news is that the implementation is buggy on both platforms. One window seems mostly fine but opening, hiding and closing another window seems slow, buggy and flickery. I'm afraid that my solution for the problem isn't the best one but it does show that it at least might be possible to solve this using glfw.WaitEvents().

All of my work can be found here: https://github.com/Jacalz/fyne/tree/poc-glfw-waitevents

Jacalz commented 1 year ago

Hmm. I am seeing the same flickery windows on develop now. Will have to track it down tomorrow but I suspect that there might be another bug at play there.

EDIT: Yes, it seems like a bug on latest develop.

Jacalz commented 1 year ago

I have opened https://github.com/fyne-io/fyne/pull/4173 as a draft now. It is as good a solution as I can think of. There are some problems to sort out, but most of the quirks with my POC have been rectified, with the slight complication that the CI tests crash and Wayland support is broken for some reason, but that's a different story. It seems to be working fine during local testing on X11 :)

0-issue commented 1 year ago

The simple hello world application in fyne's README.md: https://github.com/fyne-io/fyne#getting-started executes 11 million instructions per second when the user is not doing anything. My system is macOS Sonoma, and I installed and built it using the steps mentioned on the README.md page. A similar hello world example on gio executes 0 instructions per second while idling: https://gioui.org/doc/learn/get-started. At the moment I am considering whether to invest in building UIs in fyne or gio. Is there an inherent tradeoff? Like, will fyne consume more than 0 instructions per second by design?

dweymouth commented 1 year ago

There is progress being made on this issue - see #4173 by @Jacalz above.

andydotxyz commented 1 year ago

executes 11 million instructions per second

Are you able to be more specific about what you have found so it can feed into the work we are doing? I don't understand how any tool can operate 0 operations per second unless suspended - but the Fyne loop should (60 or 120 times per second) check if anything has changed (simple Boolean operation) and do nothing further if idle...

0-issue commented 1 year ago

@andydotxyz I am new to go. But afaik system signals/system events can be mapped onto channels. Perhaps some main event loop in fyne can have a select statement waiting for events on channels instead of polling 60/120 times a second. That is what gio seems to be doing. On macOS, the top command gives you instructions executed per second. You can try running something like top -pid <pid_of_program_being_observed>. On linux, you can find similar information from procfs, as you would know. A similar change would work equally well in other situations too where you could be busy waiting.

On C side of things, select syscall has similar behavior... The application waiting for an i/o event registered in select does not consume any CPU time. That's the difference between polling vs event driven.

EDIT: If wanting to wait on N events, where N is unknown at compile time, or N is large, or some elements need to be masked/unmasked every iteration, or N changes from iteration to iteration, use https://pkg.go.dev/reflect#Select instead of select-case-structure. It currently supports registering and waiting on up to 65536 events at a time.
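A small sketch of reflect.Select over a slice of channels whose length is only known at run time. The helper name waitAny and the string payloads are made up for illustration; nothing here is taken from fyne or gio.

```go
package main

import (
	"fmt"
	"reflect"
)

// waitAny (hypothetical helper) blocks until one of chans delivers a
// value. It returns the index of that channel, the value, and false if
// the chosen channel was closed instead.
func waitAny(chans []chan string) (int, string, bool) {
	cases := make([]reflect.SelectCase, len(chans))
	for i, ch := range chans {
		cases[i] = reflect.SelectCase{
			Dir:  reflect.SelectRecv,
			Chan: reflect.ValueOf(ch),
		}
	}
	i, v, ok := reflect.Select(cases)
	if !ok {
		return i, "", false
	}
	return i, v.String(), true
}

func main() {
	// Three event sources; the number could just as well come from
	// configuration or change between iterations.
	chans := []chan string{
		make(chan string, 1),
		make(chan string, 1),
		make(chan string, 1),
	}
	chans[1] <- "resize"

	i, ev, ok := waitAny(chans)
	fmt.Println(i, ev, ok)
}
```

The goroutine calling waitAny is parked by the runtime while it waits, the same as with an ordinary select statement; the reflection only adds some overhead when building the case list.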

Jacalz commented 1 year ago

Like @dweymouth said above, that is basically what the change I was working on (took a break but will resume it sometime soon) was doing.

andydotxyz commented 1 year ago

Yup work is ongoing. We can't rely solely on events like you say Gio does because we are stateful and we have to watch for state changes internal to the GUI structure. To avoid the developer having to fire events all the time we monitor. But like I say it should be closer to 60 operations a second not millions.

0-issue commented 1 year ago

@andydotxyz

we have to watch for state changes internal to the GUI structure

What would be an example? Sorry, I am new to fyne and do not know much about its architecture. In my mental model of how a GUI framework could work, it seems possible to entirely quiesce the CPU activity of the application's process when nothing is happening.

To avoid the developer having to fire events all the time we monitor.

Some developers might actually appreciate an API where they interact with fyne over channels in an event-driven manner, perhaps saving power.

60 operations a second not millions

My comment above mentioned 11 million instructions per second, not operations.

dweymouth commented 1 year ago

but the Fyne loop should (60 or 120 times per second)

I actually don't think we even need to do this, since the only reason things would ever need updating is if

1. there is a user input event from the OS
2. there is a running animation (we can track this internally)
3. the developer calls a Refresh() on a CanvasObject or a Start on an animation

Am I missing anything? Couldn't #3 send a value on a channel, so that the main loop can literally be asleep in a select statement until an event from any of the channels arrives?
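Here is a sketch of such a loop step, under the big assumption that OS input events have already been forwarded onto a Go channel (with GLFW they arrive through the main thread's own event call, so that part is not free). The function name step and the channel names are hypothetical. The frame ticker channel is passed as nil when no animation is running; receiving from a nil channel blocks forever, so that case simply never fires and the loop stays asleep.

```go
package main

import (
	"fmt"
	"time"
)

// step (hypothetical) blocks until one of the three update sources needs
// attention and reports which one woke the loop.
func step(input <-chan string, refresh <-chan struct{}, frames <-chan time.Time) string {
	select {
	case ev := <-input: // 1. user input event from the OS
		return "input: " + ev
	case <-frames: // 2. running animation; nil channel when none is running
		return "animation frame"
	case <-refresh: // 3. developer called Refresh() or started an animation
		return "refresh"
	}
}

func main() {
	input := make(chan string, 1)
	refresh := make(chan struct{}, 1)

	// No animation running: frames is nil, so only input or refresh can wake us.
	refresh <- struct{}{}
	fmt.Println(step(input, refresh, nil))

	input <- "key press"
	fmt.Println(step(input, refresh, nil))

	// An animation starts: hand in a 60 Hz ticker until it finishes.
	ticker := time.NewTicker(time.Second / 60)
	defer ticker.Stop()
	fmt.Println(step(input, refresh, ticker.C))
}
```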

andydotxyz commented 1 year ago

Internally to Fyne it would be quite reasonable to move the dirty flag to a channel so we don't have to check a Boolean state each frame (though I'm not sure it's that slow?)

We do need to be careful about the main thread though - depending on the OS / GLFW mode, if the main thread stops polling then events may not be seen. Of course this could become a circular definition and perhaps solving one allows solving the other?
