bitwiseworks / mozilla-os2

Mozilla for OS/2 and OS/2-based systems

CPU usage is higher when Firefox is minimized #265

Open NeilWaldhauer opened 6 years ago

NeilWaldhauer commented 6 years ago

Using Firefox 45.9.0, with or without the new XUL.DLL from x3.7z, CPU usage on the default Firefox web page is low, at a few percent. When Firefox is minimized, usage jumps to 50-100% on a dual-processor desktop computer and to 99% on a slower single-processor computer.

StevenLevine commented 6 years ago

@dmik are you sure about this? My analysis says mayWait is always FALSE when running minimized and is always TRUE when running normal.

The original markdown garbled my previous comment somewhat. When running minimized, mayWait is FALSE, as shown by:

  esp     eip      pObj     mayWait
%0013f9bc 0d19bf8b 20243840 00000000

dryeo commented 6 years ago

I decided to attack this problem by a different route. Since 45.5 didn't show this behaviour, I decided to bisect the tree to find the commit that caused this. I was quite surprised that it was [PATCH] [OS/2] Temporarily disable usage of Vsync timers by default.

This is easily verified by going to about:config, changing "layout.frame_rate" to -1, restarting the browser, and minimizing it. Can someone verify this? It was an unexpected finding.

dspiatkowski commented 6 years ago

@dryeo Yup! Pretty much all the CPU usage is now gone...the only time I'm seeing CPU spike is when loading a new page and it renders. Following this, the CPU usage goes to anywhere between 0-1% across the cores, but one of them does stay around 4-5%.

The unfortunate result is that this does bring back the dreaded stuttering of page scrolling.

Anyways, I will keep this modification in place until I get about a day's worth of use as usually by the time FF has eaten up about 1-1.5 Gig of RAM things start to really go "crazy"...LOL!

lerdmann commented 6 years ago

Yes, setting "layout.frame_rate" to -1 fixes this problem. But of course the resulting performance is abysmal.

lerdmann commented 6 years ago

Looks like setting "layout.frame_rate" to anything but 0 or 1 will fix the problem. How is this value used? It feels like this is a direct input to "DosSleep", as 0 and 1 have special meaning for "DosSleep" (these values will lead to immediate rescheduling, 0 only to threads of higher priority waiting to run, 1 for any thread waiting to run). I am beginning to believe that if this value x >= 0 then "ProcessNextNativeEvent" is run with mayWait = 0. It returns immediately from WinPeekMsg, the outer loop does a DosSleep(x) and then "ProcessNextNativeEvent" is run again with mayWait = 0. If frame_rate == -1, ProcessNextNativeEvent is run with mayWait = 1, in which case it blocks on WinWaitMsg as it should.
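
The difference between polling and blocking that this hypothesis relies on can be illustrated with a small simulation. This is only a sketch: `countWakeups` and its tick model are hypothetical, it calls no PM or Mozilla APIs, and it merely counts how often an event-loop thread would be awake when it polls (mayWait = false) versus when it blocks until a message arrives (mayWait = true).

```cpp
// Hypothetical simulation, not Mozilla or OS/2 code: time advances in
// discrete ticks and one message arrives every `messageEveryNTicks` ticks.
// Returns the number of ticks on which the event-loop thread is awake.
int countWakeups(bool mayWait, int ticks, int messageEveryNTicks) {
    int wakeups = 0;
    for (int t = 1; t <= ticks; ++t) {
        const bool messageArrived = (t % messageEveryNTicks == 0);
        if (mayWait && !messageArrived) {
            continue;  // blocked in the wait call: no CPU used on this tick
        }
        ++wakeups;     // either polling, or woken up by a message
    }
    return wakeups;
}
```

With 1000 ticks and a message every 100 ticks, the polling variant is awake on all 1000 ticks, while the blocking variant wakes only 10 times.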

lerdmann commented 6 years ago

Setting "layout.frame_rate" to 0 in Thunderbird works ok (no CPU load on minimizing Thunderbird). Maybe the behaviour can be "ported" to Firefox. But Thunderbird might be less performance critical than Firefox.

dryeo commented 6 years ago

You can try larger numbers such as 3000-6000. The whole thing seems buggy even on platforms where it is supposed to work and the -1 fallback is probably not really tested by Mozilla. See https://www.vsynctester.com/firefoxisbroken.html

lerdmann commented 6 years ago

As I understand it, WinWaitMsg should ALWAYS be called, even after WinPeekMsg was called. That will ensure that once WinPeekMsg is done the thread will be blocked. If a message is still in the loop, it will be handled by WinPeekMsg on the next invocation of "ProcessNextNative"; if not, it will block again.

dryeo commented 6 years ago

Both SeaMonkey and Thunderbird don't seem to be affected the same way as Firefox but may be causing other CPU usage problems. Thunderbird can be very slow acquiring focus and SeaMonkey has problems with some extensions, in particular Chatzilla, as well as general slow JavaScript performance at times. Need to investigate once my current build is finished.

StevenLevine commented 6 years ago

Time to read some more code. Take a look at nsRefreshDriver::GetRegularTimerInterval(bool *outIsDefault) const in nsRefreshDriver.cpp. -1 does not exactly turn off VSYNC. It sets it to:

rate = gfxPlatform::GetDefaultFrameRate();

and assuming outIsDefault is not null:

*outIsDefault = true;

Somewhere, I have to suspect this has an effect on the eventual value of mayWait.
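
The preference handling described here can be sketched as a standalone function. This is not the real nsRefreshDriver code: `resolveFrameRate`, `ResolvedRate` and `kAssumedDefaultFrameRate` are hypothetical names, the default of 60 is taken from the statement elsewhere in this thread that -1 behaves like 60 on OS/2, the replacement of 0 by 10000 is likewise taken from later comments, and treating every negative value like -1 is an assumption.

```cpp
// Hypothetical stand-ins for the logic discussed in this thread.
struct ResolvedRate {
    int rate;        // frames per second actually used
    bool isDefault;  // true when the platform default was substituted
};

// Assumption: the platform default frame rate is 60.
constexpr int kAssumedDefaultFrameRate = 60;

ResolvedRate resolveFrameRate(int pref) {
    if (pref < 0) {
        // Assumption: any negative value selects the platform default.
        return {kAssumedDefaultFrameRate, true};
    }
    if (pref == 0) {
        // 0 is reported in this thread to be replaced by 10000 internally.
        return {10000, false};
    }
    return {pref, false};
}
```

Under these assumptions, only a negative preference reports `isDefault == true`; both 0 and 10000 resolve to the same rate without being flagged as the default.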

FWIW, we have

MessagePump.cpp:386
    bool didWork = NS_ProcessNextEvent(mThread, false);

which says mayWait is getting passed as false. What I am not yet seeing is where in the code path mayWait gets set to true. I will capture a process dump tomorrow and confirm that mayWait is really set to true in ProcessNextNativeEvent when layout.frame_rate is -1.

dmik commented 6 years ago

@StevenLevine yes, I injected printf in the code, so I'm pretty sure, as I get every call to ProcessNextNativeEvent logged. And during the normal run mayWait is often TRUE, yes, but not always. It depends on the application logic (which I have yet to fully understand).

Re layout.frame_rate, guys, you are resurrecting the old topic here. Please read https://github.com/bitwiseworks/mozilla-os2/issues/208#issuecomment-308842294 and also #220. It works as designed. -1 is effectively the same as setting it to 60 on OS/2 (which means send a special message to cause a repaint 60 times a second). 0 is the most greedy mode in terms of CPU usage as it means repaint each time anything is changed on the canvas (pretty much always) — and this is the default value on OS/2 as any other value gives "jerky" behavior. This value also matches what Firefox used to do before version 45 (with some limitations — they changed the rendering pipeline considerably to account for this new Vsync feature so while 0 restores the previous "smoothness" it also increases CPU load compared to 38 and before).

However, setting the frame rate to a higher value (1200 and above) still retains overall smoothness and makes the minimized state not eat all the CPU. Perhaps we should change the default from 0 to 1200 or higher. The problem here is that I have different reports of what value keeps acceptable smoothness. It seems to depend on the user's hardware and video drivers. The only proper way would be to properly use Vsync but according to #220, chances are low we can do it ATM.

Anyway, the difference in the minimized state behavior proves that something is wrong there regardless of the frame rate (given that in the normal state it works fine with frame_rate = 0). It should not cause paint events at all in this state. As a quick hack, we may set frame_rate to a high value by default but I don't know which value to use. Please report which ones work ok for you WRT smoothness vs CPU load (in normal mode, in minimized mode it drops CPU usage with any non-zero value from what I can tell).

Regarding the connection of mayWait to frame rate, I doubt there is one. But it may indirectly influence it so that when it's set to 0, there is always something to paint so no need to wait for a message. Or something like that.

StevenLevine commented 6 years ago

@dmik, with layout.frame_rate set to -1, process dumps show that the code is waiting in Win32WaitMessage. With layout.frame_rate set to the compiled-in default of 0, this happens. This does not mean that nsAppShell::ProcessNextNativeEvent is never called with mayWait false; it means that it is not called with mayWait true often enough to avoid the high utilization.

Hints as to why this happens are in mozilla::ipc::MessagePump::Run.

When we are in 100% utilization mode, the stack shows we were called from:

13fa9c: 0b3d2770 = mozilla::ipc::MessagePump::Run + 68

which is

bool did_work = NS_ProcessNextEvent(mThread, false) ? true : false;

which we know will never call Win32WaitMessage.

When we are not in 100% utilization mode, the stack shows we were called from:

13fa9c: 6f2227b9 = mozilla::ipc::MessagePump::Run + B1

which is either

did_work |= aDelegate->DoDelayedWork(&delayed_work_time_);

or

did_work = aDelegate->DoIdleWork();

I will figure this out after brunch when I decode what

13fa7c: 6f1cac38 = NS_ProcessNextEvent + 30C99B (0001:003fac38)

really points to. I'm pretty sure I will find it contains mayWait related logic.

dmik commented 6 years ago

@StevenLevine must be that your dumps don't cover all cases due to timings or such. I double checked: regardless of what the frame_rate value is (I tried 1200, -1, 0, 60 which is equivalent to -1) — in all cases I see it being called with mayWait = TRUE which eventually ends up in a WinWaitMsg call. The only difference is how often that happens.

StevenLevine commented 6 years ago

@dmik, that's what I was trying to say above. My only claim is that -1 does something that allows WinWaitMsg to get called more often. However, it is a bit more complicated than this.

Since your change to all.js sets the default to:

user_pref("layout.frame_rate", 0);

nsRefreshDriver::GetRegularTimerInterval should set outIsDefault false and rate 10000.

However, using

user_pref("layout.frame_rate", 10000);

does not result in 100% utilization when minimized while

user_pref("layout.frame_rate", 0);

does.

dspiatkowski commented 6 years ago

@StevenLevine Steven, the example you provide above with the two different profile values appears to have no impact on the CPU utilization I'm seeing here.

dmik commented 6 years ago

@StevenLevine I guess that's because some places explicitly check for 0 and don't block in such a case to comply with this so-called "ASAP" mode, just like the nsRefreshDriver::GetRegularTimerInterval documentation asks. They do this e.g. via a call to gfxPlatform::IsInLayoutAsapMode(). The most suspicious use is in CompositorVsyncScheduler's constructor. nsRefreshDriver uses it too to decide if it wants to start a refresh timer thread.
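
A minimal sketch of why 0 and 10000 can behave differently even though they yield the same rate: the ASAP check looks at the raw preference, not at the effective rate. The function names below are hypothetical stand-ins, and `mayBlockBetweenFrames` is an assumed consumer of the check, not an actual Mozilla function.

```cpp
// Hypothetical stand-in for the kind of check described above: ASAP mode is
// keyed on the raw preference value being exactly 0.
bool isAsapMode(int framePref) {
    return framePref == 0;
}

// Effective rate for non-negative preferences, using this thread's statement
// that 0 is treated as 10000 (negative values are out of scope here).
int effectiveRate(int framePref) {
    return framePref == 0 ? 10000 : framePref;
}

// Assumed consumer: code that refuses to block while in ASAP mode.
bool mayBlockBetweenFrames(int framePref) {
    return !isAsapMode(framePref);
}
```

Here `effectiveRate(0)` and `effectiveRate(10000)` are equal, but only a preference of 10000 allows blocking, which would be consistent with the different minimized behaviour reported for the two settings.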

lerdmann commented 6 years ago

In mozilla::ipc::MessagePump::Run:

if: bool didWork = NS_ProcessNextEvent(mThread, false);

evaluates to true then the first: if (did_work) continue;

will immediately trigger exactly the same invocation (with "mayWait" = false). That makes no sense to me. Because it will lead to an endless tight polling situation.

dmik commented 6 years ago

@lerdmann It will lead to a polling situation only as long as there are messages in the queue. And in this situation immediate polling is the only right thing to do to avoid the interface freeze. If there are no messages (i.e. nothing is going on), it should break the loop and eventually wait from some other place (and this wait does happen from time to time, according to my logs). But this code (and all relevant parts you already mentioned) is the thing that needs to be studied in full to understand how it works. I hope there will be enough Firefox development time for me to finish this study (there's not that much man/hour left for this task unfortunately).
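
The loop shape under discussion can be modelled as follows. This is a hypothetical model, not the real MessagePump: `runPump` and `PumpStats` are invented for illustration, and the `producerRefills` flag is an assumption standing in for a situation where there is always something new to process (for example, continuous paint requests).

```cpp
struct PumpStats {
    int nonBlockingCalls;  // calls made with mayWait = false
    int waits;             // times the loop fell through to a blocking wait
};

// `queued` is the number of pending events. If `producerRefills` is true,
// every processed event is immediately replaced by a new one.
// `maxIterations` only bounds the simulation so that it always terminates.
PumpStats runPump(int queued, bool producerRefills, int maxIterations) {
    PumpStats stats{0, 0};
    for (int i = 0; i < maxIterations; ++i) {
        ++stats.nonBlockingCalls;
        bool didWork = false;
        if (queued > 0) {
            --queued;
            if (producerRefills) {
                ++queued;
            }
            didWork = true;
        }
        if (didWork) {
            continue;   // more may be pending: poll again at once
        }
        ++stats.waits;  // queue drained: a real pump would block here
        break;
    }
    return stats;
}
```

With three queued events and no refilling, the model makes four non-blocking calls and then waits once. With refilling enabled it uses up all iterations without ever reaching the wait, which is the tight polling case.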

Re frame_rate, guys, could you please test how well it behaves when you set it to 10 000? This is the refresh rate Mozilla internally uses when you specify 0. But 0 also causes other shortcuts to happen (see https://github.com/bitwiseworks/mozilla-os2/issues/265#issuecomment-387235895) that must be a reason for abnormal CPU load when minimized. If smoothness is the same at 10 000, I will set it as a new default. It will still be a workaround but it's better than nothing.

dryeo commented 6 years ago

Still finding it interesting that I can minimize both SM and TB and get about 0.5% CPU with 3 out of 4 cores offline, I looked again at the differences between the apps. One is the theme, so I switched to LavaFoxV2, reset frame_rate to zero and found that now Firefox no longer has the high minimized CPU usage. Others should test alternative themes from https://addons.mozilla.org/en-US/firefox/complete-themes/ (only a few are compatible with our version).

dspiatkowski commented 6 years ago

@dryeo Hmm, truth be told I never even considered the impact a selected Theme might have. So given your results I decided to test this out as well. I pulled two different themes: 'Post Modern Revisisted re-loaded' & 'Royale 2'.

So far I only have about a day's worth of runtime on the Modern theme. There does appear to be a lower amount of CPU utilization when in normal use, meaning that all my FF sessions are in the foreground and they are not minimized (I'm running XPager so the windows are spread across virtual desktops and almost never truly minimized), but it is barely perceptible...so def not a Day vs Night type of a difference.

However, when I explicitly minimize all the windows the normally present and continuous CPU spikes do in fact go-away. So yes, that confirms what Dave found with his LavaFoxV2 theme. What is even more surprising is that even though FF is now showing in Theseus as having consumed some 1.3G of ram, it continues to be functional. This is significantly different from the default theme where by this time I really have to kill the process and re-start the browser. So maybe this is a good sign?

I will switch to the Royal theme next.

guzzi-g5 commented 6 years ago

Same here, with post modern low cpu usage when minimized

abwillis commented 6 years ago

Since the suggestion of determining the best frame rate, I have been testing here. The best overall frame rate I have found so far is 30. Smooth scrolling and lowest CPU usage.

dryeo commented 6 years ago

On 05/10/18 07:05 PM, abwillis wrote:

The best overall frame rate I have found so far is 30. Smooth scrolling and lowest CPU usage.

Try playing a YouTube video. Really jerky here at 30, even at 6000 it seems slightly more jerky than zero.

dmik commented 6 years ago

As I mention in #266, applying some of the Windows hacks in nsAppShell seems to help with the minimized state here (though it might simply be an observer effect, as there is not enough evidence). I will upload a test version within #248 as there is also something I found there.

dmik commented 6 years ago

Please test a build with nsAppShell hacks and with the starvation timeout changed from 10 ms back to 20 ms as it was in esr38 and before. For this you need http://rpm.netlabs.org/test/ff45_9_0_t5.7z and put a newer XUL from http://rpm.netlabs.org/test/x_t5_1.7z on top of it.

lerdmann commented 6 years ago

This new XUL.DLL (I had already been using the ff45_9_0_t5 package) does not change anything regarding yielding all threads on window minimization, at least the main thread remains in the "running" state. But I can now increase the "layout.frame_rate" to 10000 without any negative side effects. Typing into any browser entry field or entry line is fast.

an64 commented 6 years ago

@dmik t5 + xul t1, framerate=0: still 100% CPU when minimized. The YouTube page seems to load faster.

About plugins: with the new Odin it is still a no go. May I get a new npflos2.dll to test, with the fixes from https://github.com/bitwiseworks/mozilla-os2/issues/229#issuecomment-318223895 ?

an64 commented 6 years ago

@dmik t5 + xul t1, framerate=60: still the same results. YouTube page content doesn't load at all, video is very slow, menus and input fields lag.

dmik commented 6 years ago

@an64 strange. This t5 build + xul t1/t2 does not hog the CPU in the minimized state here, no matter what I do. Regardless of the frame_rate value. But still, can you test it with layout.frame_rate = 10000 to see if the minimized state will still hog the CPU for you?

I need some resolution to this ticket.

dspiatkowski commented 6 years ago

@dmik and @an64 This is actually similar to what I am seeing here, which is why I commented earlier that the latest changes did not match the FF38 behaviour. Granted, I am not seeing a steady 100% CPU utilization, just the continuing spikes, which do occasionally peg the CPU at 100%.

However...I noticed something else today as I pulled the t5_2 drop of the XUL.DLL to give that a try. It turns out that the gmail issue is gone, and that's awesome, because I keep that page open 24x7. This behaviour matches t5_1 as well, which I admit I somehow missed earlier (probably because most of my testing has always been done with a set of FF windows/sites open in order to keep 'load' consistent). I did however find that on any other pages where there are numerous dynamic images being shown, not movies, not video clips, but static images (gif, jpg, etc) which alternate in the same window space (advertising) that drives FF to consume massive amounts of CPU, basically if I throttle down to a single CPU core it stays pegged to 100%, otherwise, these are steady spikes across all cores which total about 100% at any point in time.

Here is an example of such a page: http://www.forfmjbodiesonly.com/classicmopar/

You will notice the advertising images on the right side of the page. When I close down this window the CPU spikes completely go away. Minimizing it does nothing, the spikes continue on. Minimizing ALL open FF windows has no effect, the CPU spikes continue. But, as soon as I close this window the spikes go away.

So at least on my system I think I have finally pegged the source of the CPU spikes. It is not just this single URL though, any other sites which show these types of alternating images behave similarly.

dspiatkowski commented 6 years ago

...I will add one more note to the above, re-testing this scenario with the alternate Theme ('Post Modern Revisisted re-loaded') no longer exhibits the CPU use while minimized problem. When the window causing the CPU usage (URL shown above) is minimized, the CPU utilization goes away. As soon as I bring it back to foreground the CPU utilization comes back. This behaviour is very different from the standard theme, but I have no idea what goes into debugging a theme, does anyone else??? Or perhaps, who could evaluate how the theme definition impacts this?

dmik commented 6 years ago

@dspiatkowski let's not mix things here. My question is simple now: does t5 + x2 still have the problem of a 100% CPU spike once FF gets minimized (with the default theme) or not? And if it does, does this spike only happen with frame_rate = 0 or also with frame_rate = 10000?

Re the theme thing, I have no idea how a theme could affect it. It really smells like yet another timing issue where switching a theme is just a trigger for the problem, not the problem itself. I repeat that I can't reproduce any CPU hogging related to switching minimized state here any more.

lerdmann commented 6 years ago

t5 + x2:

1) On minimizing FF, it still has 100% CPU usage when frame_rate = 0.
2) On minimizing FF, it blocks all threads when frame_rate = 10000 (in short: error fixed).

I have not tested any other value for frame_rate. With frame_rate = 10000 the system becomes sluggish, typing in Slack (or other entry fields like this one) becomes unbearably slow, and the system starts to drop typed characters. But I don't care about that too much. I have 8 cores and I seldom minimize FF.

dmik commented 6 years ago

I checked http://www.forfmjbodiesonly.com/classicmopar/ here. I confirm what you say, but it has no relation to the minimized state. I.e. for me FF still behaves the same, no matter if it's minimized or not. I can only tell that when frame_rate is 0, this advertising animation eats more CPU than when frame_rate is 10000. Which is kinda expected because there are special checks in the code to not block waiting for something to happen when frame_rate is 0.

So the question now is whether we should leave frame_rate at 0 by default (and live with occasional 100% CPU usage in the minimized state) or set it to 10000 and reduce this occasional CPU load at the cost of dropping some video frames in YouTube and other similar problems.

dmik commented 6 years ago

@lerdmann hmm, what you just said contradicts your previous comment where you state:

But I can now increase the "layout.frame_rate" to 10000 without any negative side effects. Typing into any browser entry field or entry line is fast.

I'm confused. Can you clarify? Maybe you mixed up 1000 and 10000? Getting typed symbols dropped is definitely not acceptable. Here all is smooth at 10000 as well as at 0 (and YouTube playback is sluggish here anyway due to non-working audio).

lerdmann commented 6 years ago

Anything but frame_rate = 0 is a disaster on my system. I just tried 10 and that is as bad as 10000 (I am beginning to believe that the value is completely irrelevant if it is not 0 or -1). I suggest keeping it at 0.

dmik commented 6 years ago

@lerdmann it's not irrelevant :) It determines the timer interval. If you have it set to 60, then the timer will be set to fire every ~17 ms. If it's 1, then it will fire every second. If it's 10000, it will be triggered immediately after the previous timer (i.e. ASAP). And here I can see a clear difference between, say, 1, 10, 60, 1000, 6000 and 10000. If you don't, it means that your system for some reason can't measure such intervals with the precision required to make any difference.

Do you have SET NSPR_OS2_NO_HIRES_TIMER=1 in your CONFIG.SYS?

dryeo commented 6 years ago

Experimenting with Firefox T5 and xul XT5_1, I find not much apparent difference between frame_rate=0 and frame_rate=10000. Both use too much CPU minimized, with zero perhaps being better, and YouTube seeming a bit smoother at zero. Real numbers here. Using https://www.vsynctester.com/ I get about 25-31.5 frames a second with zero and about 22-28 fps with 10000 (both full screen). Using http://chromium.github.io/octane/ I get consistently higher numbers with zero, 10100-11100, while with 10000 I get 9400-10700, with one unresponsive script popup for zero and 2 unresponsive script popups with 10000, both in typescript. Notes about octane: restart the browser and wait at least 30 seconds before running the test, run at least 3 times, and larger numbers are better. All in all I prefer zero for frame_rate. This also seems the best with SM and TB.

dspiatkowski commented 6 years ago

@dmik OK, so if I understood the following comment of yours correctly, you are in fact seeing the same behaviour I am seeing here: "...I checked http://www.forfmjbodiesonly.com/classicmopar/ here, I confirm what you say but it has no any relation to the minimized state. I.e. for me FF still behaves the same, no matter if it's minimized or not...", correct?

The CPU spikes occur regardless of whether FF is in the foreground or minimized, that is what I'm seeing here. There is no difference whether I'm using frame_rate=0 or 10000 or 15000. The result is always the same.

Next point, follow-up to the 'SET NSPR_OS2_NO_HIRES_TIMER=1' question. I have had this present in my CONFIG.SYS for some time now. If I leave this out and allow FF to use this HIRES_TIMER it results in a nearly non-responsive FF, meaning, unless I set frame_rate to something crazy like 120000 (correct, 120K) the windows barely respond to my inputs, scrolling is terrible and gmail always throws up an error, see attached file for screenshot below:

[Screenshot: ff45_9-gmail-timer_enabled-error_msg]

Subsequently, the only viable way for me to run FF here is to disable the timer, set frame_rate to either 0, or something like 15K (which seems to help out a tad with the continuous CPU cycles).

My vote is to leave the default value of the frame_rate preference to '0' (zero), document the impact this has, and instruct the end-user to adjust accordingly.

lerdmann commented 6 years ago

Yes, I have SET NSPR_OS2_NO_HIRES_TIMER=1 in CONFIG.SYS.

dspiatkowski commented 6 years ago

TEAM, One more remark, something I just noticed in how FF behaves. If the TAB which is causing the CPU spikes is in the foreground and is moved to the background (meaning, if you open a new TAB in the same window, even an empty one), the CPU spikes go away completely, even with the default theme.

an64 commented 6 years ago

t5+xt5_2: Any value (-1, 30, 60, 10000) gives the same "slideshow" effect as I wrote before. 0 gives 100% when minimized, even with one tab "about:blank".

an64 commented 6 years ago

And... yes , with "LavaFox V2" theme and rate=0 all ok, no cpu hog

lerdmann commented 6 years ago

I can confirm what an64 has stated: using the LavaFox theme and FF will set all threads to "blocked" when it is minimized, even when frame_rate = 0 is specified. Seems like the themes have the "capability" to make FF excessively poll the message queue when minimized (or not).

dmik commented 6 years ago

Okay, you convinced me. I will leave frame_rate=0 and make a notice in README.OS2 about it and about the theme thing as well.

@lerdmann I don't think that it's something theme-specific. It's message handling specific. Somehow the default theme generates more intensive message flow when minimized compared to other themes. And it's the intensive message flow that gives problems here.

dspiatkowski commented 6 years ago

@dmik I'm curious, just an idea here, but could we use something like the built-in FF 'Performance' monitor to better understand what is causing this CPU cycles use?

Here is what I have in mind: I captured 1min of the Performance data for the URL I previously posted (this is what shows consistent high CPU utilization). By including the platform specific data ('Show Gecko Platform Data' option) it gives us the insight into the codebase, and (I hope) a better understanding of the code path which is leading to the CPU spikes.

I've attached the result to the ticket, beware, unzipped the darn thing is about 70Meg in size. Attempting to capture more (2 mins) actually caused FF to crash when invoking the 'Save' results option.

Anyways, if this approach is feasible let me know and I'll do a matching capture but with the URL window minimized with the standard theme, and further on with different themes, since we do recognize the theme somehow causes FF to show different CPU utilization results. FF45_9-CPU_spike_performance-capture-1min.zip

lerdmann commented 6 years ago

@dmik: I know it's not directly related to themes and we already know the root cause. But it would require analysis of what exactly a theme can influence so that message processing is affected in the way we observe. There clearly is a correlation.

dmik commented 6 years ago

@dspiatkowski sure, this is one of the ways to go. And my work of last months was dedicated to that (see e.g. #264). However, it's still an unfinished task. And no time to work on that anymore during the current FF round.

@lerdmann Sure, there is a correlation.