bitwiseworks / qtwebengine-chromium-os2

Port of Chromium and related tools to OS/2
9 stars 2 forks source link

High CPU load on certain web sites #26

Open dmik opened 4 years ago

dmik commented 4 years ago

This is a continuation of #25 in some way. That ticket was about general high CPU load (i.e. while idling even on an empty about:blank page) but that general problem was solved. This ticket is about high CPU load on particular web-sites.

One known example is http://cnn.com. It appears to work at first but then some thread starts eating up 100% of a single core and basically blocks web page rendering so that, while the Qt application itself remains responsive, the web page is not — it cannot be scrolled, nothing can be clicked and so on. More over, any attempt to supply the web engine with a new URL also remains w/o any response. Only restarting the app helps. And sometimes even closing the app hangs for a long period of time (can be killed though).

Another example is http://www.qt.io. It even works at the beginning (including quite a complex HTML5 animation) but then gets stuck just like cnn.com.

dmik commented 4 years ago

Monitorthreads reveals that TID 5 is spinning and looking at its stack trace in the debugger indicates that it's a Chromium's select thread (via libevent). It's at least some hint to start with. I also need to try the debug version as its stack trace may be more informative due to optimizations turned off.

dmik commented 4 years ago

The debug build doesn't give me much more on TID 5 but an assertion on TID 12:

[2756:12:1016/184327.834000:FATAL:local_storage_cached_area.cc(584)] Check failed: found != ignore_key_mutations_.end(). 

Here is its stack trace (attaching as a screen shot as ICSDEBUG can't export such things, unfortunately): image

dmik commented 4 years ago

Analyzing the above threads running event queues, I don't see any broken sockets constantly triggering a select wakeup or something like that (so my initial guess was not correct). I see that everything works quite fine on simple sites but on heavy sites the GUI thread (TID 1) starts constantly waking worker threads up with base::MessagePump::ScheduleWork (to cause the thread do some pending work) which writes to a special wakeup pipe to abort select and makes these threads run in a tight loop w/o entering the wait state for many-many cycles. This is what leads to a 100% CPU load on one core.

Interestingly enough, if I hit the title bar with a mouse, this bombing from TID 1 stops and CPU load goes away. This rings some bells... I need to think/dig.

dmik commented 4 years ago

Not seeing much so far. And holding the title bar does not always solve the problem. One stable observation is that at some point the main GUI thread (TID 1) starts pinging TID 5 every 250 ms keeping it quite busy. 250 ms is not that much actually but TID 5 seems to be doing quite a lot of work at each ping. The source of this ping is not yet tracked. Most likely it's some animation timer or such.

I guess I have to fix the broken tracing facility first (see https://github.com/bitwiseworks/qtwebengine-chromium-os2/issues/25#issuecomment-707791242) to be able to tell more about what's going under the hood.

dmik commented 4 years ago

Another problem I'm seeing is that sometimes TID 5 dead locks. It enters some endless wait while some other thread is waiting for it instead. This dead lock seems to happen only when CPU load gets high. Some message flow is not entirely right somewhere.

dmik commented 4 years ago

I finally know why tracing does not work. The debug build revealed this fatal assertion:

[37739:5:1029/203040.657000:FATAL:tracing.mojom-shared.h(633)] The outgoing message will trigger VALIDATION_ERROR_UNEXPECTED_INVALID_HANDLE at the receiving side (invalid perfetto_service in ConnectToTracingRequest struct).

And according to BUILD.gn, building perfetto_service is disabled in Qt. Seems to be done by these two commits:

As far as I get it, because the Qt guys thought it's too bogus. I may try to enable it but no guarantee that it will work... Note that in later Chromium drops used by more recent Qt, the perfetto code is there and built, however.

dmik commented 4 years ago

The above commit fixes tracing! Now I can do both startup tracing and DevTools (as described in https://github.com/bitwiseworks/qtwebengine-chromium-os2/issues/25#issuecomment-707791242). I will try our bottlenecks there.

dmik commented 4 years ago

I can't make it hang with tracing enabled — it just works.... This really smells like a race condition somewhere.

dmik commented 4 years ago

But in general tracing is very useful — it allows to show low-level bottle necks and such. Also, I know now that TID 5 that is the busiest one when it hangs is Chrome_IOThread. Supposedly, it does all network I/O or such. The second busiest thread (TID 12 or TID 13) is Chrome_InProcRendererThread which apparently does HTML rendering. I need to inspect Chrome_IOThread on some higher level to understand why and when it hangs (and eventually causes dead locks).