youtube.com takes really long time to load

dmik commented 6 years ago

It appears that opening https://youtube.com sometimes takes minutes before anything appears on the page. In the meantime you only see the progress ring spinning and messages such as "Read www.youtube.com", "Connecting to s.ytimg.com..." and the like in the status tooltip at the bottom left corner of the page. You may make it load faster by reloading the page several times with Ctrl+R (in this respect it's similar to #242).

All 45.9.0 builds as well as 45.5.0 from May 2017 are affected, while 38.x and earlier builds do not seem to be.

My current guess is still that it has something to do with the network connection. However, it looks like it's on the Firefox side, not on the TCP/IP stack side. It might be some security issues, missing certificates or such and delays in the connection caused by them. At least I sometimes see a lot of errors in JS regarding certificates. I need to study it closer. The problem is, as usual, that the failure is irregular and once it starts working, it's quite hard to make it fail again.

dmik commented 6 years ago

I think I know what it is, MOVAPS is SSE (1), and I'm only disabling SSE2 (which is MOVDQA and friends — i.e. where it was crashing before). It means that -mno-sse2 actually works causing GCC to emit SSE1 instead. I will try a build with SSE1 disabled as well (overnight).

dspiatkowski commented 6 years ago

@dmik OK, glad you got it, because I was attempting to update the ticket on three separate occasions but FF would simply trap. I found the following in the 'Failing Instruction' section:

Try No 1: 111F6975 >MOVAPS XMM0, DQWORD [ESP+0x60] (0f284424 60)

Try No 2: 111F6975 >MOVAPS XMM0, DQWORD [ESP+0x60] (0f284424 60)

Try No 3: 111F6975 >MOVAPS XMM0, DQWORD [ESP+0x60] (0f284424 60)

...and all of these match what you highlighted about that being an SSE instruction.
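For readers following along, the instruction bytes from the trap report can be decoded by hand. The sketch below is a minimal decoder that handles only this one encoding (`0F 28` followed by a ModRM byte, a SIB byte and an 8-bit displacement); it is not a general disassembler, and the function names `decode_movaps_load` and `movaps_would_fault` are made up for illustration. It also shows the alignment rule behind the trap: MOVAPS requires a 16-byte aligned memory operand, so with a displacement of 0x60 the access is only safe when ESP itself is 16-byte aligned.

```python
# Illustrative sketch only: decodes the single encoding seen in the trap report.
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_movaps_load(code: bytes) -> str:
    """Decode `0F 28 /r` (MOVAPS xmm, m128) in its [base + disp8] form."""
    if code[0:2] != b"\x0f\x28":
        raise ValueError("not a MOVAPS load (opcode 0F 28)")
    modrm = code[2]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm != 0b100:
        raise ValueError("only the SIB + disp8 form is handled in this sketch")
    sib = code[3]
    index, base = (sib >> 3) & 7, sib & 7
    if index != 0b100:  # 100 means "no index register"
        raise ValueError("scaled index not handled in this sketch")
    disp = code[4] - 0x100 if code[4] >= 0x80 else code[4]  # sign-extend disp8
    sign = "+" if disp >= 0 else "-"
    return f"MOVAPS XMM{reg}, [{REG32[base]}{sign}0x{abs(disp):X}]"


def movaps_would_fault(address: int) -> bool:
    """MOVAPS faults when its memory operand is not 16-byte aligned."""
    return address % 16 != 0


insn = bytes.fromhex("0f28442460")  # bytes from the 'Failing Instruction' lines
print(decode_movaps_load(insn))     # MOVAPS XMM0, [ESP+0x60]

# Hypothetical stack pointers: one 16-byte aligned, one only 4-byte aligned.
for esp in (0x0012F000, 0x0012F004):
    print(hex(esp), "faults" if movaps_would_fault(esp + 0x60) else "ok")
```

The stack pointer values in the loop are invented; the point is only that a stack aligned to 4 bytes rather than 16 makes the same instruction trap or not depending on where the frame happens to land.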

dmik commented 6 years ago

The new test build 4 (where everything is built with -O3 -mno-sse -mno-sse2 -march=pentium4) is available as http://rpm.netlabs.org/test/ff45_9_0_t4.7z. It seems to not crash at unaligned SSE/SSE2 arguments here any more, neither at startup nor when e.g. using Gmail (where it would inevitably crash otherwise). It also doesn't crash on exit here (see also #269). Please test. Please also test its overall performance and such (also with relation to #265, i.e. different layout.frame_rate values). We have to decide which options to use for GA2.

lerdmann commented 6 years ago

1) I just tried http://rpm.netlabs.org/test/ff45_9_0_t4.7z: works fine here, no hangs, no crashes, no traps on exit whatsoever.
2) Everything but "layout.frame_rate" = 0 is bad. I have 8 cores, so that cannot be the problem. If I were you I would not hesitate to change whatever is necessary in the "Native..." function to make this 100% CPU usage go away. After all, a "Native..." function is specific to the target platform.

dmik commented 6 years ago

@lerdmann Lars, thanks. Well if we leave it as 0 we will have to get along with 100% CPU load in minimized state (or dive deeper in an attempt to properly solve this but I'm afraid there is no time for this now).

@lerdmann And BTW, how exactly is setting the frame rate to, say, 6000 bad for you?

dspiatkowski commented 6 years ago

@dmik I concur with Lars' findings. Version T4 loads up fine here, no issues, shutdown is clean, Gmail use is OK as well. The overall CPU use seems lower, for example right now I have 4 separate windows open, about 12 tabs. Gmail in one fulltime, yet the CPUMON shows only a consistent 1-5% utilization on each core (5 total). So on that note it seems to be an improvement, but very little runtime at the moment, so this is very preliminary.

Regarding the YouTube issue. Dmik, I understand your point, my conclusion is that the combination of all of the above flags/optimizations/etc appears to somehow cause the underlying timing issue. However, having said that, all I can report are the symptoms I see. This T4 build actually does play native VP9 codecs, although they are somewhat choppy. In comparison to the previous one, which did not play them at all - despite numerous attempts on my part, the results I'm seeing are a significant improvement.

As in previous tests, I will continue to test with T4 and will report out my results.

an64 commented 6 years ago

Same results here, no hangs. Only one thing: plugins are not working at all, just a black area. Killing plugin-container.exe gives the standard "plugin crashed" message.

dmik commented 6 years ago

Ok guys, please test another build: http://rpm.netlabs.org/test/ff45_9_0_t5.7z. The only difference from T4 is that I use -Os again for everything but JS (to prove that it's -O3 and -march=pentium4 w/o -mno-sse -mno-sse2 which are responsible for crashes). I also want to check how much -Os affects performance compared to -O3 for XUL. Given that with -Os the size of XUL (with debug info removed) drops from 40 MB to just 30 MB, we may consider using it if the performance drop is not significant because saving 10 MB in our tight shared memory arena is not that bad, actually.

@an64 well, plugin-container.exe should just not be there. It doesn't work right ATM (there is a respective ticket). IIRC, I had to remove it from RPM-based builds because that turned out to be the simplest way to completely disable out-of-process mode for plugins or such. So please remove it and see how it goes. Simple plugins should work then. I made them work this way for 45.x back then at least. If not, please find the latest release where they still do.

dspiatkowski commented 6 years ago

@dmik Build T5 feedback: working fine here, no crash, no issues to report so far. Compared to builds T3 & T4 there is a noticeable improvement in YouTube VP9 playback. I would say it is on par with the previous working test release, which I think was T1 b/c T2 was trapping right away (sorry, I should have been logging these things here, or labelling my ticket updates accordingly - like I'm doing with this one).

dmik commented 6 years ago

@dspiatkowski Interesting. T1 was also -Os. Are you sure it's not an observer effect? -Os optimizes for size, which should generally be slower than any other type of optimization (in exchange for a smaller in-memory footprint). Do you have any numbers indicating that YouTube behaves better on T5 than on T4?

dspiatkowski commented 6 years ago

@dmik Well, I am basing my YouTube feedback on how actual playback proceeds. It is either smooth, no video and/or audio drop-outs, or stutters, hangs, etc., or it is not. I have no hard numbers to use, other than the few screenshots of YouTube "nerd data" I have captured in the past as we were looking at potential network speed issues. All of these pointed to a significant amount of network speed availability.

Would capturing the YouTube "nerd data" for the very same video in both T4 & T5 help in any way?

an64 commented 6 years ago

@dmik I don't understand... In #229 you said that plugins can only work in OOP mode, and you made a lot of commits to support this. Isn't that right?

an64 commented 6 years ago

t5 here shows slightly lower CPU usage on VP9 and H.264 playback, no hangs. But sound on any version (38, 45 all builds) sometimes drops. Maybe some thread needs a higher priority?

lerdmann commented 6 years ago

Trying http://rpm.netlabs.org/test/ff45_9_0_t5.7z: with the YouTube web page open, a main window resize takes an extended period of time, at least initially (I did not try this with http://rpm.netlabs.org/test/ff45_9_0_t4.7z, maybe I should). Scrolling is sluggish. YouTube video playback is sluggish, and moving the mouse cursor over the Firefox window gets the video out of sync with the sound (but I think it has been this way for quite some time). I am using "layout.frame_rate" = 0. But there are no hangs or traps or the like, neither during normal operation nor on exit. By the way, setting "layout.frame_rate" to 1200 still makes Firefox "cycle" if it is minimized.

dmik commented 6 years ago

@an64 hmm, I refreshed my memory and you are right, thanks for bringing it up. Plugins should generally work in OOP mode now and there is no reason to remove/disable plugin-container.exe anymore. At least that was the case when I closed that issue. Can you please tell me what is the latest release where plugins work for you?

dmik commented 6 years ago

@lerdmann remember t4 and t5 differ only in -O3 vs -Os for XUL.DLL (all code except JS). Please test t4 then. Though I don't think these options make that much difference so please make sure you don't have some weird frame_rate value (and restart FF each time you change it — it might not pick it up everywhere on the fly). And why you still see high CPU load when minimized and frame_rate != 0 is also a puzzle since nobody else is seeing that.

dmik commented 6 years ago

@dspiatkowski re new YouTube nerd data, if you see any significant difference between the builds, then yes, post it.

@an64 regarding your sound issues, I doubt it has anything to do with FF per se. Please try to install the latest UNIAUD from Netlabs — there are reports it helps with sound in FF.

lerdmann commented 6 years ago

1) Yes, http://rpm.netlabs.org/test/ff45_9_0_t5.7z is not significantly worse than http://rpm.netlabs.org/test/ff45_9_0_t4.7z. Changing the window size while a movie is playing takes a long time to finish for both versions.
2) I always close FF when I change "layout.frame_rate". I realized right away that apparently this value is only read once on program start.
3) Specifying anything but -1 for "layout.frame_rate" will indeed eventually set all threads to "Blocked" on minimizing the window. But that apparently takes some time to happen. Strange.
4) Specifying 10000 is indeed equivalent to specifying 0. Typing something into an entry field is mostly ok (apart from some occasional blocking).

I wonder what "layers.offmainthreadcomposition.frame_rate" variable can do for us. It is set to -1.

I think FF is much too greedy in releasing its threads. Why would you want to handle ANY messages if the program is in the background ? I think that this WinPeekMsg is really counterproductive.

dryeo commented 6 years ago

On 05/09/18 10:03 AM, lerdmann wrote:

I think FF is much too greedy in releasing its threads. Why would you want to handle ANY messages if the program is in the background ? I think that this WinPeekMsg is really counterproductive.

Notifications. Here, for example, downloads finishing or ChatZilla conversations mentioning me by name cause SM to display a notification at the lower right of the desktop. There are other notifications that are supported on other platforms.

dmik commented 6 years ago

@lerdmann okay, I see. Thanks for testing! layers.offmainthreadcomposition.frame_rate is irrelevant for us ATM as we have OMTC disabled for now (see #200 for details).

Regarding "greediness": this is how the Presentation Manager is designed. A PM application running a message queue is obliged to process all incoming messages as fast as possible in order for the whole PM desktop to function properly. There are many system messages which require immediate processing by the receiver (and that's besides the application-specific notifications mentioned by @dryeo). If an application needs more than a dozen milliseconds to process some message, it is supposed to do so asynchronously WRT reading and processing other incoming messages. And this is where the FF problem actually lies. It 1) emits too many messages and 2) takes too long to process some of them synchronously (i.e. on the same thread that is supposed to process other incoming messages). And this negatively affects the whole PM. While in theory one may indeed blame the PM design for that, it is what it is and we don't (and won't) have a different PM (and many other platforms have similar requirements). So it's FF which needs to behave properly here. It used to do so more or less, but a lot of things have changed in it since then, and modern platforms offer better parallelism and less strict requirements, which FF seems to utilize w/o caring about older systems that don't. And here lies a fundamental problem, as there might be just too many things to change in FF to make it behave as a native PM citizen again: so many that it might be equivalent to writing some of its subsystems from scratch (which is apparently beyond our resources given the general complexity and quality of the FF codebase).
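The rule stated above (handle slow messages asynchronously so the queue keeps being read) can be sketched generically. This is not PM or Firefox code: the message strings, `message_loop`, `handle_slow` and `run_demo` are hypothetical names, and a plain Python queue stands in for the PM message queue. The only point is that the loop hands long work to a helper thread and goes straight back to reading the next message.

```python
import queue
import threading
import time


def handle_slow(msg, log):
    """Stand-in for work that takes far longer than a dozen milliseconds."""
    time.sleep(0.05)
    log.append(f"finished {msg}")


def message_loop(inbox, log):
    """Hypothetical message loop that never runs slow work on its own thread."""
    workers = []
    while True:
        msg = inbox.get()
        if msg == "QUIT":
            break
        if msg.startswith("slow:"):
            # Hand the long-running job to a helper thread and return to the
            # queue immediately, so later messages are not delayed by it.
            worker = threading.Thread(target=handle_slow, args=(msg, log))
            worker.start()
            workers.append(worker)
            log.append(f"dispatched {msg}")
        else:
            log.append(f"handled {msg}")
    for worker in workers:
        worker.join()  # wait for outstanding work before shutting down


def run_demo():
    inbox = queue.Queue()
    for msg in ("slow:layout", "paint", "QUIT"):
        inbox.put(msg)
    log = []
    message_loop(inbox, log)
    return log


if __name__ == "__main__":
    print(run_demo())
```

In this toy run the "paint" message is handled right after the slow job is dispatched instead of waiting for it to finish, which is the behaviour PM expects from the thread that owns the message queue.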

dmik commented 6 years ago

One thing that FF surely employs here is hardware acceleration of 2d rendering, something we miss almost completely on OS/2. On modern platforms some composition and paint operations take much less time than on OS/2, which means that a paint request (which always arrives on the main thread) is processed faster and so are all other upcoming messages. Another thing is that major platforms have been using OMTC for some time already, and OMTC is also about transferring resource-greedy 2d rendering (especially if we take all those modern HTML5/CSS3 features into account) to other threads in order to reduce the main thread load and increase UI responsiveness. Things could have been slightly better if we had OMTC enabled on OS/2 but that's a task of its own.

dmik commented 6 years ago

While trying to debug & understand the complex (well, very complex, I'd even say overcomplicated) FF messaging pipeline, I see one message that gets posted to the same window (most likely, the top one) at a very high rate (like 4-7 messages every 10 ms): 0xF588. This isn't any of the system messages but I still can't find where it originates from. It's not something like WM_USER + XXX at least: all WM_USER-based ones that I could find in the source don't exceed 0x403 (WM_USER + 3). Perhaps its value is generated with WinAddAtom. I have to check that and find its origin. Most likely it's some wake-up message. But still I wonder why it happens so often.

dmik commented 6 years ago

Ok, that was pretty simple. It's a special message used by nsAppShell to trigger native (PM) event processing from other (non-PM) event loops. Still, the question is why it is posted that often. I need to analyze further.

dmik commented 6 years ago

Somehow I guess that this special event is a cause of the high CPU load (and perhaps of #248 as well). There is cross-platform logic like this (roughly): process native (in our case, PM) events for a maximum of 10 ms, then, when this maximum is reached, break off this processing and let other Firefox events get processed. And there is also a check: if there are more native events pending when this happens, then another native processing cycle is scheduled by posting this 0xF588 message to the native message queue. It turns out that under heavy load there are always some more messages to process, so 0xF588 gets posted over and over from within its own handler at very high rates, until eventually there are no new pending messages within the next 10 ms. This gives extremely high CPU load when such "recursion" happens and the only way out of it is to wait until the messages get sorted out (which heavily depends on the hardware and the complexity of the web content of course).
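The rescheduling behaviour described above can be modelled with a small deterministic simulation. This is only a sketch of the logic as described in this comment, not the actual nsAppShell code: the function `count_wake_reposts`, its parameters, the fixed per-message cost and the per-cycle arrival list are all assumptions made for illustration, and time is simulated rather than measured.

```python
WAKE_MSG = 0xF588   # message id observed in this thread
BUDGET_MS = 10      # native-processing budget described in this thread


def count_wake_reposts(initial_pending, cost_ms, arrivals, budget_ms=BUDGET_MS):
    """Return how often the wake-up message is re-posted before the queue drains.

    initial_pending: native messages waiting when the first wake-up is handled
    cost_ms:         simulated time needed to process one native message
    arrivals:        new native messages arriving during each handler run
    """
    pending = initial_pending
    reposts = 0
    cycle = 0
    while True:
        elapsed = 0
        # One run of the wake-up handler: process native messages until the
        # time budget is used up or nothing is left.
        while pending > 0 and elapsed < budget_ms:
            pending -= 1
            elapsed += cost_ms
        # Messages that arrived while the handler was running.
        if cycle < len(arrivals):
            pending += arrivals[cycle]
        cycle += 1
        if pending == 0:
            return reposts   # queue drained: no further wake-up is posted
        reposts += 1         # work still pending: post the wake-up again


light = count_wake_reposts(initial_pending=3, cost_ms=2, arrivals=[])
heavy = count_wake_reposts(initial_pending=5, cost_ms=2, arrivals=[5] * 100)
print(f"light load: 0x{WAKE_MSG:X} re-posted {light} times")
print(f"heavy load: 0x{WAKE_MSG:X} re-posted {heavy} times")
```

With the made-up numbers above, the light case never re-posts the message, while the heavy case re-posts it once per cycle for as long as new messages keep arriving as fast as they are processed, which mirrors the "recursion" described in the comment.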

Other platforms have various means to prevent this from happening but in general it all looks too complex and hackish. They clearly overdid all the logic there. Given that there is also a merge with the Chromium message loop here (which also integrates with PM on its own), it becomes just a nightmare....

I will try to apply various hacks too to reduce the rate of this special message and see if it helps. I still don't fully understand the logic.

dryeo commented 6 years ago

On 05/10/18 06:13 AM, Dmitriy Kuminov wrote:

One thing that FF surely employs here is hardware acceleration of 2d rendering — something we miss almost completely on OS/2

Actually, if using SNAP, we do get hardware acceleration, using DIVE. Case in point: for the hell of it, I installed the latest beta of ArcaOS with SNAP instead of Panorama. Drawing PM programs is super slow, doing a window drag with animation on is very jerky and scrolling is very slow: with a large text file, I can spin the mouse wheel and sit back and watch it scroll for minutes, with the PM blocked. With SeaMonkey, I get fast scrolling, quick page draws and such, and they actually feel as fast or faster than with Panorama. Typing this message in TB is still slow though.

lerdmann commented 6 years ago

You get hardware acceleration for those chips that are supported by SNAP. Else you get the same old lousy support that GENGRADD has to offer which of course is worse than Panorama (if you have shadow buffering enabled in Panorama which is a trick to speed up things). DIVE does not mean HW acceleration. It just means that the device driver can write data ("draw") to the screen aperture directly instead of going through GPI calls.

an64 commented 6 years ago

@dmik "Can you please tell me what is the latest release where plugins work for you?"

Flash does not work with any release of 45, only in 38. Now I tested npwv (the WarpVision plugin) and it works with t5 and plugin-container.exe from t4. Flash does not work. I have the new Odin, but maybe a new npflos2.dll with your fixes is needed. Can I download it somewhere?

an64 commented 6 years ago

@dmik "regarding your sound issues, I doubt it has anything to do with FF per se. Please try to install the latest UNIAUD from Netlabs"

I'm not sure. VLC with the KAI interface plays well and sound drops only when CPU usage is near 100%.

an64 commented 6 years ago

Setting layout.frame_rate=10000 makes menus, sliders and input fields react slowly, YouTube video plays at a very low frame rate, and the YouTube page contents are never shown, only the video.

dmik commented 6 years ago

@an64 thanks for testing plugins. I will then just include plugin-container.exe into the archive. Re Odin, I believe libodin RPM 0.9.0-1 from netlabs-exp contains the latest version with the necessary fixes. Can you try it? If it's fine I will move it to rel. Re frame_rate, your results are really strange as they don't match what others are seeing. Are you sure you restarted FF etc?

dmik commented 6 years ago

BTW, applying Windows hacks does seem to help with #265 but doesn't help with the Gmail issue.

dmik commented 6 years ago

Seems that this issue is more or less gone with the recent fixes. Closing this.

dmik commented 6 years ago

@an64 re flash, IIRC, it is only available to you if you have a Software Subscription from Mensys/ArcaOS or such.

bitwiseworks / mozilla-os2

youtube.com takes really long time to load #266