Xpra-org / xpra

Persistent remote applications for X11; screen sharing for X11, MacOS and MSWindows.
https://xpra.org/
GNU General Public License v2.0

Large browser windows with lots of widgets seem to be causing lasting blurriness #967

Closed totaam closed 7 years ago

totaam commented 9 years ago

Issue migrated from trac ticket # 967

component: core | priority: critical | resolution: worksforme

2015-08-26 03:27:52: afarr created the issue


Working with a 0.15.5 r10336 windows client (our build) against a 0.15.4 10209 fedora 21 server, I'm seeing some blurriness, especially when scrolling, which doesn't seem to be resolving itself after the scrolling stops.

I've collected client side logs (with -d client,regionrefresh) and server side logs (with -d encoding,regionrefresh).

Yes, I also collected some xpra info.

And, just for fun... I'll include a screenshot of a bit of the blurriness (it was considerably worse at points, but I did catch some... I was mostly seeing it only with very large windows, so I won't try to include it inline).

totaam commented 9 years ago

2015-08-26 03:32:40: afarr uploaded file ticket967_blurry-info2.txt (117.0 KiB)

xpra info for 0.15.4/5 blurry

totaam commented 9 years ago

2015-08-26 03:42:25: afarr uploaded file ticket967_blurry-xpra-client-d-client-refresh.txt (2806.9 KiB)

client side logs -d client,regionrefresh

totaam commented 9 years ago

2015-08-26 03:48:47: afarr uploaded file ticket967_blurry-xpra-server-d-encoding-regionrefresh.txt.zip (951.4 KiB)

server logs -d encoding,regionrefresh (so long I had to zip... but if something gives time index, they might be useful)

totaam commented 9 years ago

2015-08-26 03:50:08: afarr uploaded file ticket967_0-15-5-blurry-screen-shot.PNG (841.4 KiB)

screenshot of medium level of blurriness I was seeing with google-chrome (you've seen webkit) ticket967_0-15-5-blurry-screen-shot.PNG

totaam commented 9 years ago

2015-08-28 04:43:40: antoine changed owner from antoine to afarr

totaam commented 9 years ago

2015-08-28 04:43:40: antoine commented


About the debug flags:

  • client will include far too much to be useful
  • refresh is usually what you want when things aren't refreshing properly
  • regionrefresh is only for the video region - unlike refresh, it is meant to be constantly postponed since the video region should update regularly

But in this case, the refresh is not the cause of the problem, your xpra info shows:

window[3].size=(1226, 1635)
window[3].video_subregion.refresh_region[0]=(0, 85, 1226, 1550)

So almost all of the window is detected as a video region. The scrolling made us select the whole window as video (because there is no way to tell that it isn't: it is updating fast and in exactly the same area each time, just like video). And the heuristics kept this region afterwards, probably because so many things are animated on this page that they keep the "hit counter" high.
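As a rough check of the two xpra info values above, the share of the window covered by the video region can be computed directly. This is only a minimal sketch: the helper name `region_coverage` is made up for illustration and is not part of xpra.

```python
def region_coverage(window_size, region):
    """Return the fraction of the window area covered by a region.

    window_size is (width, height); region is (x, y, width, height),
    matching the tuples shown in the xpra info output.
    """
    win_w, win_h = window_size
    _x, _y, reg_w, reg_h = region
    return (reg_w * reg_h) / (win_w * win_h)


# Values copied from the xpra info output quoted above
coverage = region_coverage((1226, 1635), (0, 85, 1226, 1550))
print(f"video region covers {coverage:.1%} of the window")
```

For these values the region spans the full width and all but the top 85 rows, so the result is just under 95% of the window.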

This means that the detection heuristics get it wrong: #410. So the debug flag that you want is probably (server-side): regiondetect.

totaam commented 9 years ago

2015-09-03 03:00:17: afarr commented


Tested some more with our set up. It seems easier to reproduce with our builds & configuration... will try to carve out a few minutes to test with encryption on with your builds to see if that might be the difference.

In any case, got the screen test to go blurry and remain so (msn.com, money tab... perhaps that page should stay blurry?) - will attach server logs with -d regiondetect for a portion of time with the text stuck at blurry, despite scrolling and mousing around the links and such, as well as a new xpra-info specific to our set up. I'll also attach a full-size screenshot of the page with most of it blurry, as well as an edited-for-size to link in-line.

The large-ish image sort of top-left is a rotating ad, which refreshes every... ohh, 1-2 seconds (?) ... and some of the other widgets seem to involve some motion (including a couple more ads that I didn't bother to capture in the screenshot). I suspect they may be responsible for just enough updates to keep the region detecting as video.

totaam commented 9 years ago

2015-09-03 03:01:18: afarr uploaded file ticket967_our-server_blurry-regiondetect.txt (669.0 KiB)

-d regiondetect server logs, our server, 0.15.5(ish)

totaam commented 9 years ago

2015-09-03 03:03:49: afarr uploaded file ticket967_our-server_blurry-info.txt (117.3 KiB)

xpra info, server side (of course) - our server (0.15.5 r10308 +/-)

totaam commented 9 years ago

2015-09-03 03:04:46: afarr uploaded file ticket967_full-size-portion-of-screen_blurry-page.png (942.1 KiB)

full size shot of page while blurry - our server/client ticket967_full-size-portion-of-screen_blurry-page.png

totaam commented 9 years ago

2015-09-03 03:07:41: afarr uploaded file ticket967_shot-of-page-while-blurry_rolling-banner-ad.png (318.9 KiB)

edited for in-line shot of blurry, to show widget concentration ticket967_shot-of-page-while-blurry_rolling-banner-ad.png

totaam commented 9 years ago

2015-09-03 03:08:50: afarr commented


[[Image(ticket967_shot-of-page-while-blurry_rolling-banner-ad.png)]]

totaam commented 9 years ago

2015-09-03 12:43:44: antoine commented


From your regiondetect debug log, we can see at regular intervals:

testing      current video region       rectangle[0, 79, 2098, 1306]: 100% in,   0% out,  93% of window, score=103
identify video: most=100% damage count={R(0, 79, 2098, 1306): MutableInteger(400)}

So it finds that 100% of screen updates happen in the region that was previously identified as video, that's roughly 20 to 40 repaints per second! (the calculations run at most once per second, and less often when there is not much happening on screen)

Not only that, but if you look at the actual paint events themselves (the format is simple: timestamp, X, Y, WIDTH, HEIGHT), ie:

(1441237975.382138, 0, 79, 2098, 1306), (1441237975.402772, 0, 79, 2098, 1306), (1441237975.428191, 0, 79, 2098, 1306)

All of the events that I can see actually repaint the whole of that area! (it's easy to see if you just search the log output for the string 0, 79, 2098, 1306, what is not highlighted is the rest - not much!) Usually you get smaller sub-areas, especially with players like flash that paint the screen in horizontal chunks, or youtube which repaints the video and the controls around it separately, but in this case it is all in one huge area!
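The same check can be scripted instead of searching the log by eye. The sketch below assumes the events have already been extracted into Python tuples in the (timestamp, X, Y, WIDTH, HEIGHT) format described above; the function name `repaint_stats` is hypothetical and not an xpra API.

```python
def repaint_stats(events, region):
    """Summarise paint events given as (timestamp, x, y, width, height).

    Returns (repaints per second, share of events matching `region` exactly).
    """
    timestamps = [e[0] for e in events]
    elapsed = max(timestamps) - min(timestamps)
    # N events span N-1 intervals
    rate = (len(events) - 1) / elapsed if elapsed > 0 else 0.0
    matching = sum(1 for e in events if tuple(e[1:]) == tuple(region))
    return rate, matching / len(events)


# The three sample events quoted above
events = [
    (1441237975.382138, 0, 79, 2098, 1306),
    (1441237975.402772, 0, 79, 2098, 1306),
    (1441237975.428191, 0, 79, 2098, 1306),
]
rate, share = repaint_stats(events, (0, 79, 2098, 1306))
print(f"{rate:.0f} repaints/s, {share:.0%} of events cover the full region")
```

Three events are far too few for a reliable rate. Applied to the full 400-event history, which spans roughly 10 to 20 seconds, the same arithmetic gives the 20 to 40 repaints per second mentioned above.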

You should be able to confirm that we are recording the correct values for paint events by logging with -d encoding then grepping the output for damage. But the code is unambiguous in this area: we record all non-refresh events in the list you see in the regiondetect log.

So at this point I think I will close this bug as invalid. The region detect code gets it right, and we're doing remarkably well considering the heavy paint traffic.

It looks to me like the browser is needlessly repainting things that have not moved. It could also be that this particular page is triggering those events through bad javascript code. I found a good page which explains the browsers' rendering process: "How Browsers Work: Behind the scenes of modern web browsers". If the problem comes from the browser's rendering engine rather than the page, this needs to be fixed as it will consume huge amounts of CPU for absolutely nothing.

Edit: originally said 400 updates per second, which was incorrect. We keep the most recent 400 events, and the time difference from oldest to newest is roughly between 10 and 20 seconds.

totaam commented 9 years ago

2015-09-09 00:23:50: afarr changed status from new to closed

totaam commented 9 years ago

2015-09-09 00:23:50: afarr set resolution to invalid

totaam commented 9 years ago

2015-09-09 00:23:50: afarr commented


Looks like closing on your end is probably the right thing to do. We'll have to handle it on our end.

I'll take the liberty of closing.

totaam commented 9 years ago

2015-10-28 00:00:15: maxmylyn changed status from closed to reopened

totaam commented 9 years ago

2015-10-28 00:00:15: maxmylyn removed resolution (was invalid)

totaam commented 9 years ago

2015-10-28 00:00:15: maxmylyn commented


I have been volunteered to re-open this ticket. All jokes aside, I am seeing identical behavior in the latest Chromium (the open source variant - not the closed source Chrome):

  • Server is a Fedora 21 VM running trunk r11057 - built from source

  • Client is a Fedora 20 hardware machine running trunk r11057 - built from source

  • Server is launched with xpra start :13 --bind-tcp=0.0.0.0:2200 --start-new-commands=yes --start-child=xterm

  • Client is connected with xpra attach tcp:IP_TO_SERVER:2200

  • Once connected, chromium-browser --show-paint-rects is launched.

  • With Chromium, navigate to Ebay (easiest by far to reproduce behavior), and enter a search term (for reference, I just look for VW Super Beetles)


From there, you can do two things to see the blurriness stick around. You can click on a posting that will time out shortly (within 3 hours), or just sit there on the search results page (new!). With Chromium's paint debug enabled, you can see that the post titles refresh every second and, if you are on a posting, the refresh is timed to match the clock ticking down.

The Heuristics here aren't catching (but trying if XPRA_OPENGL_PAINT_BOX=1 is set) these partial refreshes, and instead are repainting the whole window with h264...this causes the whole thing to become blurry. In some cases, it does come in clear; but that's about 30% of the time in my experience today.

I'll attach a screenshot of the behavior. If you would like logs, please let me know what flags you want and I'll attach them; as the repro is relatively simple.


As an aside, all this is very reminiscent of #410 and #596 from almost 2 years ago...speaking of which, my 2 year Anniversary here is coming up in a few short months.

totaam commented 9 years ago

2015-10-28 00:00:54: maxmylyn uploaded file Xpra_967_Full_Blurry.png (1526.4 KiB)

Sitting at an Ebay search query and seeing the blurry stick constantly. This behaviour appears to stick around indefinitely. Xpra_967_Full_Blurry.png

totaam commented 9 years ago

2015-11-03 00:50:49: afarr commented


Repro'd for logs, win client 0.16.0 r11118 against fedora 21 0.16.0 r11118.

Using steps listed above (comment:6), with a slightly different ebay search site... [http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2047675.m570.l1313.TR12.TRC2.A0.H0.Xsuper+beetle.TRS0&_nkw=super+beetle&_sacat=0].

Scrolling up & down and mousing all over all the various widgets, even with the chromium paint boxes flashing regularly, wasn't sufficient to induce blurriness with a 1920x1080 window (give or take).

Re-sizing the window, however, seems to trigger the blurriness pretty reliably. (Shrink the window, then resize back to +/- 1920x1080).

I set the test up to be as narrow a window as possible, then blew it at the last minute... launched server with logs being captured, but no flags enabled... then connected client without logs in order to set up the blurriness.

I then disconnected client and re-connected to running session with logs enabled, -d client,regionrefresh (which will explain the disconnect/re-connect you'll see in server logs). I then used control channel to enable the server logs (and noticed that trying to pass two arguments failed... I'll make another ticket for that) - in my hurry I'm not sure if I enabled regionrefresh first, or encoding, but you'll see a few long seconds in the server logs with one enabled only, before I managed to enable the other.

I then resized the chromium window (smaller, larger), but then tried to get a screenshot... which means more logs than were probably strictly necessary. Oops.

In my hurry I also managed to blow the xpra info at the time, but I repro'd again without logs running and grabbed a new xpra info (window sizes might be a little different, but otherwise the info should be good).

Just wanted to give as much info as possible, so you'll be able to ignore as much superfluous logs as possible.

Also, ran with --desktop-scaling=off, I'll attach logs and new screenshot (the repro done by maxmylyn was on a particularly low end client machine, wanted to be sure that wasn't the root cause, rather than just the reason it was so easy to repro)... and then I'll try again with scaling of 1.5 and 2, just to see if there are different results (I imagine there will be).

totaam commented 9 years ago

2015-11-03 01:24:07: afarr uploaded file ticket967-beetle-repro-screenshot.PNG (988.0 KiB)

one more repro screenshot, XPRA_OPENGL_PAINT_BOX=1, most of screen encoded h264, but only link areas updating, according to chromium paint boxes ticket967-beetle-repro-screenshot.PNG

totaam commented 9 years ago

2015-11-03 01:38:13: afarr commented


Interesting, even with --desktop-scaling=1.5, once I get this window to window[4].size=(1842, 952), I am able to make it blurry. Of course, with scaling at 1.5, this window is enormous on a 4K monitor.

Likewise, with the default scaling (which still seems to be 2 x 2 on a 4K), if I shrink, then stretch the window back to window[4].size=(1856, 977), I'm able to induce blurriness... though, that's pretty much fullscreen/maximized on a 4K.

Any other debug flags worth trying?

totaam commented 8 years ago

2015-12-05 11:17:32: antoine changed priority from major to critical

totaam commented 8 years ago

2015-12-05 11:17:32: antoine commented


@afarr: please re-assign ticket to me if you want me to take a look.

The log data is very very large, but it looks to me like we're doing the right thing, there are lots of samples that look like this:

damage(WindowModel(0xc00001), 5, 101, 1927, 34, {})
damage(5, 101, 1927, 34, {}) wid=3, scheduling batching expiry for sequence 1570 in 50.0 ms
damage(WindowModel(0xc00001), 5, 135, 1927, 34, {})
damage(5, 135, 1927, 34, {}) wid=3, using existing delayed h264 regions created 0.0ms ago
(... edited out, repeating all the way down the window, up to:)
damage(WindowModel(0xc00001), 5, 1019, 1927, 34, {})
damage(5, 1019, 1927, 34, {}) wid=3, using existing delayed h264 regions created 0.0ms ago
damage(WindowModel(0xc00001), 5, 1053, 1927, 13, {})
damage(5, 1053, 1927, 13, {}) wid=3, using existing delayed h264 regions created 0.0ms ago

So this is repainting most of the window in horizontal bands 34 pixels high and 1927 pixels wide, from Y coordinate 101 down to 1066 in the sample above. Sometimes the chunks are slightly bigger (40 or more pixels). And this is happening over 10 times per second.
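To check the band layout without reading the whole log, the damage lines can be parsed and summarised. This is a minimal sketch that assumes the log lines look exactly like the excerpt above; the regular expression and the `parse_damage` helper are illustrative only and not part of xpra.

```python
import re

# Matches lines such as: damage(WindowModel(0xc00001), 5, 101, 1927, 34, {})
DAMAGE_RE = re.compile(
    r"damage\(WindowModel\(0x[0-9a-f]+\), (\d+), (\d+), (\d+), (\d+),"
)


def parse_damage(lines):
    """Yield (x, y, width, height) for each WindowModel damage line."""
    for line in lines:
        match = DAMAGE_RE.search(line)
        if match:
            yield tuple(int(v) for v in match.groups())


# A few lines copied from the excerpt above
sample = [
    "damage(WindowModel(0xc00001), 5, 101, 1927, 34, {})",
    "damage(5, 101, 1927, 34, {}) wid=3, scheduling batching expiry for sequence 1570 in 50.0 ms",
    "damage(WindowModel(0xc00001), 5, 135, 1927, 34, {})",
    "damage(WindowModel(0xc00001), 5, 1019, 1927, 34, {})",
    "damage(WindowModel(0xc00001), 5, 1053, 1927, 13, {})",
]
bands = list(parse_damage(sample))
top = min(y for _x, y, _w, _h in bands)
bottom = max(y + h for _x, y, _w, h in bands)
print(f"{len(bands)} bands, spanning Y={top} to Y={bottom}")
```

Only the lines that name the WindowModel are counted, so the follow-up "wid=3" lines for the same rectangle are not counted twice.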

So the heuristics then decide that we are dealing with video content and send it as such, using h264 (and rgb24 for the 1 pixel edges, if any):

process_damage_regions: wid=3, adding pixel data to encode queue (1x965 - rgb24), elapsed time: 47.2 ms, request time: 2.2 ms
process_damage_regions: wid=3, adding pixel data to encode queue (1932x1 - rgb24), elapsed time: 47.7 ms, request time: 0.1 ms
process_damage_regions: wid=3, adding pixel data to encode queue (1926x964 - h264), elapsed time: 48.3 ms, request time: 0.4 ms

The h264 encoder seems to settle for a quality setting between 60% and 70%, which is normal:

video_encode encoder: h264 1920x908 result is 909 bytes (128.1 MPixels/s), \
  client options={'pts': 9886, 'frame': 180L, 'csc': 'YUV420P', 'type': 'P', 'quality': 64, 'speed': 86}

And the video pipeline also uses the YUV420P colourspace conversion mode, which will cause some blurring already for areas that aren't just black and white.

All in all, I don't see much we can do here: I think we are dealing with these pixel storms caused by the browser's rendering engine as well as we can.

Raising because all 0.16.x tickets should be dealt with before the release.

totaam commented 8 years ago

2015-12-09 00:45:35: afarr changed status from reopened to closed

totaam commented 8 years ago

2015-12-09 00:45:35: afarr set resolution to worksforme

totaam commented 8 years ago

2015-12-09 00:45:35: afarr commented


We've been doing some testing, experimenting with the min-quality settings especially (also min-speed, though that seems to be having less impact).

We've noticed that raising min-quality from the default of 30 to 60 seems to help a lot, with relatively little impact on client responsiveness (unless latency misbehaves)... and that further raising it to 80 seems to largely resolve blurriness issues for even the most awful widget-packed sites (though that setting seems to be safe to use only on a LAN; VPNs or internet inconsistencies seem to easily lead to noticeably degraded responsiveness).

I think you're right, there's not much else to do until someone, somewhere, decides to do the work to make it easier to isolate video regions so that that information can be passed to the heuristics.

I'll go ahead and close this (and if I discover someone has done that work, I'll open an enhancement request ticket to add a heuristics-helper).

totaam commented 8 years ago

2015-12-09 02:26:44: antoine changed status from closed to reopened

totaam commented 8 years ago

2015-12-09 02:26:44: antoine removed resolution (was worksforme)

totaam commented 8 years ago

2015-12-09 02:26:44: antoine commented


As I mentioned before, increasing the min-quality to work around browser page rendering issues is very wasteful. Don't do that.

This will end up compressing video using YUV444 instead of YUV420, which will use roughly twice as much bandwidth and twice as much CPU, halving user density.
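The factor of two can be illustrated by counting raw samples per frame. The sketch below only compares uncompressed 8-bit planar frame sizes; the actual encoded bandwidth and CPU cost depend on the encoder, and the function name `raw_frame_bytes` is made up for this example.

```python
def raw_frame_bytes(width, height, subsampling):
    """Bytes in one uncompressed 8-bit frame for a planar YUV layout."""
    luma = width * height
    if subsampling == "YUV444P":
        # full-resolution U and V planes
        chroma = 2 * luma
    elif subsampling == "YUV420P":
        # U and V are halved in both directions (rounded up for odd sizes)
        chroma = 2 * ((width + 1) // 2) * ((height + 1) // 2)
    else:
        raise ValueError(f"unsupported subsampling: {subsampling}")
    return luma + chroma


# Frame size taken from the video_encode log line earlier in this ticket
w, h = 1920, 908
ratio = raw_frame_bytes(w, h, "YUV444P") / raw_frame_bytes(w, h, "YUV420P")
print(f"YUV444P carries {ratio:.1f}x the raw data of YUV420P")
```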

Apart from hints that can help us identify video regions, there are other things that may help:

  • trying different video encodings: some may use the new frames to raise the quality
  • fixing the specific websites through CSS or JS monkey patching
  • fixing the browser rendering engine (tweaking the engine configuration, modify the code, or again with just monkey patching)
  • trying with different display settings (maybe turning off opengl by removing +extension GLX in the Xdummy settings?) which may change how / how often the browser renders the page
  • raising the threshold for video region detection (currently requires more than 10 frames per second to switch to video, though it will then stick with that region for a while, even if it then updates more slowly - using a logarithmic scale) - so that we don't detect those pages as video, or to make us switch back more quickly
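The effect of the last option can be sketched with a toy model. This is not the xpra detection code: the class name, the exit threshold of 2 updates per second, and the simple two-threshold hysteresis (standing in for the logarithmic scale mentioned above) are all assumptions made for illustration. Only the "more than 10 frames per second" entry threshold comes from the description above.

```python
class VideoRegionState:
    """Toy hysteresis: enter video mode above enter_fps, leave below exit_fps."""

    def __init__(self, enter_fps=10.0, exit_fps=2.0):
        self.enter_fps = enter_fps
        self.exit_fps = exit_fps  # hypothetical value, not taken from xpra
        self.is_video = False

    def update(self, fps):
        """Feed one update-rate sample and return the current mode."""
        if not self.is_video and fps > self.enter_fps:
            self.is_video = True
        elif self.is_video and fps < self.exit_fps:
            self.is_video = False
        return self.is_video


def run(rates, enter_fps):
    """Return the mode after each sample for a given entry threshold."""
    state = VideoRegionState(enter_fps=enter_fps)
    return [state.update(fps) for fps in rates]


# A scrolling burst followed by a page that keeps repainting a few times per second
rates = [15, 12, 4, 3, 3]
print(run(rates, enter_fps=10))  # enters video mode and stays there
print(run(rates, enter_fps=20))  # a higher threshold never enters video mode
```

In this model, raising the entry threshold avoids classifying the scrolling burst as video at all, while raising the exit threshold would make the region drop back to picture encodings sooner.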

And one last idea: if there is nothing (or almost nothing) changing on screen, then we should be able to see a very high compression ratio on those frames. This could be used as a hint to make us raise the quality automatically.
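A minimal sketch of that idea follows. It assumes YUV420P input at 1.5 bytes per pixel; the ratio threshold, the quality step and both function names are hypothetical and would need tuning against real traffic.

```python
def compression_ratio(width, height, encoded_bytes):
    """Raw YUV420P size (1.5 bytes per pixel) divided by the encoded size."""
    raw_bytes = width * height * 3 // 2
    return raw_bytes / encoded_bytes


def suggest_quality(current_quality, ratio, ratio_threshold=1000, step=10, max_quality=100):
    """Raise the quality when frames compress so well that little is changing.

    ratio_threshold and step are made-up values for illustration.
    """
    if ratio >= ratio_threshold:
        return min(max_quality, current_quality + step)
    return current_quality


# The 1920x908 P-frame of 909 bytes at quality 64 from the video_encode log line earlier in this ticket
ratio = compression_ratio(1920, 908, 909)
print(f"ratio {ratio:.0f}:1, suggested quality {suggest_quality(64, ratio)}")
```

A frame that shrinks by a factor of several thousand is almost certainly a near-static screen, which is the case where raising the quality costs very little bandwidth.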

totaam commented 8 years ago

2016-02-24 09:56:32: antoine commented


See also: #1135.

totaam commented 8 years ago

2016-04-08 01:23:23: afarr changed status from reopened to closed

totaam commented 8 years ago

2016-04-08 01:23:23: afarr set resolution to worksforme

totaam commented 8 years ago

2016-04-08 01:23:23: afarr commented


Well, we've been working on the browser to detect video regions. It can currently detect html5 and flash video regions, and the behavior seems better in those cases - but when we hit pages with a lot of widgets or sites that seem to try to trigger constant updates for no apparent reason, especially when there's no region of video to communicate to the server explicitly, then we still seem to run into the issue.

Before I can test the behavior with any new xpra updates though, we'll need to extract our browser to make it portable enough to try with a more flexible server/client environment.

At this point, I'm inclined to close this ticket as an investigation carried out without that video region detection, and to open a new ticket with better and more relevant details once we've finished the work to make a browser with video region detection more portable.

I suspect the idea of raising the threshold for video detection, perhaps with a flag or environment variable, when using a browser (or other application) which can be relied on to detect actual video regions would be the next step.

Once I can actually play with the encodings or other code updates though, we'll see what happens.

Closing this for now though.

totaam commented 8 years ago

2016-07-21 11:36:21: antoine changed status from closed to reopened

totaam commented 8 years ago

2016-07-21 11:36:21: antoine removed resolution (was worksforme)

totaam commented 8 years ago

2016-07-21 11:36:21: antoine commented


Seems this is not fixed, see #800#comment:17 and #1265.

Note: r13056 now also allows the whole window to be a "video region" so we can apply the video settings to full screen windows.

This should not have been closed twice without a proper resolution, this is now a serious blocker for other features: #800, #1257, #1232.

totaam commented 8 years ago

2016-07-25 19:30:53: maxmylyn commented


Using a Fedora 23 trunk r13086 built from source server and client:

  • Switching tabs in Chrome does cause blurriness and delayed region paints. Using OpenGL paint boxes, I see that it's sometimes painting part of the window with h264 (I usually see this when watching a video in YouTube and switching tabs) and the rest with other encoders. When it paints the whole window with h264, I don't get the delayed region (partial painting followed by the rest of the window painting), but I do get a notable blurriness until the rest of the window paints with something else, not h264.

As per #1257 comment:5, this is a bug, and not expected behavior.

totaam commented 8 years ago

2016-07-25 19:43:18: maxmylyn commented


Interestingly, just running htop in an xterm, I see it painting with h264 when it updates, and with png between updates.

totaam commented 8 years ago

2016-07-26 09:38:29: antoine changed status from reopened to new

totaam commented 8 years ago

2016-07-26 09:38:29: antoine changed owner from afarr to maxmylyn

totaam commented 8 years ago

2016-07-26 09:38:29: antoine commented


As per #1257#comment:6 : typing quickly into a text box (like Trac tickets...) does trigger an h264 region.


"I see that it's sometimes painting part of the window with h264 (I usually see this when watching a video in YouTube and switching tabs)"

That's a slightly different issue, and one that is much harder to fix: the video region code is fairly expensive to run, so we run it no more than every second. Also, we try to stick to a video region when we found one, to prevent video context thrashing which is also expensive.


"When it paints the whole window with h264, I don't get the delayed region (partial painting followed by the rest of the window painting), but I do get a notable blurriness"

Did it use a video region for the whole window? If so, let's figure out why it did so we can tweak the code to avoid doing so, see comment:9 : how many repaint events are we processing when that happens? Maybe also a "-d compress" log of when that happens.

(PS: your comment link in comment:16 looks like it points to the wrong comment)

totaam commented 8 years ago

2016-07-26 18:54:23: maxmylyn commented


Upped server and client to r13086:

Found an interesting corner case, and I'll attach a screenshot. On some sites that have lots of widgets, gifs, etc etc, it can trick the heuristics into painting the whole window with h264. Using google-chrome --show-paint-rects helps show what's happening.

What's also interesting, is that after you get it into this state, switching tabs to something less egregious (like Trac), and interacting even a little bit with the site will cause the whole window to paint with h264. Once in this state, I'm not entirely sure how to get out of it. I'll attach a screen shot of me clicking into this text field to show what I mean.

totaam commented 8 years ago

2016-07-26 19:03:11: maxmylyn uploaded file 967 corner case.png (824.1 KiB)

Note that the inline GIF and the giant sparkly SWAG (so annoying, but relevant) are the only things on the page updating, yet the whole page is being painted as H264 967 corner case.png

totaam commented 8 years ago

2016-07-26 19:03:42: maxmylyn uploaded file 967 corner case part 2.png (464.2 KiB)

notice that only the cursor is updating and the whole page is being painted as h264. 967 corner case part 2.png

totaam commented 8 years ago

2016-07-26 21:12:33: maxmylyn commented


re comment:17:

"Did it use a video region for the whole window?"

Yes it was using a video region for the whole window.

I'll spend some time to see if I can repro that easily and get relevant logs.

totaam commented 8 years ago

2016-07-29 21:19:06: maxmylyn changed owner from maxmylyn to antoine

totaam commented 8 years ago

2016-07-29 21:19:06: maxmylyn commented


Somehow this ticket slipped through the cracks yesterday while I was testing for something similar. Either way, I found a solid repro in Chrome.

Using a trunk Fedora 23 13131 client and server, started with:

xpra start :13 --bind-tcp=0.0.0.0:2200 --start-new-commands=yes --start-child=google-chrome (or just an xterm to launch Chrome)

and connected with:

XPRA_OPENGL_PAINT_BOX=1 xpra attach tcp:ip:port

  • Launch Chrome and navigate to a site that has no moving anything.
    • This trac ticket works pretty well, but something with collapsible fields like mobile Wikipedia or Reddit threads makes it far easier to trigger - no scrolling needed
  • Fullscreen Chrome, or find a way to make it at least as large as 1080p
  • Scroll up and down enough to trigger h264 paints
  • Stop on a field with which you can interact
  • interact with said field

Upon doing so, interacting with tiny elements causes the whole window to be repainted with h264. In doing so, I notice that it tends to paint the whole window with h264, and then when it refreshes with a picture encoding, it seems to have missed a frame or so, so the window appears to jump slightly. You'll also get this behavior when it starts painting text fields with h264, causing the cursor to jump periodically, and text to appear and disappear. Needless to say, it makes typing difficult.

Of note:

If you try to repro on this page, interacting with the text field for comments after scrolling is an easy way to trigger.

I will attach the requested -d compress log.

In the log:

  • Enabled -d compress from control channel
  • Scrolled up and down to trigger h264
  • Clicked in and out of the Trac comment field to trigger full-window h264 paints
  • Disabled -d compress from control channel

totaam commented 8 years ago

2016-07-29 21:19:32: maxmylyn uploaded file 967 d compress log.log (23.2 KiB)

requested -d compress log