Closed totaam closed 7 years ago
ticket967_blurry-info2.txt
(117.0 KiB)xpra info for 0.15.4/5 blurry
ticket967_blurry-xpra-client-d-client-refresh.txt
(2806.9 KiB)client side logs -d client,regionrefresh
ticket967_blurry-xpra-server-d-encoding-regionrefresh.txt.zip
(951.4 KiB)server logs -d encoding,regionrefresh (so long I had to zip... but if something gives time index, they might be useful)
ticket967_0-15-5-blurry-screen-shot.PNG
(841.4 KiB)screenshot of medium level of blurriness I was seeing with google-chrome (you've seen webkit)
About the debug flags:
- client will include far too much to be useful
- refresh is usually what you want when things aren't refreshing properly
- regionrefresh is only for the video region - unlike refresh, it is meant to be constantly postponed since the video region should update regularly
But in this case, the refresh is not the cause of the problem, your xpra info shows:
window[3].size=(1226, 1635)
window[3].video_subregion.refresh_region[0]=(0, 85, 1226, 1550)
So almost all of the window is detected as a video region. The scrolling made us select the whole window as video (because there is no way to tell that it isn't: it is updating fast and in exactly the same area each time, just like video). And the heuristics kept this region afterwards, probably because so many things are animated on this page that they keep the "hit counter" high.
This means that the detection heuristics get it wrong: #410. So the debug flag that you want is probably (server-side): regiondetect.
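To put a number on "almost all of the window", here is a minimal sketch that computes how much of the window the detected region covers. It assumes the region tuple is (x, y, width, height), which is consistent with 85 + 1550 matching the window height; the function name is hypothetical and not part of xpra.

```python
def region_coverage(window_size, region):
    """Return the fraction of the window area covered by a region.

    window_size is (width, height); region is assumed to be
    (x, y, width, height), as in the xpra info output quoted above.
    """
    win_w, win_h = window_size
    _x, _y, reg_w, reg_h = region
    return (reg_w * reg_h) / (win_w * win_h)


# Values copied from the xpra info output quoted above.
coverage = region_coverage((1226, 1635), (0, 85, 1226, 1550))
print(f"video region covers {coverage:.1%} of the window")
```

Only the top 85 pixels (presumably the browser's toolbar area) fall outside the region.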
Tested some more with our set up. It seems easier to reproduce with our builds & configuration... will try to carve out a few minutes to test with encryption on with your builds to see if that might be the difference.
In any case, got the screen test to go blurry and remain so (msn.com, money tab... perhaps that page should stay blurry?) - will attach server logs with -d regiondetect for a portion of time with the text stuck at blurry, despite scrolling and mousing around the links and such, as well as a new xpra-info specific to our set up. I'll also attach a full-size screenshot of the page with most of it blurry, as well as an edited-for-size to link in-line.
The large-ish image sort of top-left is a rotating ad, which refreshes every... ohh, 1-2 seconds (?) ... and some of the other widgets seem to involve some motion (including a couple more ads that I didn't bother to capture in the screenshot). I suspect they may be responsible for just enough updates to keep the region detecting as video.
ticket967_our-server_blurry-regiondetect.txt
(669.0 KiB)-d regiondetect server logs, our server, 0.15.5(ish)
ticket967_our-server_blurry-info.txt
(117.3 KiB)xpra info, server side (of course) - our server (0.15.5 r10308 +/-)
ticket967_full-size-portion-of-screen_blurry-page.png
(942.1 KiB)full size shot of page while blurry - our server/client
ticket967_shot-of-page-while-blurry_rolling-banner-ad.png
(318.9 KiB)edited for in-line shot of blurry, to show widget concentration
[[Image(ticket967_shot-of-page-while-blurry_rolling-banner-ad.png)]]
From your regiondetect debug log, we can see at regular intervals:
testing current video region rectangle[0, 79, 2098, 1306]: 100% in, 0% out, 93% of window, score=103 identify video: most=100% damage count={R(0, 79, 2098, 1306): MutableInteger(400)}
So it finds that 100% of screen updates happen in the region that was previously identified as video, and that's roughly 20 to 40 repaints per second! (the calculations run at most every second - less when there is not much happening on screen)
Not only that, but if you look at the actual paint events themselves (the format is simple: timestamp, X, Y, WIDTH, HEIGHT), ie:
(1441237975.382138, 0, 79, 2098, 1306), (1441237975.402772, 0, 79, 2098, 1306), (1441237975.428191, 0, 79, 2098, 1306)
All of the events that I can see actually repaint the whole of that area! (it's easy to see if you just search the log output for the string 0, 79, 2098, 1306, what is not highlighted is the rest - not much!) Usually you get smaller sub-areas, especially with players like flash that paint the screen in horizontal chunks, or youtube which repaints the video and the controls around it separately, but in this case it is all in one huge area!
You should be able to confirm that we are recording the correct values for paint events by logging with -d encoding, then grepping the output for damage. But the code is unambiguous in this area: we record all non-refresh events in the list you see in the regiondetect log.
So at this point I think I will close this bug as invalid. The region detect code gets it right, and we're doing remarkably well considering the heavy paint traffic.
It looks to me like the browser is needlessly repainting things that have not moved. It could also be that this particular page is triggering those events through bad javascript code. I found a good page which explains the browsers' rendering process: "How Browsers Work: Behind the scenes of modern web browsers". If the problem comes from the browser's rendering engine rather than the page, this needs to be fixed as it will consume huge amounts of CPU for absolutely nothing.
Edit: originally said 400 updates per second, which was incorrect. We keep the most recent 400 events, and the time difference from oldest to newest is roughly between 10 and 20 seconds.
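The arithmetic behind the corrected figure can be sketched as follows. This is only an illustration using the (timestamp, X, Y, WIDTH, HEIGHT) event format from the regiondetect log; the helper names are hypothetical and are not xpra functions.

```python
def repaint_rate(events):
    """Average repaints per second over (timestamp, x, y, w, h) events."""
    if len(events) < 2:
        return 0.0
    elapsed = events[-1][0] - events[0][0]
    return (len(events) - 1) / elapsed if elapsed > 0 else 0.0


def share_in_region(events, region):
    """Fraction of events whose rectangle is exactly the given region."""
    if not events:
        return 0.0
    hits = sum(1 for ev in events if tuple(ev[1:]) == tuple(region))
    return hits / len(events)


# The three paint events quoted from the log earlier in this ticket.
sample = [
    (1441237975.382138, 0, 79, 2098, 1306),
    (1441237975.402772, 0, 79, 2098, 1306),
    (1441237975.428191, 0, 79, 2098, 1306),
]
sample_rate = repaint_rate(sample)
sample_share = share_in_region(sample, (0, 79, 2098, 1306))
print(f"sample burst: {sample_rate:.0f} repaints/s, {sample_share:.0%} full-region")

# 400 retained events spread over 10 to 20 seconds:
slow, fast = 400 / 20, 400 / 10
print(f"overall: between {slow:.0f} and {fast:.0f} repaints/s")
```

Three consecutive events are far too few to characterise the page; they only show that short bursts can run faster than the 20 to 40 per second average.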
Looks like closing on your end is probably the right thing to do. We'll have to handle it on our end.
I'll take the liberty of closing.
I have been volunteered to re-open this ticket. All jokes aside, I am seeing identical behavior in the latest Chromium (the open source variant - not the closed source Chrome):
Server is a Fedora 21 VM running trunk r11057 - built from source
Client is a Fedora 20 hardware machine running trunk r11057 - built from source
Server is launched with
xpra start :13 --bind-tcp=0.0.0.0:2200 --start-new-commands=yes --start-child=xterm
Client is connected with
xpra attach tcp:IP_TO_SERVER:2200
Once connected,
chromium-browser --show-paint-rects
is launched.
With Chromium, navigate to Ebay (easiest by far to reproduce behavior), and enter a search term (for reference, I just look for VW Super Beetles)
From there, you can do two things to see the blurriness stick around: click on a posting that will end shortly (within 3 hours), or just sit there on the search results page (new!). With Chromium's paint debug enabled, you can see that the post titles refresh every second and, on a posting, the refresh is timed to match the clock ticking down.
The heuristics here aren't catching these partial refreshes (though you can see them trying if XPRA_OPENGL_PAINT_BOX=1 is set), and instead the whole window is repainted with h264... this causes the whole thing to become blurry. In some cases it does come in clear, but that's about 30% of the time in my experience today.
I'll attach a screenshot of the behavior. If you would like logs, please let me know what flags you want and I'll attach them, as the repro is relatively simple.
As an aside, all this is very reminiscent of #410 and #596 from almost 2 years ago...speaking of which, my 2 year Anniversary here is coming up in a few short months.
Xpra_967_Full_Blurry.png
(1526.4 KiB)Sitting at an Ebay search query and seeing the blurry stick constantly. This behaviour appears to stick around indefinitely.
Repro'd for logs, win client 0.16.0 r11118 against fedora 21 0.16.0 r11118.
Using steps listed above (comment:6), with a slightly different ebay search site... [http://www.ebay.com/sch/i.html?_from=R40&_trksid=p2047675.m570.l1313.TR12.TRC2.A0.H0.Xsuper+beetle.TRS0&_nkw=super+beetle&_sacat=0].
Scrolling up & down and mousing all over all the various widgets, even with the chromium paint boxes flashing regularly, wasn't sufficient to induce blurriness with a 1920x1080 window (give or take).
Re-sizing the window, however, seems to trigger the blurriness pretty reliably. (Shrink the window, then resize back to +/- 1920x1080).
I set the test up to be as narrow a window as possible, then blew it at the last minute... launched server with logs being captured, but no flags enabled... then connected client without logs in order to set up the blurriness.
I then disconnected client and re-connected to running session with logs enabled,
-d client,regionrefresh
(which will explain the disconnect/re-connect you'll see in server logs). I then used control channel to enable the server logs (and noticed that trying to pass two arguments failed... I'll make another ticket for that) - in my hurry I'm not sure if I enabled regionrefresh first, or encoding, but you'll see a few long seconds in the server logs with one enabled only, before I managed to enable the other.
I then resized the chromium window (smaller, larger), but then tried to get a screenshot... which means more logs than were probably strictly necessary. Oops.
In my hurry I also managed to blow the xpra info at the time, but I repro'd again without logs running and grabbed a new xpra info (window sizes might be a little different, but otherwise the info should be good).
Just wanted to give as much info as possible, so you'll be able to ignore as much superfluous logs as possible.
Also, ran with --desktop-scaling=off, I'll attach logs and new screenshot (the repro done by maxmylyn was on a particularly low end client machine, wanted to be sure that wasn't the root cause, rather than just the reason it was so easy to repro)... and then I'll try again with scaling of 1.5 and 2, just to see if there are different results (I imagine there will be).
ticket967-beetle-repro-screenshot.PNG
(988.0 KiB)one more repro screenshot, XPRA_OPENGL_PAINT_BOX=1, most of screen encoded h264, but only link areas updating, according to chromium paint boxes
Interesting, even with --desktop-scaling=1.5, once I get this window to window[4].size=(1842, 952), I am able to make it blurry. Of course, with scaling at 1.5, this window is enormous on a 4K monitor.
Likewise, with the default scaling (which still seems to be 2 x 2 on a 4K), if I shrink, then stretch the window back to window[4].size=(1856, 977), I'm able to induce blurriness... though, that's pretty much fullscreen/maximized on a 4K.
Any other debug flags worth trying?
@afarr: please re-assign ticket to me if you want me to take a look.
The log data is very very large, but it looks to me like we're doing the right thing, there are lots of samples that look like this:
damage(WindowModel(0xc00001), 5, 101, 1927, 34, {})
damage(5, 101, 1927, 34, {}) wid=3, scheduling batching expiry for sequence 1570 in 50.0 ms
damage(WindowModel(0xc00001), 5, 135, 1927, 34, {})
damage(5, 135, 1927, 34, {}) wid=3, using existing delayed h264 regions created 0.0ms ago
(... edited out, repeating all the way down the window, up to:)
damage(WindowModel(0xc00001), 5, 1019, 1927, 34, {})
damage(5, 1019, 1927, 34, {}) wid=3, using existing delayed h264 regions created 0.0ms ago
damage(WindowModel(0xc00001), 5, 1053, 1927, 13, {})
damage(5, 1053, 1927, 13, {}) wid=3, using existing delayed h264 regions created 0.0ms ago
So this is repainting most of the window in horizontal bands 34 pixels high, from Y coordinate 101 down to the last band at 1053 (1927 is the width of each band). Sometimes the chunks are slightly bigger (40 or more pixels). And this is happening over 10 times per second.
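For anyone wading through a log of this size, a small script can summarise the bands instead of reading them by eye. The sketch below assumes the damage(X, Y, WIDTH, HEIGHT, {}) line format shown in the excerpt; the helper names are hypothetical and the regular expression has only been checked against the lines quoted here.

```python
import re

# Matches the second form only, e.g. "damage(5, 101, 1927, 34, {}) wid=3, ...".
# The "damage(WindowModel(...), ..." form is skipped so each event counts once.
DAMAGE_RE = re.compile(r"damage\((\d+), (\d+), (\d+), (\d+), \{")


def parse_bands(lines):
    """Extract (x, y, width, height) tuples from damage log lines."""
    bands = []
    for line in lines:
        match = DAMAGE_RE.search(line)
        if match:
            bands.append(tuple(int(v) for v in match.groups()))
    return bands


def vertical_extent(bands):
    """Return (top, bottom) Y coordinates spanned by the bands."""
    top = min(y for _x, y, _w, _h in bands)
    bottom = max(y + h for _x, y, _w, h in bands)
    return top, bottom


log_excerpt = [
    "damage(WindowModel(0xc00001), 5, 101, 1927, 34, {})",
    "damage(5, 101, 1927, 34, {}) wid=3, scheduling batching expiry for sequence 1570 in 50.0 ms",
    "damage(5, 135, 1927, 34, {}) wid=3, using existing delayed h264 regions created 0.0ms ago",
    "damage(5, 1019, 1927, 34, {}) wid=3, using existing delayed h264 regions created 0.0ms ago",
    "damage(5, 1053, 1927, 13, {}) wid=3, using existing delayed h264 regions created 0.0ms ago",
]
bands = parse_bands(log_excerpt)
extent = vertical_extent(bands)
print(f"{len(bands)} bands, covering Y={extent[0]} to Y={extent[1]}")
```

On the quoted lines, the bands start at Y=101 and the last one ends at Y=1066 (1053 + 13).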
So the heuristics then decide that we are dealing with video content and send it as such, using h264 (and rgb24 for the 1 pixel edges, if any):
process_damage_regions: wid=3, adding pixel data to encode queue (1x965 - rgb24), elapsed time: 47.2 ms, request time: 2.2 ms
process_damage_regions: wid=3, adding pixel data to encode queue (1932x1 - rgb24), elapsed time: 47.7 ms, request time: 0.1 ms
process_damage_regions: wid=3, adding pixel data to encode queue (1926x964 - h264), elapsed time: 48.3 ms, request time: 0.4 ms
The h264 encoder seems to settle for a quality setting between 60% and 70%, which is normal:
video_encode encoder: h264 1920x908 result is 909 bytes (128.1 MPixels/s), \
    client options={'pts': 9886, 'frame': 180L, 'csc': 'YUV420P', 'type': 'P', 'quality': 64, 'speed': 86}
And the video pipeline also uses the YUV420P colourspace conversion mode, which will cause some blurring already for areas that aren't just black and white.
All in all, I don't see much we can do here: I think we are dealing with these pixel storms caused by the browser's rendering engine as well as we can.
Raising because all 0.16.x tickets should be dealt with before the release.
We've been doing some testing, experimenting with the min-quality settings especially (also min-speed, though that seems to be having less impact).
We've noticed that raising min-quality from the default of 30 to 60 seems to help a lot, with relatively little impact on client responsiveness (unless latency misbehaves)... and that further raising to 80 seems to largely resolve blurriness issues for even the most awful widget-packed sites (though that setting seems to be safe to use only on a LAN: VPNs or internet inconsistencies seem to easily lead to noticeably degraded responsiveness).
I think you're right, there's not much else to do until someone, somewhere, decides to do the work to make it easier to isolate video regions so that that information can be passed to the heuristics.
I'll go ahead and close this (and if I discover someone has done that work, I'll open an enhancement request ticket to add a heuristics-helper).
As I mentioned before, increasing the min-quality to work around browser page rendering issues is very wasteful. Don't do that.
This will end up compressing video using YUV444 instead of YUV420, which will use roughly twice as much bandwidth and twice as much CPU, halving user density.
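The "roughly twice" figure lines up with the raw data volume of the two pixel formats. As a rough sketch, assuming 8-bit planar YUV and using the 1920x908 frame size from the video_encode log line quoted earlier (the helper name is hypothetical):

```python
def bytes_per_pixel(h_sub, v_sub):
    """Average bytes per pixel for 8-bit planar YUV.

    One full-resolution luma plane plus two chroma planes, each
    subsampled by h_sub horizontally and v_sub vertically.
    """
    return 1 + 2 / (h_sub * v_sub)


yuv420 = bytes_per_pixel(2, 2)  # chroma halved in both directions
yuv444 = bytes_per_pixel(1, 1)  # chroma at full resolution

width, height = 1920, 908
raw_420 = int(width * height * yuv420)
raw_444 = int(width * height * yuv444)
print(f"YUV420: {raw_420} bytes/frame, YUV444: {raw_444} bytes/frame")
print(f"ratio: {yuv444 / yuv420:.1f}x")
```

This only counts the uncompressed input handed to the encoder. The compressed output and CPU cost will not scale exactly with it, which is why the estimate above is "roughly" twice.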
Apart from hints that can help us identify video regions, there are other things that may help:
- trying different video encodings: some may use the new frames to raise the quality
- fixing the specific websites through CSS or JS monkey patching
- fixing the browser rendering engine (tweaking the engine configuration, modify the code, or again with just monkey patching)
- trying with different display settings (maybe turning off opengl by removing +extension GLX in the Xdummy settings?) which may change how / how often the browser renders the page
- raising the threshold for video region detection (currently requires more than 10 frames per second to switch to video, though it will then stick with that region for a while, even if it then updates more slowly - using a logarithmic scale) - so that we don't detect those pages as video, or to make us switch back more quickly
And one last idea: if there is nothing (or almost nothing) changing on screen, then we should be able to see a very high compression ratio on those frames. This could be used as a hint to make us raise the quality automatically.
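A minimal sketch of what such a hint could look like, purely as an illustration: the function, the bytes-per-pixel threshold and the step size are all hypothetical values chosen for the example, not anything xpra implements. The sample numbers come from the video_encode log line quoted earlier (1920x908 frame, 909 bytes, quality 64).

```python
def suggest_quality(quality, width, height, compressed_bytes,
                    bytes_per_pixel_threshold=0.001, step=10, max_quality=100):
    """Raise the quality when a frame compresses extremely well.

    A very small compressed size relative to the pixel count suggests
    that almost nothing changed on screen. The threshold and step are
    arbitrary example values (assumptions), not tuned settings.
    """
    bytes_per_pixel = compressed_bytes / (width * height)
    if bytes_per_pixel < bytes_per_pixel_threshold:
        return min(max_quality, quality + step)
    return quality


# Frame from the log: 1920x908 compressed down to 909 bytes at quality 64.
new_quality = suggest_quality(64, 1920, 908, 909)
# A frame with real content changes (200000 bytes is a made-up figure).
busy_quality = suggest_quality(64, 1920, 908, 200000)
print(new_quality, busy_quality)
```

A real implementation would probably want to average over several frames before reacting, since a single tiny P-frame can also show up in the middle of genuine video.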
See also: #1135.
Well, we've been working on the browser to detect video regions. Can currently detect html5 and flash video regions, and the behavior seems better in those cases - but when we hit pages with a lot of widgets or sites that seem to try to trigger constant updates for no apparent reason, especially when there's no region of video to communicate to the server explicitly, then we still seem to run into the issue.
Before I can test the behavior with any new xpra updates though, we'll need to extract our browser to make it portable enough to try with a more flexible server/client environment.
At this point, I'm inclined to close this ticket as an investigation carried out without that video region detection, and open a new ticket with some better & more relevant details once we've finished up that work to make a browser with video region detection more portable.
I suspect the idea of raising the threshold for video detection, perhaps with a flag or environment variable, when using a browser (or other application) which can be relied on to detect actual video regions would be the next step.
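As a rough illustration of that idea, the sketch below reads an override for the detection threshold from the environment. The variable name EXAMPLE_VIDEO_MIN_FPS and both helper functions are hypothetical (xpra does not define them); the default of 10 comes from the "more than 10 frames per second" figure mentioned earlier in this ticket.

```python
import os

DEFAULT_MIN_FPS = 10.0  # "more than 10 frames per second" per the discussion above


def video_min_fps(environ=os.environ):
    """Read the threshold override; EXAMPLE_VIDEO_MIN_FPS is a made-up name."""
    raw = environ.get("EXAMPLE_VIDEO_MIN_FPS", "")
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_MIN_FPS
    return value if value > 0 else DEFAULT_MIN_FPS


def looks_like_video(event_count, seconds, environ=os.environ):
    """True when the repaint rate exceeds the (possibly overridden) threshold."""
    if seconds <= 0:
        return False
    return event_count / seconds > video_min_fps(environ)


# 400 events over 20 seconds is 20 repaints/s: video with the default
# threshold, but not once the threshold is raised to 30.
default_result = looks_like_video(400, 20, {})
raised_result = looks_like_video(400, 20, {"EXAMPLE_VIDEO_MIN_FPS": "30"})
print(default_result, raised_result)
```

With a browser that reports its real video regions, a deployment could set the threshold high enough that pages full of widgets no longer qualify on repaint rate alone.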
Once I can actually play with the encodings or other code updates though, we'll see what happens.
Closing this for now though.
Seems this is not fixed, see #800#comment:17 and #1265.
Note: r13056 now also allows the whole window to be a "video region" so we can apply the video settings to full screen windows.
This should not have been closed twice without a proper resolution, this is now a serious blocker for other features: #800, #1257, #1232.
Using a Fedora 23 trunk r13086 built from source server and client:
- Switching tabs in Chrome does cause blurriness and delayed region paints. Using OpenGL paint boxes, I see that it's sometimes painting part of the window with h264 (I usually see this when watching a video in YouTube and switching tabs) and the rest with other encoders. When it paints the whole window with h264, I don't get the delayed region (partial painting followed by the rest of the window painting), but I do get a notable blurriness until the rest of the window paints with something else, not h264.
As per #1257 comment:5, this is a bug, and not expected behavior.
Interestingly, just running htop in an xterm, I see it painting with h264 when it updates, and with png between updates.
As per #1257#comment:6 : typing quickly into a text box (like Trac tickets...) does trigger an h264 region.
"I see that it's sometimes painting part of the window with h264 (I usually see this when watching a video in YouTube and switching tabs)"
That's a slightly different issue, and one that is much harder to fix: the video region code is fairly expensive to run, so we run it no more than every second. Also, we try to stick to a video region when we found one, to prevent video context thrashing which is also expensive.
"When it paints the whole window with h264, I don't get the delayed region (partial painting followed by the rest of the window painting), but I do get a notable blurriness"
Did it use a video region for the whole window? If so, let's figure out why it did so we can tweak the code to avoid doing so, see comment:9: how many repaint events are we processing when that happens? Maybe also "-d compress" of when that happens.
(PS: your comment link in comment:16 looks like it points to the wrong comment)
Upped server and client to r13086:
Found an interesting corner case, and I'll attach a screenshot. On some sites that have lots of widgets, gifs, etc etc, it can trick the heuristics into painting the whole window with h264. Using google-chrome --show-paint-rects helps show what's happening.
What's also interesting is that after you get it into this state, switching tabs to something less egregious (like Trac), and interacting even a little bit with the site will cause the whole window to paint with h264. Once in this state, I'm not entirely sure how to get out of it. I'll attach a screen shot of me clicking into this text field to show what I mean.
967 corner case.png
(824.1 KiB)Note that the inline GIF and the giant sparkly SWAG (so annoying, but relevant) are the only things on the page updating, yet the whole page is being painted as H264
967 corner case part 2.png
(464.2 KiB)notice that only the cursor is updating and the whole page is being painted as h264.
re comment:17:
"Did it use a video region for the whole window?"
Yes, it was using a video region for the whole window.
I'll spend some time to see if I can repro that easily and get relevant logs.
Somehow this ticket slipped through the cracks yesterday while I was testing for something similar. Either way, I found a solid repro in Chrome.
Using a trunk Fedora 23 13131 client and server, started with:
xpra start :13 --bind-tcp=0.0.0.0:2200 --start-new-commands=yes --start-child=google-chrome (or just an xterm to launch Chrome)
and connected with:
XPRA_OPENGL_PAINT_BOX=1 xpra attach tcp:ip:port
- Launch Chrome and navigate to a site that has no moving anything.
- This trac ticket works pretty well, but something with collapsible fields like mobile Wikipedia or Reddit threads makes it far easier to trigger - no scrolling needed
- Fullscreen Chrome, or find a way to make it at least as large as 1080p
- Scroll up and down enough to trigger h264 paints
- Stop on a field with which you can interact
- interact with said field
Upon doing so, interacting with tiny elements causes the whole window to be repainted with h264. In doing so, I notice that it tends to paint the whole window with h264, and then when it refreshes with a picture encoding, it seems to have missed a frame or so, so the window appears to jump slightly. You'll also get this behavior when it starts painting text fields with h264, causing the cursor to jump periodically, and text to appear and disappear. Needless to say, it makes typing difficult.
Of note:
If you try to repro on this page, interacting with the text field for comments after scrolling is an easy way to trigger.
I will attach the requested -d compress log.
In the log:
- Enabled -d compress from control channel
- Scrolled up and down to trigger h264
- Clicked in and out of the Trac comment field to trigger full-window h264 paints
- Disabled -d compress from control channel
967 d compress log.log
(23.2 KiB)requested -d compress log
Issue migrated from trac ticket # 967
component: core | priority: critical | resolution: worksforme
2015-08-26 03:27:52: afarr created the issue