Xpra-org / xpra

Persistent remote applications for X11; screen sharing for X11, MacOS and MSWindows.
https://xpra.org/
GNU General Public License v2.0

re-implement bandwidth constraint option #417

Closed by totaam 6 years ago

totaam commented 11 years ago

So we can limit ourselves to N Mbps if desired.

This may be implemented in two ways:

Or a combination of the two.

totaam commented 10 years ago

It would be nice to make this generic enough that we can pass the information down to each encoder, but taking into account that we may have many windows, each consuming a variable amount of bandwidth, is not going to be easy!

totaam commented 7 years ago

See also #540, #401, #619 and #999

totaam commented 6 years ago

Support added in r17232.

You can see the current settings and the bandwidth budget distribution between multiple windows using:

xpra info | egrep -i "bandwidth-limit"
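For context, the limit itself is set with the bandwidth-limit option; a minimal sketch, assuming the old-style ssh: connection string and that the option accepts a value with units:

# cap the session at 1Mbps from the client side
xpra attach ssh:user@host:13 --bandwidth-limit=1Mbps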

This is what iftop shows for a 1Mbps target and glxgears using up all the available bandwidth:

localhost.localdomain => localhost.localdomain    941Kb   826Kb   984Kb

Caveats:

TODO:

totaam commented 6 years ago

Hooked up detection of the network interface speed (when available) and disabled mmap; see #540 comment 16.

totaam commented 6 years ago

r17255 adds the UI option to the HTML5 client's connect dialog, defaulting to the value we get from the browser's network information API (as per #1581#comment:3). We don't do this when bypassing the connect dialog, at least for now.

totaam commented 6 years ago

@maxmylyn: ready for a first round of testing. So far, I have used glxgears to generate a high framerate, iftop to watch the bandwidth usage in real time, and the system tray to change the limit. I've also resized the glxgears window to generate more pixel updates: a larger window should give us a lower framerate (higher batch delay) and higher compression (lower speed and quality). To verify that we stick to our budget correctly, we should test using a strict bandwidth shaper (ie: tc) to replicate real-life network conditions. As long as the bandwidth-limit is slightly below the limit set by the shaper, the results should be identical. When capturing problematic conditions, make sure to get the full network characteristics (latency, bandwidth, etc) and the xpra info output.
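A sketch of that test setup, assuming a local server; the display number :13 and the glxgears child are arbitrary choices:

# start a test server running glxgears (display :13 is arbitrary)
xpra start :13 --start=glxgears
# attach with the bandwidth cap under test
xpra attach :13 --bandwidth-limit=1Mbps
# watch the resulting traffic in real time (loopback for a local server)
iftop -i lo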



totaam commented 6 years ago

2017-10-27 21:19:56: maxmylyn commented


Okay, initial testing is complete (trunk 2.x r17263, Fedora 25 server/client). It seems to work fine, at least at the lower limits. I'm not sure my machine is capable of pushing huge amounts of data, so the 1/2/5 Mbps limits were all I could test.

One request, to facilitate testing: can we have a control channel command, or a client/server-side CLI flag or config option, so that I don't have to use the system tray (since GNOME has decided we don't need that)? If we get a switch, I can add a quick test run or two to the automated test box.

totaam commented 6 years ago

One request, to facilitate testing: can we have a control channel command, or a client/server-side CLI flag or config option, so that I don't have to use the system tray (since GNOME has decided we don't need that)? If we get a switch, I can add a quick test run or two to the automated test box.
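A sketch of what such a switch could look like via the generic control channel; the bandwidth-limit subcommand and the bits-per-second value here are assumptions, not confirmed syntax:

# hypothetical: adjust the server-side limit at runtime on display :13
xpra control :13 bandwidth-limit 1000000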

totaam commented 6 years ago

2017-10-31 02:25:11: maxmylyn commented


Note to self:

  • Check with PNG/L
  • Double and triple check with TC bandwidth constraints

totaam commented 6 years ago

2017-11-01 22:09:20: maxmylyn commented


Alright, this was a fun one to test. For reference, my server and client are both Fedora 25 running trunk r17281.

So I had to spend about half an hour sifting through random forum posts on how to do this, and they all suggested some sort of weird multi-line tc command magic... then I remembered we had some documentation on how to simulate delay and loss in #999. After perusing that, I settled on a command:

tc qdisc add dev ens33 root netem rate 1mbit

Adapted from https://serverfault.com/questions/787006/how-to-add-latency-and-bandwidth-limit-interface-using-tc - close, but not quite, and a bit complicated for our simple use-case. Anyway, I'm leaving this here for when I eventually need to come back to this ticket.

NOTE: Be careful with that command: you can easily lose your SSH session.
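The usual qdisc housekeeping makes it safer to experiment (ens33 is just the interface from the command above):

# inspect the active qdisc
tc qdisc show dev ens33
# adjust the rate in place instead of re-adding
tc qdisc change dev ens33 root netem rate 2mbit
# remove the shaping entirely when done
tc qdisc del dev ens33 root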

I played around with 1Mbps and 2Mbps limits. I set the server to rate-limit at 1Mbps, and enabled and disabled tc at both 1Mbps and 2Mbps. In both cases, the bandwidth dropped for a second or so right after enabling/disabling tc (which makes sense, as tc probably interrupts connections), but afterwards it settled around 1Mbit, give or take. The highest I saw was 1.2Mbps, with tc set to 2Mbps and the limit set to 1Mbps, but it settled down to 1Mbps pretty quickly. So I can definitively say the rate limiting is working as expected, even with network limits applied.

As for the png/L encoder: I'm not sure how to force that encoding. I tried --encodings=png/L, which should force it to use that encoding, but when I do, it fails to connect with:

2017-11-01 15:07:30,448 server failure: disconnected before the session could be established
2017-11-01 15:07:30,448 server requested disconnect: server error (error accepting new connection)
2017-11-01 15:07:30,468 Connection lost

I'm not entirely sure how to force the PNG/L encoding like we talked about, so I'm going to pass this to you to ask how.

totaam commented 6 years ago

... settles around 1mbit +- a bit ...

Does bandwidth-limit=1mbps work better than not setting any value when running on a 1mbps constrained connection? (In particular for the perceived screen update latency, which should correlate with batch.delay + damage.out_latency.) Did you test with tc latency and jitter? Did you notice any repetitive screen update stuttering?
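Both of those values show up in xpra info, e.g.:

xpra info | egrep "batch.delay|damage.out_latency"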

it fails to connect with: server failure...

The error shown in the server log was: "Exception: client failed to specify any supported encodings"; r17282 fixes that.

totaam commented 6 years ago

Minor cosmetic improvements in r17296 + r17297 + r17298.

totaam commented 6 years ago

r17452 adds bandwidth availability detection (server side), see #999#comment:18 for details.

totaam commented 6 years ago

2017-12-12 23:11:05: maxmylyn commented


Finally catching up to this one:

Does the bandwidth-limit=1mbps work better than not setting any value when running on a 1mbps constrained connection? (in particular the perceived screen update latency, which should correlate with the batch.delay + damage.out_latency)

Definitely. Just running glxgears with and without the bandwidth limit makes it apparent that the limit helps immensely. Without a bandwidth limit set, the framerate is all over the place, with lots of stutters and catching up. With the limit set, the framerate is much smoother and notably more consistent, with only a little initial stuttering.

Did you notice any screen update repetitive stuttering?

I already mentioned this above: yes, but only on a severely constrained connection without the limit set (--bandwidth-limit=).

Did you test with tc latency and jitter?

I'll do this shortly....right after my ~3pmish espresso.

totaam commented 6 years ago

2017-12-12 23:41:59: maxmylyn commented


Alright, I ran a few levels of tc (the matching netem commands are sketched below):

"Light TC", aka 50ms +-10ms with 25% correlation (delay 50ms 10ms 25%):

  • Some stuttering - the framerate is not quite as high as with only the bandwidth limit, but still half decent

"Light TC, loss only", aka 2% loss, no correlation (loss 2%):

  • Lots of stuttering - but a higher framerate when not stuttering. Unfortunately, it stutters a lot more than it holds a steady framerate.

"Medium TC, loss only", aka 2% loss with 25% correlation (loss 2% 25%):

  • Not much worse than the light TC with only loss - but the framerate was notably lower even when it wasn't stuttering

Just to be thorough, I threw in a combination of loss and delay (loss 2% delay 50ms 10ms), but it wasn't pretty: a very low framerate, with the occasional burst of a bit more.
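For reproducibility, the netem invocations for the scenarios above would look like this, assuming the same ens33 interface and an existing root qdisc (use add instead of change if none is present):

tc qdisc change dev ens33 root netem delay 50ms 10ms 25%      # light TC
tc qdisc change dev ens33 root netem loss 2%                  # light TC, loss only
tc qdisc change dev ens33 root netem loss 2% 25%              # medium TC, loss only
tc qdisc change dev ens33 root netem loss 2% delay 50ms 10ms  # loss and delay combined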


As a total aside, I wonder if there's some utility that gives some kind of aggregate packet-type accounting, to see how much of an impact TCP retransmissions have. Mostly out of curiosity.
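Two standard tools cover at least the retransmission side of that accounting (suggestions, not something used in this ticket):

# cumulative TCP retransmission counters for the whole host
netstat -s | grep -i retrans
# per-connection retransmit counts
ss -ti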

totaam commented 6 years ago

The stuttering with packet loss is caused by the packets backing up whilst waiting for the server's TCP resend, then all flowing again at the same time. UDP (#639) could deal with this better by skipping those frames (though I'm not sure the current heuristics skip frames as often as they should). We could also use some sort of throttled vsync to keep the framerate more constant when recovering, but that would be hard work, and nothing is going to allow us to avoid the pause with TCP, as this is happening in the host network stack. I think this works well enough to close this ticket; we can open new ones for refinements / new requirements.

totaam commented 6 years ago

Not sure how well this got tested: although the original changeset (r17259) was fine, r17296 introduced a regression which caused the connection to abort when the system tray bandwidth limit was changed... Fixed in r18141.