arvidn / libtorrent

an efficient feature complete C++ bittorrent implementation
http://libtorrent.org

Poor uTP performance #3542

Closed. ngjermundshaug closed this issue 3 years ago.

ngjermundshaug commented 5 years ago

When I seed a single 5GB file from an AWS server in Australia to a single peer in Norway using the newest uTorrent version, I get 2-3 megabytes/sec, which is pretty good compared to other UDP-based transfer acceleration protocols. The flags in uTorrent on the peer side in Norway show that uTP is used. The latency is about 350ms.

When I seed the same single file using libTorrent 1.1.0.0 I end up with approx 150-500 kbyte/sec. The flags in uTorrent on the peer side in Norway show that uTP is used here as well. I am running vanilla settings for libTorrent - the only settings specified are seedmode = true and automanaged = false.

So it seems to be a uTP (UDP) performance issue with libtorrent?

I also saw a similar issue here where you are in the loop @arvidn : https://github.com/Tribler/tribler/issues/2620

What do you think?

libtorrent version (or branch): 1.1.0.0
platform/architecture: Win - x64 - static MT
compiler and compiler version: VS2017 - toolset v.141

arvidn commented 5 years ago

I heard someone suggest that libtorrent's uTP implementation, when talking to uTorrent, experiences a lot of packet loss. I don't think the issue in that tribler link, or the latency, is the problem (or at least not the main problem); if you read through that whole thread, I'm pretty sure we resolved it.

Also, there have been quite a few important fixes since libtorrent-1.1.0, probably not directly related to this uTP issue, but I would still encourage you to update to the latest version.

Do you think you would be able to enable utp logging on the libtorrent side, and run a minute or so of this transfer?

Enabling uTP logging is a build switch. If you're building with boost-build, specify utp-log=on; otherwise define TORRENT_UTP_LOG_ENABLE. You also have to call set_utp_stream_logging(true); on startup in your client. This will create a file, utp.log, in the current working directory.

arvidn commented 5 years ago

I tried to reproduce this on my local network, seeding from libtorrent on linux to uTorrent on Mac, over uTP. I don't see any packet loss (meaning the implementations don't seem to have any obvious trouble talking to each other).

I have a lot less latency over this link, just a millisecond or so round-trip. But the bottleneck I see is uTorrent's advertised receive window. The receive buffer on the uTorrent side seems to be set to about 60 kB. Now, it's possible uTorrent Mac is a lot older than the windows version (it says 1.8.7).

This is what I did to client_test.cpp:

diff --git a/examples/client_test.cpp b/examples/client_test.cpp
index 14d841551..37fcc6088 100644
--- a/examples/client_test.cpp
+++ b/examples/client_test.cpp
@@ -47,6 +47,7 @@ POSSIBILITY OF SUCH DAMAGE.
 #include <sys/stat.h>
 #endif

+#include "libtorrent/utp_stream.hpp"
 #include "libtorrent/torrent_info.hpp"
 #include "libtorrent/announce_entry.hpp"
 #include "libtorrent/entry.hpp"
@@ -995,6 +996,7 @@ bool is_resume_file(std::string const& s)

 int main(int argc, char* argv[])
 {
+       lt::set_utp_stream_logging(true);
 #ifndef _WIN32
        // sets the terminal to single-character mode
        // and resets when destructed

ngjermundshaug commented 5 years ago

I was able to get 1.2.0-RC2 and compile with the flags. The problem now is that I get the error "unexpected end of file in bencoded string" when loading my existing torrents.

Update: Stupid mistake on my end as always - will test further this afternoon.

Seeker2 commented 5 years ago

As I've said, uTorrent to qBitTorrent using uTP was where the packet loss occurred. It didn't seem to want to go over ~1.5 MB/sec, even as a burst rate, using a single seed->peer connection on a LAN or on the same computer over 127.0.0.1 local loopback. qBT to uTorrent using uTP seemed much faster, but still slower than using TCP or uTorrent to uTorrent using uTP. I tested uTorrent 2.2.1 as well as old to most recent qBT versions. And I very recently retested using uTorrent 3.5.4 and 3.5.5 with the same results.

qBT to/from Deluge using uTP also seems impacted in speed. This is on Windows XP Pro SP3 and Windows 7 Pro 64bit; running qBT on Linux may work differently.

I forgot to mention... I saw strange uTP packet size differences depending on whether traffic was going from or to qBT when talking to uTorrent. Almost like uTorrent's larger uTP/UDP packet size was sometimes being rejected by qBT, forcing resends about 5-25% of the time.

ngjermundshaug commented 5 years ago

Ok - I measured again today with 1.2.0-RC2 - win64.

I will be happy to set up an AWS instance in Australia if you want to use it for testing transfer performance when there is high latency.

arvidn commented 5 years ago

thanks for the log! It looks like the connection does not have a lot of buffering delay, which means the regular congestion controller won't trigger throttling back. For these scenarios uTP will still throttle the congestion window based on packet loss, just like TCP.

If you generate the graphs (with tools/parse_utp_log.py) you'll see the saw-tooth shape. The problem, as far as I can tell, is that each lost packet causes a halving of cwnd, where what should happen is that cwnd is halved at most once per window, regardless of how many packets in that window were lost.
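
To illustrate the distinction, here is a minimal sketch (hypothetical names, not libtorrent's actual code) of reducing cwnd at most once per window of in-flight data, so that a burst of losses counts as a single congestion event:

#include <algorithm>
#include <cstdint>

// Sketch only: losses of packets that were in flight together count as one
// congestion event and trigger a single cwnd reduction.
struct cwnd_state
{
    std::int64_t cwnd = 3000;        // congestion window, in bytes
    std::int64_t min_cwnd = 3000;    // floor for the window
    std::uint32_t next_seq_nr = 1;   // next sequence number to be sent
    std::uint32_t loss_barrier = 0;  // highest seq_nr outstanding at the last reduction

    void on_packet_lost(std::uint32_t lost_seq_nr)
    {
        // a loss of a packet sent before the last reduction belongs to the
        // same congestion event; don't halve cwnd again for it
        if (lost_seq_nr <= loss_barrier) return;
        cwnd = std::max(cwnd / 2, min_cwnd);
        loss_barrier = next_seq_nr;
    }
};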

arvidn commented 5 years ago

utp out00000163b17b0a40-uploading

arvidn commented 5 years ago

@ngjermundshaug would you mind testing this patch? It's against RC_1_1 but I think the important part should apply against master too.

https://github.com/arvidn/libtorrent/pull/3543

ngjermundshaug commented 5 years ago

Still seeing the saw tooth pattern I'm afraid.

image utp.log.zip

Downloading from libTorrent: image

Downloading from uTorrent: image

arvidn commented 5 years ago

it's not the sawtooth itself that's the problem; it's expected when there's no buffer delay. The problem is that cwnd pulls back a lot more than being cut in half (or multiplied by 0.78, as I updated in that patch) when there's packet loss.

However, it looks like you've tweaked the uTP parameters, or at least the target delay. Would you mind sharing what you've set the utp_* settings to?

setting the target delay to 1s (from the default 100ms) could have a small impact as it makes the "delay factor" smaller, and the small delay that is detected has much less impact on throttling.

Another important setting is utp_gain_factor, which is specified as number of bytes per ACK (at 0 delay). The closer the measured delay gets to the target, the smaller the effective gain will be. It defaults to 3000 bytes, which is roughly twice as fast as TCP, which uses one segment, i.e. about 1450 bytes.

I suspect that by the time the first packet loss is detected, the cwnd has grown too large, and more packets will be lost soon (but apparently not soon enough, as it happens more than one window out, or my patch is wrong). So lowering the gain factor could be a useful test as well.
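
For reference, both knobs live in settings_pack; a minimal sketch of lowering the gain factor for such a test (assuming an existing session `ses`; the values are just for experimentation, not recommendations):

lt::settings_pack sp;
sp.set_int(lt::settings_pack::utp_target_delay, 100);  // milliseconds; the default
sp.set_int(lt::settings_pack::utp_gain_factor, 1500);  // bytes per ACK, halved from the default 3000
ses.apply_settings(sp);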

arvidn commented 5 years ago

One possible issue (which is not entirely related) is that uTorrent advertises a dynamic receive window size, and libtorrent ends slow-start when it hits the receive window. This causes slow-start to terminate early, even though the receive window grows later. There was also a problem where ssthres wasn't set when slow-start ended in this way.
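
A minimal sketch (hypothetical names, not the actual patch) of the fix being described: record ssthres when slow-start ends because the advertised receive window was hit.

#include <cstdint>

void maybe_exit_slow_start(std::int64_t cwnd, std::int64_t advertised_receive_window
    , bool& slow_start, std::int64_t& ssthres)
{
    if (slow_start && cwnd >= advertised_receive_window)
    {
        slow_start = false;
        // previously ssthres was left unset on this path, so a later (grown)
        // receive window could be mistaken for fresh slow-start headroom
        ssthres = cwnd;
    }
}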

ngjermundshaug commented 5 years ago

I had set utp_target_delay to 1000 - based on the doc which says "A high value will make uTP connections more aggressive and cause longer queues in the upload bottleneck" - hoping that this would improve speeds.

I removed this setting from my code now - and ran it once more:


s = make_unique<lt::session>(lt::fingerprint("LT", LIBTORRENT_VERSION_MAJOR, LIBTORRENT_VERSION_MINOR, 0, 0));

settings_pack sp;
sp.set_bool(settings_pack::allow_multiple_connections_per_ip, true);
sp.set_int(settings_pack::unchoke_slots_limit, 999999);
sp.set_int(settings_pack::active_limit, 9999999);
sp.set_int(settings_pack::active_seeds, 9999999);
sp.set_int(settings_pack::connections_limit, 10000);
sp.set_bool(settings_pack::enable_outgoing_utp, true);
sp.set_bool(settings_pack::enable_outgoing_tcp, true);
sp.set_str(settings_pack::listen_interfaces, string("0.0.0.0:") + to_string(Config::TorrentPort));
s->apply_settings(sp);

//Load torrent into engine!
lt::add_torrent_params apr;
apr.save_path = pathToTransferFolder;
apr.upload_limit = 999999999;
apr.download_limit = 999999999;
apr.max_connections = 9999;
apr.max_uploads = 999999999;
apr.flags |= lt::add_torrent_params::flag_super_seeding; //ON
apr.flags |= lt::add_torrent_params::flag_seed_mode; //ON
apr.flags &= ~lt::add_torrent_params::flag_auto_managed; //OFF
apr.ti = std::make_shared<lt::torrent_info>(pathToTorrentFile);
lt::torrent_handle newHandle = s->add_torrent(apr);
newHandle.set_upload_mode(true); //No effect?
newHandle.resume();

image

utp out000001f2a0966200-send-packet-size

Based on my experience from other UDP-based transfer (acceleration) protocols I think it's very detrimental that the UDP packet size is reduced all the way down to 10000 at times. It should be at 65000 (max), or at least close to it, at all times - at least for this transfer (given the latency / packet drop scenario).

When I run our Filemail Desktop application on the exact same server in Sydney which I am testing libTorrent on - and transfer files to the same PC in Norway - I get a transfer speed of over 4 megabytes per sec. Notice that the application uses large UDP packets (close to max size). If I were to cap the size of these packets to 10000 - then I would get very slow speeds.

image

I know that these are two fundamentally different protocols - just offering some thoughts regarding the drop in UDP packet size that we see in the libTorrent charts.

arvidn commented 5 years ago

@Seeker2 The good news is that I'm pretty sure I figured out what was wrong with downloading from uTorrent. The deferred ACK logic in libtorrent appears to have deferred the ACK a bit too much, making uT have a hard time growing its congestion window.

I committed the fix to the pull request I posted earlier: https://github.com/arvidn/libtorrent/pull/3543 (specifically, this patch)

@ngjermundshaug I doubt that using jumbo packets would make a significant difference right now. It seems pretty clear that the main problem is the packet loss. The problem, in general, with using large UDP packets and relying on fragmentation is that any packet loss becomes extremely expensive, as dropping a single fragment tosses the whole UDP datagram.
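
A back-of-the-envelope illustration of that cost, with assumed (not measured) numbers: a 64 kB datagram on a 1500-byte MTU link is split into roughly 44 fragments, and losing any one of them discards the whole datagram.

#include <cmath>
#include <cstdio>

int main()
{
    double const fragment_loss = 0.01; // assumed 1% per-fragment loss
    int const fragments = 44;          // ~64000 bytes / ~1472 payload bytes per fragment
    double const datagram_loss = 1.0 - std::pow(1.0 - fragment_loss, fragments);
    std::printf("effective datagram loss: %.0f%%\n", datagram_loss * 100.0); // roughly 36%
}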

So, I think the question is: why is the burst of packet loss spread out over multiple windows? How can congestion be detected earlier, before we run off the cliff and trigger multiple packets to be lost?

What does your transport protocol do when a packet is lost? How much packet loss do you experience?

Would you mind running another test with the latest version of https://github.com/arvidn/libtorrent/pull/3543 applied? I made some improvements to the logging as well.

ngjermundshaug commented 5 years ago

I checked out your utp-packetloss branch and rebuilt with it.

I am now having difficulty getting it to seed. After I let it sit for a while, it starts - then it disconnects again.

uTorrent now shows flags K P. Have not seen K before (K means Peer is unchoking your client, but your client is not interested)

See log: utp.log (Too little data to parse and run in gnuplot)

arvidn commented 5 years ago

your log suggests that the clients connect and neither is interested in the other; they idle for a while and then disconnect.

I think the K flag is fine, libtorrent has an optimization where it unchokes peers pre-emptively when it has enough free upload slots, this saves a round-trip for peers that want to request data.

But that flag also supports what the utp.log shows, that uTorrent is not interested in the seed. Perhaps it's already a seed itself?

ngjermundshaug commented 5 years ago

I had to remove these settings in order to make it actually check and seed the torrent with 1.1.11.0

//apr.flags |= lt::add_torrent_params::flag_super_seeding; //ON
//apr.flags |= lt::add_torrent_params::flag_seed_mode; //ON
//apr.flags &= ~lt::add_torrent_params::flag_auto_managed; //OFF
//newHandle.set_upload_mode(true);

The utp.log file isn't parseable it seems. utp.zip

uTorrent is still considerably faster: image

arvidn commented 5 years ago

what happened when you set those torrent flags? did libtorrent think you did not have any pieces? setting upload_mode is a bit strange, as that normally means to not download anything. If you're a seed that's redundant.

arvidn commented 5 years ago

utp out00000162f74486d0-uploading

arvidn commented 5 years ago

one thing that may help is to lower utp_gain_factor to 1500, or something like that.

arvidn commented 5 years ago

I updated my patch with a feature in libutp where cwnd is not reduced more often than once every 100ms. (here).

The timings of most of the lost packets in your log suggests that it won't make a huge difference, but it may make some difference.
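
A minimal sketch (hypothetical names, not the actual patch) of that rate limit: no matter how many losses are reported, cwnd is only allowed to be cut once per 100 ms.

#include <chrono>

struct cwnd_cut_limiter
{
    using clock = std::chrono::steady_clock;
    clock::time_point last_cut = clock::time_point::min();

    // returns true if enough time has passed since the previous reduction
    bool may_cut_now()
    {
        auto const now = clock::now();
        if (now - last_cut < std::chrono::milliseconds(100)) return false;
        last_cut = now;
        return true;
    }
};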

ngjermundshaug commented 5 years ago

I pulled and rebuilt now. uTorrent is still a lot faster.

image

When setting utp_gain_factor to 1500, speeds seem to have improved a bit - but it still gets cut back pretty hard. image

Utp.log for last run: https://fil.email/I0QOGNYO (too large for github)

When setting those flags mentioned earlier - it did not start to seed. The status was stopped.

the8472 commented 5 years ago

@arvidn

So, I think the question is: why is the burst of packet loss spread out over multiple windows? How can congestion be detected earlier, before we run off the cliff and trigger multiple packets to be lost?

ECN might help, but it would require support from both sides to signal support and reduce the receive window when a congestion notification is received.
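
For what it's worth, a hedged sketch of what the sending side of that would involve on Linux/IPv4 (this is not something libtorrent does today; the function name and usage are only illustrative): mark outgoing UDP packets as ECN-capable and ask the kernel for the TOS byte of incoming packets so CE marks can be detected.

#include <netinet/in.h>
#include <sys/socket.h>

bool enable_ecn(int udp_socket)
{
    int tos = 0x02; // ECN field = ECT(0); DSCP bits left at zero
    if (setsockopt(udp_socket, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) != 0)
        return false;

    int on = 1; // deliver an IP_TOS ancillary message with every recvmsg()
    if (setsockopt(udp_socket, IPPROTO_IP, IP_RECVTOS, &on, sizeof(on)) != 0)
        return false;
    return true;
}

// on receive, a cmsg of level IPPROTO_IP / type IP_TOS carries the TOS byte;
// (tos & 0x03) == 0x03 means the packet was CE-marked and the peer should be
// told to back off.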

@ngjermundshaug

If you control the machines that cause packet drops (i.e. it's the endpoints or an edge router) you might also want to configure CAKE on that device. Support is built-in on openwrt or linux kernels >= 4.19. It is easy to configure and comes with BLUE-derived AQM which should signal congestion sooner than fifo based queues with tail drop.

arvidn commented 5 years ago

@ngjermundshaug

When setting those flags mentioned earlier - it did not start to seed. The status was stopped.

Please check the resume data; perhaps the logic of how resume data is applied is slightly different between 1.1.x and 1.2.0, and you pick up a setting from the resume data that cancels your explicit flags.

I added this graph, just to make sure there's no issue where there's excessive sending of data at packet loss or time-outs. There doesn't seem to be.

utp out0000022ab3e58d40-cumulative_bytes_sent

I also noticed that the second large regression of the cwnd was actually a timeout, meaning that all 400 packets in-flight were wiped out because we didn't hear back in 500ms from the other side. Looking at what libutp (utorrent) does, I updated the patch to mimic it. It has the minimum timeout set to 1 second (libtorrent uses 500ms) and in addition, the timeout is set to mean_rtt + mean_rtt_deviation * 4 (whereas libtorrent multiplies the deviation by 2).
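
The difference in the timeout calculation, expressed as a small sketch (hypothetical names), with libtorrent's previous values noted in the comments:

#include <algorithm>
#include <cstdint>

std::int64_t packet_timeout_ms(std::int64_t mean_rtt_ms, std::int64_t rtt_deviation_ms)
{
    // libutp: mean RTT plus four times the deviation; libtorrent previously used two times
    std::int64_t const timeout = mean_rtt_ms + rtt_deviation_ms * 4;
    // libutp floors the timeout at 1 second; libtorrent previously used 500 ms
    return std::max<std::int64_t>(timeout, 1000);
}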

If you have a chance, please try the tweaked logic!

ngjermundshaug commented 5 years ago

Hi guys

@the8472 The issue is with libTorrent - and not with the machines/infrastructure etc. The utp implementation of uTorrent consistently performs well on the same setup.

@arvidn I recompiled and retested now. I have not set utp_gain_factor, utp_target_delay or any other special settings. uTorrent is still a lot faster.

image

image

image

utp.zip

PS - Your latest commit does not compile - timeout (line 3584) cannot be const.

the8472 commented 5 years ago

@ngjermundshaug libtorrent's utp implementation still might behave better under different circumstances. Since arvid wondered about detecting loss earlier I think smarter AQM that signals congestion sooner might be such a change of circumstances. Hence the suggestion.

arvidn commented 5 years ago

I've been looking at libutp for differences in logic that could explain this behaviour. I'm starting to wonder if libtorrent sends a packet that is considered invalid by utorrent, and is detected as loss because it's dropped on the floor.

Seeker2 commented 5 years ago

uTorrent's Speed Graph also has a Networking Overhead sub-graph it can be switched to, which can indirectly show lots of details about what it's doing. Retransmits can spike, and at that moment speed drops immensely. This is where I was seeing the probable packet loss - why would uTorrent retransmit its upload to a qBitTorrent client (via uTP) if there WASN'T packet loss?! These retransmits happen over the internet, over LAN, and even in 127.0.0.1 local loopback tests using ramdrives on both ends. Also tested with/without speed limits so the CPU cores shouldn't overload. uTorrent seems to have lower CPU utilization.

Sadly, its uTP Delay graph (which might give other clues) is worthless+broken in latest versions I tested (uT v3.5.4-v3.5.5).

arvidn commented 5 years ago

Interesting. I tested over LAN to uTorrent Mac, and didn't see any packet loss. But I think the Mac build is much older than the Windows build.

arvidn commented 5 years ago

It seems the main performance issue in this scenario is how libtorrent reacts to a selective ACK. Afaict, libtorrent resending just a single packet per SACK is a problem. The "duplicate ACK" logic is also not very accurate. I've fixed it in a PR that I'm testing.
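
The idea, as a minimal sketch (hypothetical names, not the actual PR): when a selective ACK arrives, every un-acked packet with enough packets acked after it is treated as lost and resent, instead of resending a single packet per SACK.

#include <cstdint>
#include <vector>

std::vector<std::uint32_t> packets_to_resend(
    std::uint32_t first_unacked_seq
    , std::vector<bool> const& sack // sack[i]: was packet (first_unacked_seq + 1 + i) received?
    , int dup_ack_limit = 3)
{
    std::vector<std::uint32_t> resend;

    // for each slot, count how many acked packets sit at or after it
    std::vector<int> acked_from(sack.size() + 1, 0);
    for (int i = int(sack.size()) - 1; i >= 0; --i)
        acked_from[i] = acked_from[i + 1] + (sack[i] ? 1 : 0);

    // the packet the cumulative ACK stops at is missing by definition
    if (acked_from[0] >= dup_ack_limit) resend.push_back(first_unacked_seq);

    // every hole in the bitmask with enough acked packets after it is
    // considered lost as well, and all of them are resent
    for (std::size_t i = 0; i < sack.size(); ++i)
        if (!sack[i] && acked_from[i + 1] >= dup_ack_limit)
            resend.push_back(first_unacked_seq + 1 + std::uint32_t(i));

    return resend;
}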

ngjermundshaug commented 5 years ago

Looking forward to give it a test spin. Happy New year 🎉🎉

arvidn commented 5 years ago

https://github.com/arvidn/libtorrent/pull/3553

that's the main change. I have a separate one for the deferred ACK, which I believe fixes a performance issue in the opposite direction.

https://github.com/arvidn/libtorrent/pull/3551

ngjermundshaug commented 5 years ago

I pulled and rebuilt now from RC_1_1.

Unfortunately, the performance difference is more or less the same. It also takes a long time (or some pause/resume'ing) for libTorrent to switch from TCP to uTP.

image

utp log.zip

Seeker2 commented 5 years ago

"takes a long time (or some pause/resume'ing) for libTorrent to switch from TCP to UTP."

qBitTorrent, at least, can force itself to uTP only. In uTorrent the same can be done by setting bt.transp_disposition to 26 (in advanced settings). This will override the "Enable Bandwidth Management (uTP)" checkbox under BitTorrent.

One of the reasons why testing uTP takes so long is... uTorrent is NOT handshaking with qBitTorrent on the 1st uTP try. I confirmed this in uTorrent's logger, Process Monitor, and TCPView. Process Monitor even showed qBitTorrent receiving the 20-byte UDP packet successfully; it just didn't handshake/reply back.

I tested qBitTorrent versions 3.3.16, v3.4.0beta2, and v4.1.5 -- these use different versions of libtorrent, from v1.0.11 to v1.1.11. qBT v3.3.16 (using libtorrent 1.0.11) responded quickest (a successful handshake within seconds of the 1st attempt, on uTorrent's 2nd try), which suggests a possible regression after the v1.0.x series of libtorrent... or yet another major bug in qBitTorrent that's not libtorrent-related. On the uTorrent side, I tested uTorrent 2.2.1 and 3.5.5 -- these could connect/handshake to each other using uTP on the first try.

arvidn commented 5 years ago

@Seeker2 did you happen to capture the uTP SYN packet sent to qBT where it failed to respond? (if so, could you share?)

arvidn commented 5 years ago

I'm testing RC_1_1 against uTorrent 3.5.5 and I don't see any failures by uT to connect to me over uTP.

Seeker2 commented 5 years ago

No, I did not. Process Monitor only reports packet sizes, not details.

qBitTorrent 4.1.5 DOES connect to uTorrent 3.5.5 using uTP -- it just has to fail at least once and that can delay the connection's start for >15 seconds. uTorrent 3.5.5 to another uTorrent 3.5.5 (or 2.2.1) connects almost instantly using uTP. And note I was having uTorrent 3.5.5 doing the outgoing connection to qBitTorrent -- so incoming from qBT's point-of-view.

arvidn commented 5 years ago

@ngjermundshaug would you mind giving this patch a try? https://github.com/arvidn/libtorrent/pull/3568 I'm not really happy with it as a final solution, but it seems to help a lot in my own testing.

ngjermundshaug commented 5 years ago

I am unable to get uTP to work with the latest patch. It always uses TCP.

I have also tried setting bt.transp_disposition to 26 and restarting uTorrent.

utp.log

ngjermundshaug commented 5 years ago

It started transferring using uTP after 15 mins or so.

image

utp.zip

arvidn commented 5 years ago

@ngjermundshaug I notice that libtorrent TCP and libtorrent uTP seem similar in your test. I wouldn't expect uTP to transfer faster than TCP (on this kind of link). Now I'm curious to see libtorrent TCP compared to uTorrent TCP.

ngjermundshaug commented 5 years ago

Yup - TCP transfer rates are pretty similar:

image

Given the large geographical distance between Norway and Australia - and the high latency (approx 350ms) - uTP (UDP) is much faster when working correctly.

arvidn commented 5 years ago

I'm not entirely convinced of that. One possibility is that uTorrent uTP (under these conditions, where there is virtually no buffering delay) may be a lot more aggressive than TCP. This isn't necessarily good.

Are you suggesting that the TCP configuration on your machines uses too small send and receive buffers? I believe uTorrent tops out at 1MB of send and receive buffers for uTP. (We found this sufficient to saturate a normal LAN.)

If it isn't too much trouble, perhaps @ssiloti could provide a uTorrent build with utp logging enabled. If we ask nicely.

ngjermundshaug commented 5 years ago

File transfer via UDP is always faster than TCP in high latency situations, due to TCP waiting for packets to be ACKed (given that the UDP transfer protocol is well written of course). "All" transfer acceleration technologies aimed at high latency situations use UDP - since UDP packets are not ACKed (ACKing, encryption, resending etc. are built on top of UDP).

I have not touched TCP config of the machines.

Some more speedtests. All tests are performed on the same hardware, one at a time (essentially sending files from Australia to Norway).

image

image

image

In general I recommend testing uTP in real life medium/high latency environments - this is a completely different ballgame compared to testing on LAN. I would be more than happy to provide you with a VM in Australia etc.

arvidn commented 5 years ago

File transfer via UDP is always faster than TCP in high latency situations, due to TCP waiting for packets to be ACKed

This is rather simplified. Given that you want a reliable transfer, packets always have to be ACKed. So clearly the ACKing is not the property that slows things down. It seems quite obvious that the bottleneck over links with a high bandwidth-delay product is that send and receive buffers may be too small. Given sufficiently large buffers, the latency is not a problem.

Unless you start having problems with rates not ramping up fast enough, since that's still ACK clocked. It's quite risky to ramp up quicker than the RTT though, as you will always overshoot the capacity and cause congestion and push out other well-behaving streams. Even slow-start doesn't ramp up quicker than the RTT.

"All" transfer acceleration technologies aimed at high latency situations use UDP - since UDP packages are not acked (Acking, encryption, resending etc. is built on top of UDP).

But all (general purpose) transfer protocols are ACKed. TCP sits on top of the IP protocol, which is also not ACKed, and is even more lean than UDP.

It used to be popular to circumvent small TCP buffers on high-bandwidth high-latency links by using multiple TCP connections, each of which got allocated one send and receive buffer. As far as I can tell though, there's no reason to believe TCP cannot saturate a link given large enough send and receive buffers.

(One problem is that the saw-teeth pattern of TCP still requires intermediate routers to have sufficient buffer space to hold half of a saw tooth, which isn't the case when using multiple TCP connections, as they get smoothed out).

In general I recommend testing uTP in real life medium/high latency environments - this is a completely different ballgame compared to testing on LAN.

It's really the bandwidth-delay-product that's the important factor here. On a gigabit LAN you'll run into the same issues as a high-latency internet link. However, I don't have a windows machine on my LAN so I can't test (a recent) uTorrent locally. The old mac version has too small send and receive buffers to let me test anything.

I've been testing over my home internet connection, I peak at about 11 MB/s and about 50ms RTT. I experience packet loss and as far as I can tell it's legitimate. The only way I can make it go faster is to ignore more of the packet loss and push packets harder. That's not really responsible behaviour.

I would be more than happy to provide you with a VM in Australia etc.

Sure, I'd be happy to test against a machine in Australia. Either a VM or a uTorrent instance that download a large test torrent that I produce, that I can seed to.

ngjermundshaug commented 5 years ago

The reason that TCP is slow on high latency links is that each TCP packet needs to be ACKed before sending the next one (a bit simplified). A well-written UDP transfer protocol allows for a lot more data to be in flight - and ACKed at a later point. UDP packet number 2 is sent before packet number 1 is ACKed. For these UDP-based transfer protocols - it does not matter much if the latency is 50 or 200 ms.

For Filemail Desktop - we look at the rate at which ACKs are received. If we receive 100 ACKs per sec - then we can safely move forward and send 100 * 1.1 data packets per sec. This way - speed increases (gradually) until packet loss / bandwidth limitations kick in - like it would on a LAN. Packets that are not ACKed within a certain amount of time are resent. The UDP traffic does not flood the network - and it does not take all the bandwidth in practice. For low latency transfers we use TCP (HTTPS) - as this is faster due to less overhead.
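
That scheme, reduced to a sketch (this is not Filemail's actual code; the names and constants are made up for illustration): the send rate follows the measured ACK rate plus 10% headroom, so packet loss pulls the rate down automatically.

#include <algorithm>

struct ack_rate_pacer
{
    double send_rate = 100.0; // packets per second, arbitrary starting value

    // called once per second with the number of ACKs counted during that second
    void on_ack_rate_sample(double acks_per_second)
    {
        // send 10% more than the receiver demonstrably got; if packets were
        // lost, acks_per_second drops and the send rate follows it down
        send_rate = std::max(10.0, acks_per_second * 1.1);
    }
};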

The quick and dirty way to speed up TCP transfers when there is high latency is to have multiple connections - meaning that there will be more data in flight - like you say. However, you'll need a lot of parallel TCP connections in order to have as much data in flight as when using UDP - and as we all know, we do not have an infinite number of TCP connections available.

On a gigabit LAN you'll run into the same issues as a high-latency internet link.

This is not correct. On a high latency internet link - it will take several hundred milliseconds to ACK each and every TCP packet. On a LAN - this typically takes 1ms. And it's exactly this latency that causes TCP to be slow for these kinds of transfers (despite having great bandwidth).

I will set up a Windows VM in Australia and PM you the details in a few mins!

arvidn commented 5 years ago

The reason that TCP is slow on high latency links is that each TCP packet needs to be ACKed before sending the next one (a bit simplified).

Well, it's simplified to the point where it isn't true. If you assume the extreme case where the send and receive buffer can hold a single message, why not take the other extreme where the send and receive buffer can hold the entire file being transferred? In that case flow-control is no longer limiting the rate, but only congestion control, and as long as you don't have congestion, it's not limiting you either.

A well-written UDP transfer protocol allows for a lot more data to be in flight - and ACKed at a later point. UDP packet number 2 is sent before packet number 1 is ACKed. For these UDP-based transfer protocols - it does not matter much if the latency is 50 or 200 ms.

Just like if you have a large send buffer (and large cwnd of course). The only things limiting how many bytes a TCP stream can have in flight (i.e. bytes sent that have not been ACKed) is the send buffer, receive buffer (advertised receive window) and the congestion window. The first two are primarily controlled by configurations, and the congestion window is only limited by packet loss.

For Filemail Desktop - we look at the rate at which ACKs are received. If we receive 100 ACKs per sec - then we can safely move forward and send 100 * 1.1 data packets per sec. This way - speed increases (gradually) until packet loss / bandwidth limitations kick in - like it would on a LAN. Packets that are not ACKed within a certain amount of time are resent. The UDP traffic does not flood the network - and it does not take all the bandwidth in practice.

When you experience packet loss, what do you do with the send rate? do you cut it in half?

If you don't, you are by definition pushing harder than TCP, and TCP streams will yield to you. Perhaps there is a sufficient number of TCP streams for their combined pushing to not be completely starved, but surely your transport protocol must take more than its fair share. Have you simulated it together with a TCP stream?

The quick and dirty way to speed up TCP transfers when there is high latency is to have multiple connections - meaning that there will be more data in flight - like you say. However, you'll need a lot of parallel TCP connections in order to have as much data in flight as when using UDP - and as we all know, we do not have an infinite number of TCP connections available.

Or you can increase the send and receive buffers to allow for more data in flight. These are socket options, not particularly exotic. I believe there are also kernel options for default values and perhaps upper limits on what user space can set.
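
A small sketch of the socket options being referred to (POSIX shown here; whether the kernel honours the full size depends on system limits, e.g. net.core.wmem_max / rmem_max on Linux):

#include <sys/socket.h>

bool set_socket_buffers(int sock, int bytes)
{
    // size the buffers to cover the bandwidth-delay product,
    // e.g. roughly 10 MB for 33 MB/s at 300 ms RTT
    if (setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &bytes, sizeof(bytes)) != 0) return false;
    if (setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) != 0) return false;
    return true;
}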

On a gigabit LAN you'll run into the same issues as a high-latency internet link.

This is not correct. On a high latency internet link - it will take several hundred milliseconds to ACK each and every TCP packet. On a LAN - this typically takes 1ms. And it's exactly this latency that causes TCP to be slow for these kinds of transfers (despite having great bandwidth).

Can you explain why the latency causes TCP to slow down? (if it isn't because it has a limit on the number of bytes in-flight, i.e. send- and receive buffer).

If you need 10 MB of payload in-flight in order to saturate your link, but your send buffer is only 1MB, you won't be able to saturate it. Whether the 10 MB of bandwidth-delay product is caused by a link with 33 MB/s and 300 ms RTT or by a link with 2 GB/s and 5 ms RTT doesn't really matter from the point of view of saturating the link. The problem will be the same.

ngjermundshaug commented 5 years ago

Well, it's simplified to the point where it isn't true. If you assume the extreme case where the send and receive buffer can hold a single message, why not take the other extreme where the send and receive buffer can hold the entire file being transferred? In that case flow-control is no longer limiting the rate, but only congestion control, and as long as you don't have congestion, it's not limiting you either.

That's correct - but the real world isn't like this. Files are broken up and transferred in small packets - and this is where TCP + high latency isn't a good match.

When you experience packet loss, what do you do with the send rate? do you cut it in half?

When we experience packet loss - then the rate at which ACKs are received will drop accordingly - meaning that we will throttle down automatically. E.g. if we suddenly experience 50% packet loss - then we will cut the send rate in half. We don't think in terms of send/receive windows - we focus on the rate at which ACKs arrive. I believe that this way of ACKing is fundamentally different from adhering to a window (which might be changed dynamically). We have no max limit of data in flight - and we do not care about ordering of packets. There is no head-of-line blocking.

Our fileservers receive all this UDP traffic side by side with HTTPS traffic - and we rarely/never see an issue with the UDP hogging all the bandwidth. Most of our traffic is TCP based, since only high latency transfers are transferred using UDP. And since we have fileservers around the world - most people hit a server which is close to them - with latency < 50 ms.

Do a Google search for UDP transfer acceleration and you'll see that there is an entire industry focusing on transferring files using UDP in order to maximize speeds across large geographical distances and high latency connections, e.g. Signiant, FileCatalyst, Catapult. The Google QUIC protocol is also UDP based - whenever you visit YouTube using Chrome - the content is served via UDP - since this provides lower latency and higher bandwidth compared to TCP.

Can you explain why the latency causes TCP to slow down? (if it isn't because it has a limit on the number of bytes in-flight, i.e. send- and receive buffer).

I believe it's the combination of high latency and head-of-line blocking which causes a single TCP connection to slow down so much (despite having a large/dynamic receive window).

image

https://ma.ttias.be/googles-quic-protocol-moving-web-tcp-udp/

arvidn commented 5 years ago

When we experience packet loss - then the rate at which ACKs are received will drop accordingly - meaning that we will throttle down automatically. E.g. if we suddenly experience 50% packet loss - then we will cut the send rate in half. We don't think in terms of send/receive windows - we focus on the rate at which ACKs arrive. I believe that this way of ACKing is fundamentally different from adhering to a window (which might be changed dynamically).

Do you agree that this is a significantly more aggressive congestion controller than TCP?

For instance, I would expect it to introduce a significant late-comer disadvantage. i.e. if you have a fully "ramped-up" stream with your congestion controller, and then you open another connection, sharing the same network bottleneck. The second connection may not achieve 50% of the bandwidth.

We have no max limit of data in flight

But the network does. If a new stream is opened over the same network bottleneck, the limit of in-flight data should be reduced for your connection, otherwise you're using more than your fair share. My impression is that fairness is considered an important property of the TCP congestion controller.

and we do not care about ordering of packets. There is no head-of-line blocking.

This is only a problem in TCP if your send- and receive buffers aren't big enough. Every TCP implementation worth its name supports Selective ACK, which means you don't have to ACK packets in-order. uTP also supports selective ACK btw.

The specific issue in SPDY, where they multiplex multiple streams over a single TCP connection, isn't relevant for bulk transfers. Neither libtorrent nor uTorrent cares about the order of packets, because we have sufficiently large send and receive buffers that it's not a limiting factor.

head-of-line blocking refers to how payload that has been received by the kernel, sitting in the receive buffer, will not be passed on to the application out-of-order. So the kernel will delay passing on data until it has received the lost packet. This is a problem if the data doesn't rely on the lost packet, but is useful independently. For instance in real-time video games, or if the packets belong to separate streams. The reason why this would cause the TCP stream to slow down is if the kernel's receive buffer is too small, and it has to shrink or perhaps even close its advertised receive window to the sending side, i.e. signal back-pressure, to avoid overrunning its buffer.

You have the exact same effect on a low latency link, with a large bandwidth-delay-product. It's just that the bandwidth is so much higher that you overrun the receive buffer in less time.

Again, this is not a problem in utorrent or libtorrent, because our send and receive buffers are large enough.

the8472 commented 5 years ago

Have you tried a nc (netcat) transfer, with appropriate send and receive buffer sizes, between both hosts to establish a naive TCP baseline without any concerns about IO bottlenecks or application delays?

arvidn commented 5 years ago

In my test seeding from libtorrent in Stockholm to uTorrent on your AWS node in Australia, it seems uTorrent's receive buffer of 1MB is not large enough. I'm not sure why this is different from your test.

screenshots

utp out0x7f54e0041940-slow-start