Open synctext opened 7 years ago
Current operational measurement scripts: https://github.com/vandenheuvel/libtorrent-latency-benchmark
Measurements of the libtorrent download speed under different latencies with 6 seeders.
This is relevant for achieving onion routing with defence against traffic confirmation attacks.
ToDo for next meeting, from the Libtorrent API docs and tuning notes:
- `session_settings high_performance_seed();`
- `outstanding_request_limit_reached`
- `send_buffer_watermark_too_low`
- Investigate `max_out_request_queue`
Background: http://blog.libtorrent.org/2015/07/slow-start/ and http://blog.libtorrent.org/2011/11/requesting-pieces/. Easy fix: `high_performance_seed` returns settings optimized for a seed box that serves many peers and does no downloading itself. It uses a 128 MB disk cache, limits its file pool to 400 files, and supports fast upload rates by allowing large send buffers.
Additional boost: asynchronous disk I/O
Detailed solution: "I'm unable to get more than 20 Mbps with a single peer on a 140 ms RTT link (simulated delay with no packet loss)." Original post.
Things you could adjust, according to Arvid Norberg, lead engineer of the libtorrent project:
“Did you increase the socket buffer sizes on both ends?”
```cpp
int recv_socket_buffer_size;
int send_socket_buffer_size;
```
“There’s also buffer sizes at the bittorrent level:”
```cpp
int send_buffer_low_watermark;
int send_buffer_watermark;
int send_buffer_watermark_factor;
```
“And there are buffers at the disk layer:”
```cpp
int max_queued_disk_bytes;
int max_queued_disk_bytes_low_watermark;
```
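These buffer sizes matter because of the bandwidth-delay product: a sender can keep at most one buffer's worth of data in flight, so per-connection throughput is capped at roughly buffer/RTT. A small sketch of that arithmetic, plus requesting a larger socket buffer from Python (the helper name and the 212992-byte default are our illustrative assumptions, not values from the thread):

```python
import socket

def max_throughput_bytes_per_s(window_bytes, rtt_s):
    """Throughput cap imposed by a fixed in-flight window over a given RTT."""
    return window_bytes / rtt_s

# With a typical Linux default window of ~208 KiB and a 140 ms RTT, a single
# TCP stream tops out around 1.5 MB/s: the same order of magnitude as the
# ~20 Mbps observed above, and far below what the link could carry.
default_window = 212992  # common net.core.rmem_max default, in bytes
cap = max_throughput_bytes_per_s(default_window, 0.140)
print('cap: %.2f MB/s' % (cap / 1e6))

# Requesting larger kernel buffers from user space; the kernel silently caps
# the request at net.core.rmem_max unless that sysctl is raised first.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 1 << 20)
effective = s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
print('effective receive buffer: %d bytes' % effective)
s.close()
```

This is why raising the socket, BitTorrent, and disk buffer limits together is necessary before a high-latency link can be saturated.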
New test using LXCs: ten seeding LXCs, one downloading LXC, single measurement. While high-latency links start slower, they do substantially better once the transfer speed has stabilized. Latencies up to 150 ms perform, at maximum speed, similarly to the base test without any latency. The measurement without latency is very similar to the earlier test using VMs.
A single seeding LXC. Single measurement. Higher latencies impact throughput heavily.
ToDo: try to obtain more resilience against latency in libtorrent with a single seeder and a single leecher. Also read current research on traffic correlation attacks. The basics are covered here. Quote: 'recent stuff is downright scary, like Steven Murdoch's PET 2007 paper about achieving high confidence in a correlation attack despite seeing only 1 in 2000 packets on each side'.
Strange observation that it takes 60 to 100 seconds for the speed to pick up. Is the seeder side the bottleneck, due to anti-freeriding mechanisms? Please repeat multiple times and create boxplots.
1 iteration to find the magic bottleneck...
Good news: It appears that the magic bottleneck is identified. Plot of single seeder, single leecher, 200ms latency. No reordering.
Throughput is mostly 15 MB/s. After doubling the default and max parameters of `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem` (measured in bytes), we notice that throughput doubles as well, to roughly 30 MB/s. The bad news, however, is that further increasing these parameters has little effect: throughput rarely passes 35 MB/s.
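For reference, the sysctl change described above looks like this (the exact values are illustrative, not the ones used in the measurement; each triple is min/default/max in bytes):

```shell
# Raise the TCP receive/send buffer auto-tuning limits (min default max).
sysctl -w net.ipv4.tcp_rmem="4096 262144 12582912"
sysctl -w net.ipv4.tcp_wmem="4096 262144 12582912"
# net.core.rmem_max / wmem_max cap what applications may request explicitly
# via setsockopt(), so consider raising them too.
sysctl -w net.core.rmem_max=12582912
sysctl -w net.core.wmem_max=12582912
```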
Also, the inconsistency in these measurements is still unexplained.
good! next bottleneck...
The magic parameter settings are now discovered, resulting in 35 MByte/s libtorrent throughput. Next steps are to push this throughput in Tribler and in Tribler with Tor-like circuits.
Please document your LXC containers. ToDo: Tribler 1 seeder, 1 leecher; see influence of blocking SQLite writes on performance...
@qstokkink understands the details of tunnel community...
You will probably want a repeatable experiment using the Gumby framework. You are in luck, I created a fix for the tunnel tests just a few weeks ago: https://github.com/qstokkink/gumby/tree/fix_hiddenseeding . You can use that branch to create your own experiment, extending the `hiddenservices_client.py` `HiddenServicesClient` with your own experiment and your own scenario file (1 seeder, 1 leecher -> see this example for inspiration).
Once you have the experiment running (which is a good first step before you start modifying things - you will probably run into missing packages/libraries etc.), you can edit the TunnelCommunity class in the Tribler project.
If you want to add delay: you can combine the relaying nodes and the sending node into one by adding delay here (which does not include the exit node).
1 Tribler seed + 1 Tribler leecher with normal BitTorrent under 200 ms - 750 ms latency is limited by congestion control. Read the details here. To push Tribler towards 35 MB/s with added Tor-like relays, we will probably need, at some point this year, to see the internal state of the congestion control loop.
ToDo for 2017: measure congestion window (cwnd) statistics during hidden seeding.
We managed to do some new tests. We ran Tribler with only the HTTP API and libtorrent enabled. We found that the performance of libtorrent within Tribler is significantly worse than plain libtorrent. Below is a summary of results so far. Note that these values are for a single-seeder, single-leecher test.
| | libtorrent | Tribler |
|---|---|---|
| no latency | ~160 MB/s | 30 - 100 MB/s |
| 200 ms | ~15 MB/s | ~2.5 MB/s |
| 200 ms + magic | ~35 MB/s | ~2.5 MB/s |
Note that "magic" is the increase of the `net.ipv4.tcp_rmem` and `net.ipv4.tcp_wmem` parameters. It appears that Tribler suffers from a different bottleneck. Note that when testing without latency, the speed varies heavily between ~30 MB/s and ~100 MB/s. During all tests, CPU load was ~25% on 3 cores.
@synctext @devos50 @qstokkink does anyone have any ideas what might cause this?
@vandenheuvel Assuming all libtorrent versions are the same etc.: the only way Tribler interfaces with a libtorrent download is by retrieving its stats every second (`handle.stats()`) and by handling alerts.
After writing that I found this in the libtorrent manual:
Note
these calls are potentially expensive and won't scale well with lots of torrents. If you're concerned about performance, consider using post_torrent_updates() instead.
Even though this shouldn't be that bad, you could try to write a loop which gets the torrent handle status every second in your naked experiment and see how that affects things.
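A stdlib-only sketch of such a loop; `FakeHandle` is a made-up stand-in so the snippet runs without libtorrent, where the real experiment would use a libtorrent torrent handle and call its `status()` method:

```python
import threading
import time

class FakeHandle:
    """Hypothetical stand-in for a libtorrent torrent_handle."""
    def status(self):
        return {'progress': 0.5, 'download_rate': 2.5e6}

def poll_status(handle, interval_s, rounds, samples):
    # One status() call per tick, mimicking Tribler's per-second stats pull.
    for _ in range(rounds):
        samples.append(handle.status())
        time.sleep(interval_s)

samples = []
poller = threading.Thread(
    target=poll_status, args=(FakeHandle(), 0.01, 3, samples))
poller.start()
poller.join()
print('collected %d status samples' % len(samples))  # collected 3 status samples
```

Swapping the interval between 1 and 5 seconds (or disabling the loop entirely) would show whether the per-second polling contributes to the slowdown.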
@vandenheuvel in addition, we are also processing all libtorrent alerts every second but I don't think this leads to much overhead actually. Could you try to disable the alert processing (by commenting out this line: https://github.com/Tribler/tribler/blob/devel/Tribler/Core/Libtorrent/LibtorrentMgr.py#L72)?
Very impressive work guys:
| | libtorrent | Tribler |
|---|---|---|
| no latency | ~160 MB/s | 30 - 100 MB/s |
| 200 ms | ~15 MB/s | ~2.5 MB/s |
| 200 ms + magic | ~35 MB/s | ~2.5 MB/s |
Tribler shamefully collapses. Clearly something to dive into! Did the tip of fully disabling the stats, or perhaps 5-second stats sampling, lead to any results? Btw, can you also expand this Tribler pain table with 10, 25, and 75 ms latency data points?
Our test results show a download speed of ~2.5 MB/s at 200 ms with our plain script as well, once we introduce the EPoll reactor into the code. This matches the results found in the previous tests with Tribler. However, tests with our plain script and the Select reactor show the original results we obtained before introducing the reactor, or even higher: a top speed of 30 MB/s. The next thing on our list is testing Tribler through Twisted with the Select reactor.
Interesting. Could you also try the normal Poll reactor (it is supposed to be faster than Select for large socket counts)?
Strangely enough, the results are quite different between our script and Tribler. Summary:

| | EPollReactor | SelectReactor | PollReactor |
|---|---|---|---|
| script | 2.5 MB/s | 32 MB/s | 16 MB/s |
| Tribler | 2.5 MB/s | 2.5 MB/s | 2.5 MB/s |
It may take a little while for the download to come up to speed (~ 60 seconds), but after that the throughput is quite steady.
Our next step will be profiling.
32 MByte/s. So... Python 3 and 200 ms latency results. This fascinating mystery deepens. Please make a profile print with human-readable thread name printouts.
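A stdlib sketch of such a profile print keyed by human-readable thread name, using `cProfile` on a dummy workload (the function and thread names are our own examples, not Tribler code):

```python
import cProfile
import io
import pstats
import threading

def busy_work():
    # Dummy workload standing in for Tribler's reactor/libtorrent threads.
    return sum(i * i for i in range(100000))

def profiled_worker(results):
    profiler = cProfile.Profile()
    profiler.enable()
    busy_work()
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats('cumulative').print_stats(5)
    # Key the profile text by the human-readable thread name.
    results[threading.current_thread().name] = out.getvalue()

results = {}
t = threading.Thread(target=profiled_worker, args=(results,), name='lt-download')
t.start()
t.join()
print(sorted(results))  # ['lt-download']
```

Running one profiler per thread like this makes it clear which thread (reactor, libtorrent callbacks, database) is actually burning time.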
We just ran our script under both `python2.7` and `python3.5`; this made no difference for the `SelectReactor`.
Due to an update for `lxc`, our test results have changed drastically. The newest test results, using a latency of 200 ms unless otherwise mentioned, are:

| | No Reactor without delay | No Reactor |
|---|---|---|
| Script | ~400 MB/s | ~32 MB/s |
All the results below are created by a modified script as well (200 ms):

| | EPollReactor | SelectReactor | PollReactor |
|---|---|---|---|
| Inactive | ~32 MB/s | ~16 MB/s | ~16 MB/s |
| Semi-Active | ~32 MB/s | ~16 MB/s | ~16 MB/s |
Notes: `EPollReactor` and `PollReactor` only reach their speed after a certain period of time.

hmmm. so the lower non-script table is all Tribler?
In the above post, all results are from our own script. We retested everything non-Tribler. We're not sure what this change of results (especially the `No Reactor without delay` peak performance and the `EPollReactor` results) tells us about the quality of the current testing method... These changes are enormous.

| | EPollReactor | SelectReactor | PollReactor |
|---|---|---|---|
| Tribler 0 ms | ~100 MB/s | ~130 MB/s | ~80 MB/s |
| Tribler 200 ms | 2.5 MB/s | 2.5 MB/s | 2.5 MB/s |

`PollReactor` without latency varied wildly; the other measurements were steady. Sadly, these results agree with the previous results from before the LXC update. We will now try to bring our script and Tribler closer together by using the reactor thread to start libtorrent in our script.
impressive exploration. Good steps to the final goal: Tribler becomes latency tolerant, add big buffer, and remix anon tunnel packets.
please read Two Cents for Strong Anonymity: The Anonymous Post-office Protocol. "AnonPoP offers strong anonymity against strong, globally eavesdropping adversaries, that may also control multiple AnonPoP's servers, even all-but-one servers in a mix-cascade."
Today's conclusion: Tribler drops a factor of 40 in performance when latency is added between a single seeder and a single downloader (tested with a 4 GByte file and 200 ms latency). Pure libtorrent plus Twisted (no Tribler) drops a factor of 25 when latency is added there.
Tribler can magically start libtorrent from Twisted; however, this fails to be reproducible. Feedback from a Twisted developer and contact with Arvid: http://twistedmatrix.com/trac/ticket/9044 http://stackoverflow.com/questions/42351473/using-libtorrent-session-with-twisted-stuck-on-checking-resume-data
ToDo: refactor the core.libtorrent package. Next possible step: a performance regression suite (#1287, but not now please). The factor 25-40 issue is essential.
@captain-coder remarked that touching Python at all from a non-Twisted thread will cause trouble. Possibly the thing you see.
@synctext @Captain-Coder What exactly do you mean by that?
@synctext butchered what I said to him.
He described the problem as: starting an OS thread in a process, having that thread talk to libtorrent, and hosting Python in the same process to do "stuff". I remarked that as soon as the Python bindings for libtorrent are loaded into this process (which happens pretty quickly if you import/use/touch Tribler in the hosted Python instance), it might induce a situation where Python structures are modified, through callbacks/pointers from libtorrent, initiated from the native OS thread. This violates the properties that the Python GIL is trying to impose and has a good chance of tripping up the Python interpreter (or even worse, silently corrupting it).
But it's also possible synctext described the problem wrong.
@vandenheuvel @MaxVanDeursen I just fixed the problem in your twisted branch. Now the leecher/seeder is no longer stuck in CHECKING_RESUME_DATA. The fix is:
```python
def printSpeed(h):
    for alert in ses.pop_alerts():
        status = h.status()
        print 'seeder', status.state, status.progress, alert
```
You can set the alerts using `session.set_alert_mask`. I guess you should be using alerts until the problem is fixed in libtorrent/twisted.
@egbertbouman Wow... Thanks a lot for that! How did you come to this solution? And do you happen to know why this works?
@MaxVanDeursen You're welcome! Since libtorrent worked in Tribler, I just looked for the differences between the Tribler implementation and yours. I have no idea why this works.
Thanks to @egbertbouman we have some new results.
| | EPollReactor | SelectReactor | PollReactor |
|---|---|---|---|
| Script, starting session with Twisted, 0 ms | ~400 MB/s | ~400 MB/s | ~400 MB/s |
| Script, starting session with Twisted, 200 ms | 32 MB/s | 16 MB/s | 16 MB/s |
We conclude that it doesn't matter which thread starts the session. We will start rewriting the `Core.Libtorrent` package now: this way we can learn more about how Tribler interacts with libtorrent and hopefully discover why Tribler is so slow.
We did some profiling and found that under both the `EPollReactor` and the `SelectReactor`, close to 100% of the time was spent blocking in the `poll` call of the reactors. This seemed normal, as Tribler was basically idle during these tests (except for downloading via libtorrent at ~2.5 MB/s).
Yesterday, I spent some time testing Tribler vs libtorrent. I used Ubuntu 16.10 in virtualbox. For libtorrent only I got:
Then, I changed the code in the master branch to work with Tribler and got this:
Finally, I removed the `flags` keyword parameter from `ltsession = lt.session(lt.fingerprint(*fingerprint), flags=1)`, and like magic:
The difference seems to be that Tribler is not loading some of the default libtorrent features (like UPnP, NAT-PMP, LSD). Very strange...
@egbertbouman Thanks for these hopeful results! Unfortunately, we have not been able to reproduce your results on our own system. However, we have noticed that using this flag in our basic script, the throughput of the program is negatively impacted to 2.5MB/s as well. We will look further into why the removal of this flag does not result in a change of throughput for us, although you have shown that this can be done.
so uTP is enabled, but what does `lt.fingerprint(*fingerprint), flags=1` do?
@synctext From the Libtorrent Manual:
If the fingerprint in the first overload is omitted, the client will get a default fingerprint stating the version of libtorrent. The fingerprint is a short string that will be used in the peer-id to identify the client and the client's version ... The flags parameter can be used to start default features (upnp & nat-pmp) and default plugins (ut_metadata, ut_pex and smart_ban). The default is to start those things. If you do not want them to start, pass 0 as the flags parameter.
After the results of last week we decided to do exhaustive tests on all combinations of our script and Tribler, as well with and without flags.
The seeder runs a different script than the leecher in the table below; the diagonal is equal code. All have 200 ms latency.
Leecher \ Seeder | Script NoFlag | Tribler NoFlag | Tribler Flag | Script Flag |
---|---|---|---|---|
Script NoFlag | H | L | H | L |
Tribler NoFlag | L | L | L | L |
Tribler Flag | L | L | L | L |
Script Flag | L | L | L | L |
L(ow): convergence of the download speed at ~2.5 MB/s.
H(igh): convergence of the download speed well above 2.5 MB/s (i.e. >10 MB/s).
From these results there is nothing in plain sight that we can conclude as far as the usage of this flag goes.
Please consider at some point to open up the magic box and dive into these detailed networking statistics. Something is off...
Key problem: discover why Tribler has 2.5 MByte/sec performance with 200 ms seeder-leecher latency, while clean Twisted-wrapped libtorrent has 16 MByte/sec.
The methodology is to strip the Python wrapping code and systematically brute-force the exploration of why it doesn't work; finding the magic fix directly is not realistic at this point. The stripped libtorrent manager is now 40 lines of code, still 2.5 MByte/sec. Dispersy is off in Tribler.
Is it the database that blocks the whole Twisted thread for 25ms regularly?
Possible experiment to decide whether the bottleneck is within the code, the parameter settings, or both: mix and match both ways to see where the fault lies.
I just read through (most) of this thread. Here are some observations:
the libtorrent Python bindings never call back into Python from within libtorrent. Early versions attempted to do this, by supporting Python extensions. This turned out to cause subtle memory corruption or GIL deadlocks. So, libtorrent is not supposed to interact with Python other than through the direct API calls. (So, if you find anything like that, please report a bug!)
libtorrent has built-in instrumentation added specifically to troubleshoot performance issues. In versions before 1.1 it's a build switch (`TORRENT_STATS`); in 1.1 and later the stats are reported as an alert (which is disabled by default). If these alerts are printed to a log, they can be analyzed by `tools/parse_session_stats.py` in the libtorrent repo (that script exists pre-1.1 too, but loads the files produced by libtorrent; the gotcha is that the stats files are written to the CWD, which needs to be writable by the process). The script requires `gnuplot` and will render a number of graphs and hook them up to an HTML page. Looking at them may reveal something.
The meaning of the flags passed to the `session` constructor is defined here. By default both flags are set (`add_default_plugins` and `start_default_features`). By passing in 1, you just add the plugins, without starting DHT, UPnP, NAT-PMP and local peer discovery.
I also have some questions:
Are you testing with TCP or uTP? The uTP implementation in libtorrent could be made to perform better on Linux by using `sendmmsg()` and `recvmmsg()`, but currently it requires a system call per packet sent or received.
which version of libtorrent are you using? If you're on 1.1.x, one way to have more control over how the session is set up is to use the constructor that takes a `settings_pack`. This lets you set up the configuration before starting the session.
Is the main issue you're looking at the performance difference between your twisted wrapper and Tribler? Or are you looking at understanding the bottlenecks making the 200ms case have so much lower throughput than the 0 ms case? If the latter, would it be useful to you if I would whip up a one-to-one test transfer simulation with various latencies?
@arvidn Thanks a lot for these points! You can have a look at our code here.
We're using the latest version in the Ubuntu repositories. Note that we're only trying to explain the difference in performance between Tribler and our script. Thus, calls like `sendmmsg()` and `recvmmsg()` don't seem that relevant, as we're not making these calls (explicitly) in our fast script.
Using `settings_pack`s might be helpful, but right now we're trying to replicate Tribler's settings in our script to achieve the same performance. @MaxVanDeursen maybe we should just refactor Tribler's code to use `settings_pack`s as a first libtorrent refactoring step and hope that performance improves?
Yes, we're looking to explain this difference under the condition of 200 ms, as the difference becomes larger (from 4x to 8x using default settings) with respect to the 0 ms case. This latency (and above) is relevant for tunnel community performance, as we suspect this is the current bottleneck.
For our script (at 200 ms), we already know what the bottleneck is: buffer sizes (see above). Right now, we believe it is most likely that this has to do with settings passed to libtorrent, either when creating the libtorrent session or when adding a torrent.
Report can be found at overleaf.
Problem: Tor-like tunnels introduce significant latency.
Measure how 25ms to 1000ms of latency affects the Libtorrent throughput. Aim to understand how LEDBAT parameters can be set to maximize throughput. Create an experimental environment using containers. Use netem to create latency between two containers. Start seeder at one container, download across the connecting bridge with added latency.
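The netem step might look like this (the device name `lxcbr0` is an assumption; use whatever bridge or veth device connects your containers):

```shell
# Add 200 ms of delay on the bridge between the containers.
tc qdisc add dev lxcbr0 root netem delay 200ms
# Adjust or remove it later:
tc qdisc change dev lxcbr0 root netem delay 50ms
tc qdisc del dev lxcbr0 root
```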
Initial results: performance is severely affected by a latency of just 50 ms.