Open synctext opened 11 years ago
"Performance analysis of a Tor-like onion routing implementation", Quinten Stokkink, Harmjan Treep, http://arxiv.org/abs/1507.00245
@qstokkink student Wouter is aiming to repeat your work and do a CPU performance analysis automagically after each pull request.
Roadmap :-)
@Pathemeous Found interesting code from Quinten: https://github.com/Tribler/tribler/compare/devel...qstokkink:devel
If you want to re-use our code you should extract the start_profiling and stop_profiling functions from our code and call them respectively before and after whatever you want to profile (and make sure it's in the filter). The tunnel_piecharts.R script can then parse profiling files in this format. We integrated this into the Gumby pipeline with a script added to the .conf file.
Thanks @qstokkink, I will have a look.
We already have memory profiling and manhole (telnet into the process to inspect it in real time). It would be cool to have the profiling integrated into gumby's instrumentation.py so all the experiments can use it straight away.
A simple 1 week starting experiment without Tribler, Tor-stack, and no Gumby.
The goal is to first detect whether there are Libtorrent bottlenecks. The experimental setup consists of just Libtorrent on Ubuntu, seeding 50 to 250 GByte and testing the local download performance. With 1000 swarms or so, Libtorrent is expected to grind to a halt with normal settings.
Outcome: Spend 2 weeks creating a graph that shows how performance develops as you seed more GBytes and swarms.
Experiment:
Progress: first script to seed and control Libtorrent from Python. Step closer to a PullRequest tester in Jenkins with 1 TByte seeding test. Ardhi has 3000+ Linux .iso torrents, seems sufficient for a future test.
In an easy-to-build setting, measure the cost of seeding 1 TByte (or MaxHardDiskCapacity). Start Libtorrent for 1 hour with various seeding size settings and measure the total consumed bandwidth. The goal is to identify the overhead. For instance, set the seeding upload bandwidth to just 10 KByte or so. This needs to be subtracted from the total to obtain the DHT, PEX, and other control protocol overhead traffic.
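The subtraction described above is simple arithmetic. A minimal sketch, assuming the "10 KByte" cap means 10 KByte per second and that the payload upload counter is available separately from the total. The 50,000,000 byte total in the usage note is made up for illustration, not a measurement.

```python
def max_payload_bytes(rate_limit_bytes_per_s, duration_s):
    """Upper bound on payload bytes a rate-limited seeder can upload."""
    return rate_limit_bytes_per_s * duration_s


def control_overhead_bytes(total_uploaded_bytes, payload_uploaded_bytes):
    """Traffic left after removing payload: DHT, PEX and other control traffic."""
    if payload_uploaded_bytes > total_uploaded_bytes:
        raise ValueError("payload cannot exceed total traffic")
    return total_uploaded_bytes - payload_uploaded_bytes
```

For a 1 hour run at 10 KByte/s, `max_payload_bytes(10 * 1024, 3600)` gives 36,864,000 bytes. If the interface counted a hypothetical 50,000,000 bytes in total, `control_overhead_bytes(50_000_000, 36_864_000)` would attribute 13,136,000 bytes to control traffic.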
Docs: _The limits of the number of downloading and seeding torrents are controlled via active_downloads, active_seeds and active_limit in session_settings._ Change the default of 5 active seeds to 10000 :-)
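A minimal sketch of that settings change, written as a plain dict so it runs without libtorrent installed. The helper name `seeding_settings` is hypothetical. Raising `active_limit` along with `active_seeds` is an assumption on my side: the overall limit would otherwise cap the number of seeds. These limits only apply to auto-managed torrents.

```python
def seeding_settings(max_active_seeds=10000):
    """Build a libtorrent settings dict that lifts the active seed limit."""
    if max_active_seeds <= 0:
        raise ValueError("max_active_seeds must be positive")
    return {
        "active_seeds": max_active_seeds,
        # Assumption: keep the overall cap at least as high as the seed cap.
        "active_limit": max_active_seeds,
    }


# With the libtorrent Python bindings this dict would be applied roughly as:
#   import libtorrent as lt
#   session = lt.session()
#   session.apply_settings(seeding_settings())
```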
Some of the .torrent files I crawled can be found in my dropbox. Currently it has 3658 .torrents.
I'm crawling mininova now, but I got blocked so maybe it will take some time to get more torrents on this site. AFAIK mininova now hosts legal torrents only.
@ardhipoetra great, thanks for sharing!
You might want to rate-limit your requests and maybe use HTTP proxies for the crawling process?
In the end, I rate-limited my requests and it works.
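For anyone repeating the crawl, rate-limiting can be as small as the sketch below. This is not the code from the crawler repository. The class name `RateLimiter` and the 2 second interval in the usage note are assumptions for illustration.

```python
import time


class RateLimiter:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval_s, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval_s has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Create `limiter = RateLimiter(2.0)` once and call `limiter.wait()` before every HTTP request. The clock and sleep functions are injectable so the behavior can be tested without real delays.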
As @devos50 requested, here is the link to the collection, zipped. You can put that in the bbq.
As for the crawler, I made the repository on https://github.com/ardhipoetra/legal-torrent-crawler
Currently, Tribler will create an introduction point for every torrent it is seeding. This could potentially overload the exit nodes, especially now that exit nodes are also running the PexCommunity. To minimize this problem I was thinking about somehow limiting the number of introduction points that we create. We could do this from the TriblerTunnelCommunity or maybe using libtorrent's auto-management feature (which allows us to limit the number of active seeds). Using auto-management seems to make the most sense. @qstokkink @devos50 What do you think?
Sure. Why not?
@drew2a Just a small reminder. Please set up a seedbox for a Tribler channel with lots of Creative Commons music (e.g. different than superapp; overlap). Simple static dump. For demo purposes only. Show that we can seed lots of stuff: https://github.com/mdeff/fma
EDIT: then please use that to set up a demo channel with markdown and real content #3615
For further development. An idea.
As I see it, there are two types of Tribler users:
So, what if we developed a tool that makes it easier to create and seed a channel?
Like:
$./create_and_seed.sh <folder>
Where
my channel
├ sub_directory
| ├ file1
| ├ file2
| └ README.md
├ sub_directory2
| ├ file3
| └ file4
└ README.md
The behavior:
my channel as a channel name
*.md-file in this folder.
@ichorid what do you think?
What are the fileX things? Torrents? Or actual files that should become individual torrents?
Actual files (discussed offline).
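To make the folder-to-channel idea concrete, here is a minimal sketch of the scanning step only. It assumes the top folder name becomes the channel name and each non-Markdown file becomes one torrent candidate. Treating `*.md` files as descriptions is my assumption, and `plan_channel` is a hypothetical name. Nothing here creates torrents or talks to Tribler.

```python
import os


def plan_channel(folder):
    """Scan a folder and describe what a channel built from it would contain."""
    folder = os.path.abspath(folder)
    plan = {"channel_name": os.path.basename(folder), "folders": {}}
    for dirpath, dirnames, filenames in os.walk(folder):
        dirnames.sort()  # walk sub-directories in a stable order
        rel = os.path.relpath(dirpath, folder)  # "." for the top folder
        plan["folders"][rel] = {
            # Assumption: Markdown files describe the folder they sit in.
            "descriptions": sorted(
                f for f in filenames if f.lower().endswith(".md")
            ),
            # Every other file would become an individual torrent.
            "files": sorted(
                f for f in filenames if not f.lower().endswith(".md")
            ),
        }
    return plan
```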
I did an experiment:
1GB of data divided into 1024 torrents (generate_test_data.py)
3 different torrents (picked randomly) downloaded from another PC. All downloads have been completed within the range of [5..30] seconds.
No trackers were used.
Libtorrent version: 1.2.10
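For reference, test data of this shape can be produced with a few lines of Python. This is a sketch, not the actual generate_test_data.py. The file naming is an assumption, and it only writes the payload files, not the .torrent files. With 1 GiB split over 1024 files, each file is 1 MiB.

```python
import os


def generate_test_files(out_dir, total_bytes, file_count):
    """Write file_count equally sized files of random data into out_dir."""
    if file_count <= 0 or total_bytes % file_count != 0:
        raise ValueError("total_bytes must be a positive multiple of file_count")
    size = total_bytes // file_count
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(file_count):
        path = os.path.join(out_dir, f"file_{i:04d}.bin")
        with open(path, "wb") as fh:
            # Random content so every file ends up in a distinct torrent.
            fh.write(os.urandom(size))
        paths.append(path)
    return paths
```

A call like `generate_test_files("test_data", 1024 ** 3, 1024)` would match the sizes used in the experiment.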
FMA test data were seeded for one month (1 channel, 156 torrents, 23 GB total). The music data are still available inside Tribler.
Ability to seed 1 TByte of content using Tribler.
This requires announcing, say, 1000 swarms of content in the DHT. This is a problem, as shown here: http://blog.libtorrent.org/2012/01/seeding-a-million-torrents/ The announce interval needs to be prolonged in order to reduce DHT announce traffic. Perhaps the DHT cannot handle this and a new peer discovery method is required: #13.
As a quick partial fix we can cap the number of swarms we announce to the DHT at any one time, then use a simple round-robin method to cycle slowly through all available swarms.
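A minimal sketch of the cap plus round-robin idea. The class name `RoundRobinAnnouncer` and the numbers in the usage note are assumptions. The actual DHT announce call is left out, so this only decides which swarms get announced in each round.

```python
import math
from collections import deque


class RoundRobinAnnouncer:
    """Pick at most max_per_round swarms per round, cycling through all."""

    def __init__(self, infohashes, max_per_round):
        if max_per_round <= 0:
            raise ValueError("max_per_round must be positive")
        self._queue = deque(infohashes)
        self.max_per_round = max_per_round

    def next_batch(self):
        """Return the swarms to announce now and move them to the back."""
        n = min(self.max_per_round, len(self._queue))
        batch = [self._queue[i] for i in range(n)]
        self._queue.rotate(-n)
        return batch

    def rounds_per_cycle(self):
        """Number of rounds needed before every swarm was announced once."""
        return math.ceil(len(self._queue) / self.max_per_round)
```

With 1000 swarms and a hypothetical cap of 50 per round, `rounds_per_cycle()` is 20, so each swarm is re-announced once every 20 rounds. The round length then sets the effective announce interval per swarm.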