Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0
4.85k stars 450 forks source link

Operational credit mining branch, Gumby scenario and multichain #1842

Closed synctext closed 5 years ago

synctext commented 8 years ago

Get a working performance analysis environment. Goal: ability to make pretty graphs locally with both credit mining and multichain (two different branches). Task: make an Ubuntu channel or other large validated legal collection, with actual "from the wild" downloads. This environment will be expanded over coming months.

part of #23. ToDo: read "tragedy of the commons" paper, 1968 problem.

Strategy: Till end of Feb focus on credit mining, then include relaying bandwidth to avoid too much complexity at the start. Next sprint: co-investor detection?

ardhipoetra commented 8 years ago

Thesis Proposal/Roadmap v.0.3

Big goal :

Benefit :

Current state of the art :

  1. [1] is not integrated on Tribler, paper only shows how many credit (bandwidth) gathered
  2. [1] only contains high level abstraction of archive mode
  3. [1] Unknown effect when applying credit mining in respect to download/upload normally as it used all the bandwidth without consideration
  4. [1] no defined source standard (or maybe it's just me). Have to handle some exceptional case
  5. Helping swarm heavily relied on "share mode" in libtorrent [2]
  6. Current study limited to private community [3]
  7. Multichain only consider bandwidth [4]

Some definitions :

What is credit :

Specific Task (no specific order) :

Problem/Limitation/Doubt :

  1. Persistence credit. Credit obtained from previous session can be used thereafter.
    • Is it necessary? Is it follows the "standard"?
    • If yes : how to store credit. If no : clean the storage, revert state.
    • Outside the scope of credit mining, probably in multichain's scope
  2. Multichain only hold bandwidth.
    • How to incorporate used storage and "effort rate" in peer transactions/communication?
    • partial seeding with complete storage need to incentivize as well
  3. Is it possible to :
    • mark that a piece/chunk is mined
    • check a swarm being helped or not (get helper list from a particular swarm)
    • when seeding, a node prioritize the one who helped him before (more than just t4t)
    • map user ID/IP to credit?

Related work :

  1. Decentralized credit mining in p2p system, Mihai(2015)
  2. Towards a peer-to-peer bandwidth marketplace, Mihai (2014)
  3. Investment Strategies for Credit-Based P2P Communities, Mihai(2013)
  4. MultiChain: A cybercurrency for cooperation, Steffan (2015) - Msc Thesis
ardhipoetra commented 8 years ago

This is my current progress https://github.com/ardhipoetra/tribler/commits/credit_mining

synctext commented 8 years ago

Concrete ToDo:

Next step: torrent_checker.py integration.

synctext commented 8 years ago

Initial GUI on Tribler home screen: 20160224_110104

Credit mining screen: 1000+ discovered channels, overall credit mining dashboard, channel status, channel content status. 20160224_112541

Thesis storyline:

Next sprint: accurate investment information, torrent_checker.py integration, discovered swarms, downloaded torrents, DHT lookup failure, failed to download .torrent, used bandwidth for checking, swarms connected to, dead swarms, seeder/leecher info determined.

synctext commented 8 years ago

Next steps: understand swarm seed/leecher checking. Running code to bypass central tracker. Also problem to TCP check trackers, as tunnels only support UDP.

Goal: create a good scrape mechanism using DHT and UDP in existing messy Tribler code.

Some very old code is still using manual checking in our own Python code: https://github.com/Tribler/tribler/blob/v6.0/Tribler/Core/DecentralizedTracking/mainlineDHTChecker.py

Now we use Libtorrent, which has it's own DHT and another DHT library: pymdht.

whirm commented 8 years ago

The more recent versions of libtorrent have exposed more stuff to the python bindings. It would be nice if someone checked if the needed DHT stuff is available now. That would allow us to drop pymdht entirely.

synctext commented 8 years ago

Completed:

Tasks for coming 3 weeks:

Thesis material credit

Leftover work:

synctext commented 8 years ago

additional thesis material: cost of swarm checking in bytes yield of swarm investments coinvestor factor relay cost archiving cost eith disk space limits

synctext commented 8 years ago

working code, see issues above. Create a legal channel with 500+ legal swarms. Repeat the Mihai experiments. Show in Gumby on Jenkins with nice R plots.

synctext commented 8 years ago

Elric has 300 swarm and we have legal swarms from archive.org https://blog.archive.org/2012/08/07/over-1000000-torrents-of-downloadable-books-music-and-movies/ https://archive.org/details/moviesandfilms

Another legal source: http://terasaur.org/browse/category/video

synctext commented 8 years ago

Focus for 2 weeks: finish Problem Description and Related work (in 2 chapters or 1):

ToDo:

ardhipoetra commented 8 years ago

Thesis latex repository : https://github.com/ardhipoetra/msc-thesis-creditmining

synctext commented 8 years ago

Create:

Open PR #2398, #2351

October steps: experiments with prospecting: select channel, DHT responsiveness, time to download download first 4 pieces, discovered swarm size (PEX,seed/leech stats). design & add features to core API: ? prospect_swarm(SHA1) ? Plot for 10000 swarms: DHT success, 4 piece time, swarm size stats.

ardhipoetra commented 8 years ago

Problem description revision :

2.5 : how to prospect a good investment how to detect investment, what is the properties of that

2.6 : when to delete a downloaded swarm and replace it with a new investment if projected yield is more than enough - cost of download

synctext commented 8 years ago

Thesis feedback :

ToDo: pull requests ! Prospecting experiment

ardhipoetra commented 8 years ago

It has been processed in https://github.com/ardhipoetra/msc-thesis-creditmining/commit/04ce285dbc0aa69bb6702e9c0d827431dff5a2e1.

For prospecting details (how it works), I want to put that on 'design' chapter.

ardhipoetra commented 8 years ago

Downloading 4 pieces of 10k+ torrents in 12 hours with 500 torrent connection in one time, timeout is 1 hour, resulting in below :

dist

histo

ardhipoetra commented 8 years ago

Gumby repository, with current experiments : https://github.com/ardhipoetra/gumby/tree/credit_mining_exp/experiments/credit_mining

synctext commented 8 years ago

Idea for final experiment (after discussing prospecting) :

Next sprint:

With progress: create thesis committee

synctext commented 8 years ago

idea for 1st experiment of thesis.

This first experiment is designed to evaluate the basic operation, correctness, responsiveness, and efficiency of our work. It is specifically crafted to be simple and easy to understand plus debug.

1 channel, add torrent, add 1 credit miner. Add second torrent, measure how fast it is mined..

synctext commented 7 years ago

@ardhipoetra @pimveldhuisen @Captain-Coder Funny related work, duplicate of the whole Tribler idea "Steem: An incentivized, blockchain-based social media platform". However, it's is implemented as a simple Reddit clone, where your karma equals website-ownership and can be converted in cash.

ardhipoetra commented 7 years ago

Related to credit mining, bittorrent and utorrent launched "altruistic mode" which is very similar in credit mining. The concept is exactly the same with libtorrent share mode that we use.

synctext commented 7 years ago

Latest direct thesis .pdf download link: https://github.com/ardhipoetra/msc-thesis-creditmining/blob/master/thesis.pdf

synctext commented 7 years ago

Open PR in Gumby https://github.com/Tribler/gumby/pull/282 Tribler7 status: ToDo GUI, stability?

Current thesis version, 15Dec'16, with a 13-page introduction section. Explaining, freeriding, credits, private communities, and oversupply. thesis (43).pdf

ToDo: first paragraph of chapter 2, please add "autonomous operation". The goal of this work is to create a European Youtube type of system, without any server or central authority. Similar to Youtube in usability and ease-of-use, Bittorrent for robust streaming, but without the explicit file management and explicit seeding settings. The goal of this research is to devise an automatic caching layer which ensures content availability for both long-lived years-old and freshly created content mere seconds ago. Current 2.1 and 2.2 have too much overlap with Section 1. Please identify and delete all overlap. Move all related work and citations to a dedicated section. All stuff like "Leon et al. uses BitTorrent protocol to increase user download speed and at the same time reduce datacenters load." Section 2.4.1 needs to move to either 1 or 3, too much engineering details for 2.

Section 2.1 Challenges in creating credit mechanisms Explain that all prior stuff never works at scale. Lot of ideas, none proven to work. Section 2.2 add to 2.5 Storage capacity is limited and therefore we require a mechanism to optimize it's usage.. etc.

Add to section 2 the problem of "Co-Investors", when everybody sees the same swarms and does not see others are already boosting them. Thus the need for diversity and awareness of other credit miners.

Please rename: Section 3 Credit Mining System Design 3.2 Credit Mining Architecture, Section 4 Prospecting Algorithm New Section "Credit mining implementation and setup" 4.3 Credit mining experiment, please rename into experimental setup 6.1 "Live prospecting experiments"

The following experiment is specifically designed to be simple and validate all core components and algorithms of our credit mining research. All conditions are controlled and do not rely on external elements such as trackers or DHT nodes. Our first validation experiment tests the basic credit mining swarm selection algorithm. We create a dozen swarms, all a different number of seeders. A single credit miner then starts prospecting these swarms and should select the most underseeded swarm, yielding the best return. Our next validation experiment repeats the dozen swarm experiment with multiple credit miners. We show that all prospecting opportunities are found. This proves our algorithm is both effective and efficient.

With the following experiment we determine the costs, trade-offs and performance around the observation period of a swarm and the amount fo discovered peers.

ardhipoetra commented 7 years ago

Section 2.4.1 needs to move to either 1 or 3, too much engineering details for 2.

Moved to design chapter

Add to section 2 the problem of "Co-Investors", when everybody sees the same swarms and does not see others are already boosting them. Thus the need for diversity and awareness of other credit miners.

Added to future works as it is not possible right now by using current libtorrent.

Section 2.4.1 needs to move to either 1 or 3, too much engineering details for 2. Section 2.1 Challenges in creating credit mechanisms Explain that all prior stuff never works at scale. Lot of ideas, none proven to work. Section 2.2 add to 2.5 Storage capacity is limited and therefore we require a mechanism to optimize it's usage.. etc. Please rename: Section 3 Credit Mining System Design 3.2 Credit Mining Architecture, Section 4 Prospecting Algorithm New Section "Credit mining implementation and setup" 4.3 Credit mining experiment, please rename into experimental setup 6.1 "Live prospecting experiments"

The structure is now like this : 2. Problem Description (7 pages) 2.1 Challenges in creating credit mechanisms 2.1.1 Gain return and performance by spending credit 2.2 Prior Credit Mining Research 2.3 Prospecting good investment (Research Question 1) 2.3.1 Credit mining as investment tool 2.4 Substituting investment cache (Research Question 2) 3. Credit Mining System Design (11 pages) 3.1 Libtorrent 3.1.1 Priority Management 3.1.2 Share Mode 3.1.3 Peer Discovery 3.2 Credit Mining Architecture 3.2.1 Mining Sources 3.2.2 Resource Optimization 4. Prospecting Algorithm (7 pages) 4.1 Early prospecting stage 4.2 Live prospecting 4.2.1 Swarm Selection 4.2.2 Scoring Policy 4.2.3 Swarm incitation 5. Credit mining implementation and setup

Current 2.1 and 2.2 have too much overlap with Section 1. Please identify and delete all overlap. Move all related work and citations to a dedicated section. All stuff like "Leon et al. uses BitTorrent protocol to increase user download speed and at the same time reduce datacenters load."

I deleted some of the overlap, but the Leon one apparently not one of them.

ardhipoetra commented 7 years ago

I'm thinking to remove the secure currency section (+- 2 pages, chapter 1) along with multichain, LIRA, and TorCoin. The reason is, there is no implementation nor experiment regarding those secure currency alternatives.

synctext commented 7 years ago

good point. possible Subsection:

--- Currency versus secure karma points --- bitcoin has proven to be secure. explain the different goals and requirements for an incentive system and a currency.

synctext commented 7 years ago

83 pages: thesis (45).pdf "5.3.4 Experiment on user activity" focus on setup, this chapter does not yet discuss graphs. "5.4 Finding best parameters", more text for experimental section Please add details: 13000 torrent are downloaded for 1 hour each in batches of 500. This is a problem, try adding 1 per 7 seconds.

59%(8176) are OK when only downloading first 4 pieces, a mere 5%(668) for rarest first. Investigate what goes wrong with those 7508 swarms.

Explain everything at a deeper level, for instance:

"6.2 Comparison vs old", Performance compared to prior work

Experimental section:

synctext commented 7 years ago

Comments on this thesis version Please focus on readability and understandability of results section. Part-time: API in the core + swarm stats + credit earning stats. ToDo: compact the writing, remove double explanations. It's possible to cut 30 pages, without loss of depth! Clarify: 4.2.3 Swarm incitation

"Short torrent 26%(3690)" please rename/remove and clarify this concept. 3 strategies with comparable results (Rarest first, sequential and random). Discuss in either 1 section of 3, but 2 sections feels strange.

"6.2 Choosing the best swarm". Is that prospecting, swarm rotation or removal ? "Therefore, the seize up condition on the swarm occurred." please clarify.

ToDo: to test the whole credit mining engine for stability and functionality. Parse a bunch of RSS-based torrents and test the yield.

<item>
<title>My Linux Distro</title>
<link>
https://Helper-torrents.org/download.php/666000666/Ubuntu-Arch-Debian-16.10.torrent?torrent_pass=73d84e8686383954642a6f172ef
</link>
<pubDate>Wed, 25 Jan 2017 12:00:00 +0000</pubDate>
<description>696 MB; Software/ISO/PUBLIC</description>
</item>
ardhipoetra commented 7 years ago

Come up with names : early prospecting stage -> probing stage live prospecting stage -> mining stage

Swarm incitement -> ~instigate or~ stimulate?

Restructuring chapter 6, now : 6.1 Validating the credit mining system 6.1.1 What the policies choose 6.1.2 Obtained gain by the selection 6.2 Comparing obtained gain with prior work (compared with mihai's work) 6.3 Predownload hit experiment 6.4 Sustaining user experience on downloading 6.5 Swarm performance with credit mining 6.5.1 Varying the number of credit miners 6.5.2 The effect of swarm stimulation

Removed pie chart, and use stacked bar graph instead.

ardhipoetra commented 7 years ago

Changed a lot in thesis.pdf, reduced from 83 pages to 69 with more content. Commit id : https://github.com/ardhipoetra/msc-thesis-creditmining/commit/7f02c62374839403f3255b808352cf55e420b0b6

Come up with title :

  1. Autonomous prospecting and investing mechanism in distributed system
  2. Leveraging currency in BitTorrent : Prospecting and Investment in Credit Mining
  3. Credit mining : an autonomous prospect and investment system
synctext commented 7 years ago

Credits and blockchains in Bittorrent: designing prospecting and investments functions Comments:

ardhipoetra commented 7 years ago

Scalability experiment. 1000 swarms with each 2 seeders. 1% has 1 seeder. Measure how fast credit mining can find those swarms. Downloader is not necessary (not a yield experiment)

Response:

With millions of swarms, how can swarm blacklisting scale? (Section 3.2.3)

Swarm blacklisting just maintain the infohash index. In million swarms, it depends on the system's available memory. (It doesn't need to be very high though)

The prospecting module can be divided into two main stage, named probing and mining stage. Please do not invert the meaning of terms. Mining can not be part of prospecting. The former is about bringing in the gold, the latter finding gold. prospecting, probing, pre-downloading, mining. Readers will get confused, just use: prospecting and mining.

Prospecting module changed into Investing algorithm/module Probing (also predownloading) and Mining changed into prospecting and mining reason : it's weird to have 'prospecting stage' inside 'prospecting algorithm'. Also, as you said, 'mining' should not part of 'prospecting'

Algorithm 3: Finding rare pieces procedures if this is faster then all known algorithms then keep it, otherwise remove.

removed

Algorithm 4 Peer translation algorithm, remove and explain in max. 4 lines.

It is hard to explain without showing the algorithm because it has many if-else conditions. I cut the explanation instead to be more concise instead.

For instance, please remove all duplicate explanations and compact the text below to 1/3 of current size.

changed into :

Three policies have been defined on the preliminary work [26]. Those are based
on Random policy, Swarm Age policy, and Seeder Ratio policy. Specifically for
Seeder Ratio policy, it is specialized to help undersupplied swarm and gave the
best gain credit out of three [26].
4.2.2
Scoring Policy
Scoring policy is a proposed method to quantify swarm prospect and to reduce possible iden-
tical result from two swarms or more. It was brought up with seeder ratio policy
as its base. It can be customized with its score multiplier.
Scoring policy consists of several features. First is the seeder ratio which defined
as the ratio of seeder and total number of peers. Second is the availability of a
swarm. It represents the piece shortage which shows the potential to gain more
credit in longer term. Third is the number of peers as a tie-breaker. It is useful to
target large swarms instead of the small one as there are comparably more options
to give our pieces to. Lastly, it is the swarm recent activity. Inspired by tit-for-tat,
the policy will prefer a swarm which any of its peers had interacted with the miners
previously.

Do not add jitter to measurement outcomes to prevent measurements point overlap.

Change the plot in 6.2 into density plot. Drawback, lost which swarms that selected at a particular time. Key points : credit mining selects underseeded swarm screenshot from 2017-02-17 21-41-04

A simple validation experiment..... Please design a most-minimal yield validation experiment. So 1 file size.

Result: screenshot from 2017-02-22 10-30-40

I stil need to run the scalability test, but for now, all your feedbacks have been processed. Any thought, @synctext? Updated thesis.pdf already in the repository.

ardhipoetra commented 7 years ago

Update on scalability experiment: I decided to put downloader to measure the prospecting accuracy as well. With 1000 swarms (each have 3 peers), the miners couldn't get some torrent metadata from the channel. The downloader also sometimes couldn't find the swarm that it wanted to download.

Reducing to 100 and everything okay. I'm trying the 500 swarm now.

The goal is still the same : how fast the credit miner can prospect, then find underseeded swarms.

UPDATE: 500 swarms gives me the same behavior. Moreover, even when the downloaders have finished their downloading, only ~250 out of 500 swarms discovered by the miner. Will focus on the 1% and 5% of 100 swarms instead.

synctext commented 7 years ago

Comments:

ardhipoetra commented 7 years ago

Chapter 3 explains term............. First page of Chapter 4 does not build on this knowledge, goes back to "limited and unverified information of the newly discovered swarm". Page 25 "trying to get as much information as possible of the swarm". Chapter 4: directly explain, for instance, the cardinal problem................

First two paragraph in that particular part :

Investing a swarm is one of the key elements of credit mining system. The investing
module can be divided into two main stage, named prospecting and mining stage.
After new swarm has been discovered, the credit mining system will estimate its
return-of-investment potential on the prospecting stage. Then in the mining stage,
the system will selectively join some of the high-prospected swarms to gain credits.
The combination of both stages builds the whole investing algorithm.

4.1 Prospecting stage

The cardinal problem of knowing swarm’s potential is to obtain reliable swarm size
and piece availability information. Those information are essential to predict future
demand and estimate its return of investment. Given a large number of swarms,
joining all of them is costly. Therefore, we introduce the prospecting algorithm: a
means to download few pieces to roughly determine peers and pieces information.
Prospecting cost per swarm is limited, allowing us to estimate and compare the
return-on-investment of a few thousand swarms easily.

Figure 5.3 unreadable. Just 5.3b without the directory popup, and "add boosting source" popup displayed in lower right corner.

Removed 5.3a, moved the popup to lower middle (to cover some of the channels' name)

Chapter 6: we now evaluate the performance.......

Adapted.

6.2 : channel is published...........

Added. Couldn't annotate the graph without overlapping annotation because they are too close to each other. I will put the explanation in the text instead. Also, its reasoning. Moreover, this experiment didn't use tracker and DHT.

6.3: explain goal.......

Key in prospecting is the ability to find a swarm that likely to give high investment
return. In the following section, we will asses how well our prospecting algorithm
can discard low-potential swarms on the Internet. Furthermore, we will also mea-
sure how fast and accurate the prospecting to find the high-potential swarms.

6.3: .... determine used bandwidth (exactly how many messages)

Unfortunately this is not possible, the logs I have was not recording the used bandwidth and number of sent messages. I've put this in the future works, though.

6.4 what is the goal in the first lines. "In the following experiments, the mining stage...........

In the following experiments, the mining stage as part of our investing algorithm is
evaluated. We focus on the scoring policy that used in swarm selection. We wish
to show that this policy selects the undersupplied swarms correctly. Furthermore,
the mining performance from this selection will be evaluated. We will confirm that
our policy will result a positive upload gain and ratio.

Figure 6.7 discrete number of peers...

Fixed.

synctext commented 7 years ago

thesis (55).pdf

ardhipoetra commented 7 years ago

Figure 6.7 plot discrete

change into: screenshot from 2017-03-13 15-06-26

enlarge Figure 6.4 and 6.5

done. For figure 6.5, I change it a bit to this :

screenshot from 2017-03-13 17-30-50

Chapter 3, 1st and 2nd paragraph now rewritten as :

In this thesis, we introduce a “Credit mining system”, an automatic investment frame-
work that considers multiple properties of the swarms. With the credit mining
system, a user can both gain credit and improve swarm performance with limited
bandwidth allocation without any intervention necessary.

Figure 6.11. Make it human readable. Mining on/off

Move the legend outside the box to (vertical lines and boxes are explained. Slightly more complex but needed for clarity): screenshot from 2017-03-14 18-41-37

Mention experiment time in 6.5 Clarify the experiment in 6.6

Done

ardhipoetra commented 7 years ago

@synctext, I updated the polished version (appendix included) in my repository : https://github.com/ardhipoetra/msc-thesis-creditmining/blob/master/thesis.pdf

Probably the final version before last cleanup.

synctext commented 7 years ago

OK Thesis work by @ardhipoetra now finished. New student @EinNarr assigned to this project.

Focus for thesis (draft) : Blockchain-based credits for video streaming using embedded devices. Using Kodi-to-Kodi networking, without any servers and TrustChain work. Create a Youtube-like experience without servers and using private tracker oversupply effect.

First task, create unit tests for all components of existing credit mining code. Obtain decent code coverage and learn the code+architecture inside out. Step for next months to improve the naive algorithms... Note that relaying is more expensive than seeding. How to balance relaying with seeding? "optimal resource usage scheduling"

synctext commented 7 years ago

current status:

synctext commented 7 years ago

Please do not start over from scratch with credit mining. This thesis builds upon: Kodi code, Ardi code, community-size code.

synctext commented 7 years ago

Alternative storyline beyond credit mining: https://repository.tudelft.nl/islandora/object/uuid%3Aa1b443c7-8b37-4263-ae96-d38bc8b8f397 "Decentralized Autonomous Entity", [ref]

devos50 commented 7 years ago

An example of how to create Tribler sessions in the tests: https://github.com/Tribler/tribler/blob/devel/Tribler/Test/Community/Market/Integration/test_market_base.py#L168

Another example of multiple Tribler session communicating with each other (with libtorrent enabled): https://github.com/Tribler/tribler/blob/devel/Tribler/Test/Community/Tunnel/FullSession/test_tunnel_base.py

synctext commented 7 years ago

Please focus on Nosetest credit mining work. As simple as possible: create 1 random swarm, seed, create channel with that single swarm, start credit mining, now 2 seeders, happy!

Thesis storyline; run within Android Kodi; embedded credit mining.

synctext commented 7 years ago

Thesis outline options:

EinNarr commented 7 years ago

@devos50 Hi, could you please kindly give me some peer view on the unit test I wrote? https://github.com/EinNarr/tribler/blob/creditmining_slimming/Tribler/Test/Core/CreditMining/test_boostingmanager.py

synctext commented 7 years ago

Next steps are finishing unit tests and rebase. Then conduct the key thesis experiment described above and generate experimental results chapter content.

synctext commented 7 years ago

Mildly optimistic progress. The 1 seeder 1 swarm experiment could be operational in 1 week. Then the 10-ish swarm version. Finally the cheapo-device test and thesis graph plotting. Plus open Kodi binding issues.

synctext commented 7 years ago

Worked enabling one laptop with multiple instances, multiple state-directories. Fastest route to 1 seeder 1 swarm experiment? Use multiple Bittorrent client instances, not multiple Triblers. Either fix multiple instances or put channel inside database (or create own channel), start mining.

Include technical depth in credit mining thesis! Related thesis work $256 million bubble, Filecoin research roadmap 2017 also #3097 and Ycombinator comments Test with 10 new servers?