Big goal :
Benefit :
Current state of the art :
Some definitions :
What is credit :
Specific Task (no specific order) :
Problem/Limitation/Doubt :
Related work :
This is my current progress https://github.com/ardhipoetra/tribler/commits/credit_mining
Concrete ToDo:
Next step: torrent_checker.py integration.
Initial GUI on Tribler home screen:
Credit mining screen: 1000+ discovered channels, overall credit mining dashboard, channel status, channel content status.
Thesis storyline:
Next sprint: accurate investment information and torrent_checker.py integration. Track: discovered swarms, downloaded torrents, DHT lookup failures, failed .torrent downloads, bandwidth used for checking, swarms connected to, dead swarms, seeder/leecher info determined.
Next steps: understand swarm seeder/leecher checking, and get code running that bypasses the central tracker. There is also a problem checking TCP trackers, as the tunnels only support UDP.
Goal: create a good scrape mechanism using DHT and UDP in existing messy Tribler code.
Some very old code is still using manual checking in our own Python code: https://github.com/Tribler/tribler/blob/v6.0/Tribler/Core/DecentralizedTracking/mainlineDHTChecker.py
Now we use Libtorrent, which has its own DHT, and another DHT library: pymdht.
The more recent versions of libtorrent have exposed more stuff to the python bindings. It would be nice if someone checked if the needed DHT stuff is available now. That would allow us to drop pymdht entirely.
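As a stopgap while the libtorrent DHT bindings are being evaluated, the UDP scrape part needs no library at all: BEP 15 packets are plain structs. A minimal sketch (helper names are mine, and the connect handshake that obtains a real connection_id is omitted):

```python
import struct

ACTION_SCRAPE = 2  # BEP 15 action code for scrape

def build_scrape_request(connection_id, transaction_id, infohashes):
    """BEP 15 scrape request: 16-byte header followed by one 20-byte
    infohash per swarm (at most 74 fit in a single datagram)."""
    if len(infohashes) > 74:
        raise ValueError("BEP 15 allows at most 74 infohashes per scrape")
    packet = struct.pack(">QII", connection_id, ACTION_SCRAPE, transaction_id)
    for infohash in infohashes:
        if len(infohash) != 20:
            raise ValueError("infohash must be 20 bytes")
        packet += infohash
    return packet

def parse_scrape_response(data, n_swarms):
    """Scrape response: action + transaction_id header, then per swarm
    the (seeders, completed, leechers) counts as big-endian 32-bit ints."""
    action, transaction_id = struct.unpack_from(">II", data, 0)
    stats = [struct.unpack_from(">III", data, 8 + 12 * i) for i in range(n_swarms)]
    return action, transaction_id, stats
```

Sending the packet over a UDP socket (or through the UDP-only tunnels) and matching responses by transaction_id is left out here.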
Completed:
Tasks for coming 3 weeks:
Thesis material credit
Leftover work:
Additional thesis material: cost of swarm checking in bytes, yield of swarm investments, co-investor factor, relay cost, archiving cost with disk-space limits.
working code, see issues above. Create a legal channel with 500+ legal swarms. Repeat the Mihai experiments. Show in Gumby on Jenkins with nice R plots.
Elric has 300 swarms and we have legal swarms from archive.org https://blog.archive.org/2012/08/07/over-1000000-torrents-of-downloadable-books-music-and-movies/ https://archive.org/details/moviesandfilms
Another legal source: http://terasaur.org/browse/category/video
Focus for 2 weeks: finish Problem Description and Related work (in 2 chapters or 1):
ToDo:
Thesis latex repository : https://github.com/ardhipoetra/msc-thesis-creditmining
Create:
Open PR #2398, #2351
October steps: experiments with prospecting: select a channel, DHT responsiveness, time to download the first 4 pieces, discovered swarm size (PEX, seed/leech stats). Design & add features to the core API: prospect_swarm(SHA1)? Plot for 10000 swarms: DHT success, 4-piece time, swarm size stats.
Problem description revision :
2.5: how to prospect a good investment; how to detect an investment and what its properties are
2.6: when to delete a downloaded swarm and replace it with a new investment, i.e. when the projected yield outweighs the cost of the download
Thesis feedback :
ToDo: pull requests ! Prospecting experiment
It has been processed in https://github.com/ardhipoetra/msc-thesis-creditmining/commit/04ce285dbc0aa69bb6702e9c0d827431dff5a2e1.
For prospecting details (how it works), I want to put that on 'design' chapter.
Downloading 4 pieces of 10k+ torrents in 12 hours, with 500 torrent connections at a time and a 1-hour timeout, results in the following:
Gumby repository, with current experiments : https://github.com/ardhipoetra/gumby/tree/credit_mining_exp/experiments/credit_mining
Idea for final experiment (after discussing prospecting) :
Next sprint:
With progress: create thesis committee
idea for 1st experiment of thesis.
This first experiment is designed to evaluate the basic operation, correctness, responsiveness, and efficiency of our work. It is specifically crafted to be simple and easy to understand and debug.
1 channel, add a torrent, add 1 credit miner. Add a second torrent and measure how fast it is mined.
@ardhipoetra @pimveldhuisen @Captain-Coder Funny related work, a duplicate of the whole Tribler idea: "Steem: An incentivized, blockchain-based social media platform". However, it is implemented as a simple Reddit clone, where your karma equals website ownership and can be converted into cash.
Related to credit mining: BitTorrent and uTorrent launched "altruistic mode", which is very similar to credit mining. The concept is exactly the same as the libtorrent share mode that we use.
Latest direct thesis .pdf download link: https://github.com/ardhipoetra/msc-thesis-creditmining/blob/master/thesis.pdf
Open PR in Gumby: https://github.com/Tribler/gumby/pull/282
Tribler7 status: ToDo GUI, stability?
Current thesis version, 15Dec'16, with a 13-page introduction section, explaining freeriding, credits, private communities, and oversupply. thesis (43).pdf
ToDo: first paragraph of chapter 2, please add "autonomous operation". The goal of this work is to create a European Youtube-type system, without any server or central authority. Similar to Youtube in usability and ease of use, and to Bittorrent for robust streaming, but without the explicit file management and explicit seeding settings. The goal of this research is to devise an automatic caching layer which ensures content availability both for years-old content and for content created mere seconds ago. Current 2.1 and 2.2 have too much overlap with Section 1. Please identify and delete all overlap. Move all related work and citations to a dedicated section. All stuff like "Leon et al. uses BitTorrent protocol to increase user download speed and at the same time reduce datacenters load." Section 2.4.1 needs to move to either 1 or 3, too much engineering detail for 2.
Section 2.1 "Challenges in creating credit mechanisms": explain that all prior work never works at scale. Lots of ideas, none proven to work. Section 2.2: add to 2.5. Storage capacity is limited and therefore we require a mechanism to optimize its usage, etc.
Add to section 2 the problem of "Co-Investors", when everybody sees the same swarms and does not see others are already boosting them. Thus the need for diversity and awareness of other credit miners.
Please rename:
Section 3: "Credit Mining System Design"; 3.2: "Credit Mining Architecture"
Section 4: "Prospecting Algorithm"
New section: "Credit mining implementation and setup"
4.3 "Credit mining experiment": please rename into "Experimental setup"
6.1: "Live prospecting experiments"
The following experiment is specifically designed to be simple and to validate all core components and algorithms of our credit mining research. All conditions are controlled and do not rely on external elements such as trackers or DHT nodes. Our first validation experiment tests the basic credit mining swarm-selection algorithm. We create a dozen swarms, each with a different number of seeders. A single credit miner then starts prospecting these swarms and should select the most underseeded swarm, yielding the best return. Our next validation experiment repeats the dozen-swarm experiment with multiple credit miners. We show that all prospecting opportunities are found. This proves our algorithm is both effective and efficient.
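The selection step of this validation experiment fits in a few lines. A sketch with hypothetical names (not the actual Tribler code), picking the swarm with the lowest seeder fraction and breaking ties toward larger swarms:

```python
def most_underseeded(swarms):
    """swarms: dict mapping infohash -> (seeders, leechers).
    Returns the infohash with the lowest seeder fraction; ties are
    broken in favour of the larger swarm (more peers to serve)."""
    def selection_key(item):
        _, (seeders, leechers) = item
        peers = seeders + leechers
        ratio = seeders / peers if peers else 1.0
        return (ratio, -peers)
    return min(swarms.items(), key=selection_key)[0]
```

With a dozen such swarms, the single miner should deterministically land on the one most in need of seeders.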
With the following experiment we determine the costs, trade-offs and performance around the observation period of a swarm and the amount of discovered peers.
Section 2.4.1 needs to move to either 1 or 3, too much engineering detail for 2.
Moved to design chapter
Add to section 2 the problem of "Co-Investors", when everybody sees the same swarms and does not see others are already boosting them. Thus the need for diversity and awareness of other credit miners.
Added to future works as it is not possible right now by using current libtorrent.
Section 2.4.1 needs to move to either 1 or 3, too much engineering detail for 2.
Section 2.1 "Challenges in creating credit mechanisms": explain that all prior work never works at scale. Lots of ideas, none proven to work. Section 2.2: add to 2.5. Storage capacity is limited and therefore we require a mechanism to optimize its usage, etc.
Please rename: Section 3 "Credit Mining System Design", 3.2 "Credit Mining Architecture", Section 4 "Prospecting Algorithm". New section: "Credit mining implementation and setup". 4.3 "Credit mining experiment": please rename into "Experimental setup". 6.1: "Live prospecting experiments".
The structure is now like this:
2. Problem Description (7 pages)
2.1 Challenges in creating credit mechanisms
2.1.1 Gain return and performance by spending credit
2.2 Prior Credit Mining Research
2.3 Prospecting good investment (Research Question 1)
2.3.1 Credit mining as investment tool
2.4 Substituting investment cache (Research Question 2)
3. Credit Mining System Design (11 pages)
3.1 Libtorrent
3.1.1 Priority Management
3.1.2 Share Mode
3.1.3 Peer Discovery
3.2 Credit Mining Architecture
3.2.1 Mining Sources
3.2.2 Resource Optimization
4. Prospecting Algorithm (7 pages)
4.1 Early prospecting stage
4.2 Live prospecting
4.2.1 Swarm Selection
4.2.2 Scoring Policy
4.2.3 Swarm incitation
5. Credit mining implementation and setup
Current 2.1 and 2.2 have too much overlap with Section 1. Please identify and delete all overlap. Move all related work and citations to a dedicated section. All stuff like "Leon et al. uses BitTorrent protocol to increase user download speed and at the same time reduce datacenters load."
I deleted some of the overlap, but the Leon one apparently was not one of them.
I'm thinking of removing the secure currency section (± 2 pages, chapter 1) along with multichain, LIRA, and TorCoin. The reason is that there is no implementation or experiment regarding those secure currency alternatives.
good point. possible Subsection:
--- Currency versus secure karma points --- Bitcoin has proven to be secure. Explain the different goals and requirements for an incentive system versus a currency.
83 pages: thesis (45).pdf
"5.3.4 Experiment on user activity": focus on setup; this chapter does not yet discuss graphs.
"5.4 Finding best parameters": more text for the experimental section. Please add details: 13000 torrents are downloaded for 1 hour each in batches of 500. This is a problem; try adding 1 per 7 seconds.
59% (8176) are OK when only downloading the first 4 pieces, a mere 5% (668) for rarest-first. Investigate what goes wrong with those 7508 swarms.
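The batch problem mentioned above ("13000 torrents in batches of 500, try adding 1 per 7 seconds") is essentially about burstiness. A throwaway sketch to compare the two admission schemes (the 10-second window is an arbitrary illustration, not from the thesis):

```python
def burstiness(start_times, window=10.0):
    """Peak number of prospecting downloads started within any
    `window`-second span, i.e. how hard we hammer trackers/DHT."""
    times = sorted(start_times)
    peak = 0
    j = 0
    for i, t in enumerate(times):
        while times[j] < t - window:   # drop starts outside the window
            j += 1
        peak = max(peak, i - j + 1)
    return peak

batch_starts = [0.0] * 500                      # one batch of 500 at once
paced_starts = [i * 7.0 for i in range(500)]    # one admission per 7 seconds
```

The batch scheme fires 500 simultaneous starts; paced admission keeps at most a couple of fresh starts in any short window, spreading the tracker and DHT load.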
Explain everything at a deeper level, for instance:
"6.2 Comparison vs old", Performance compared to prior work
Experimental section:
Comments on this thesis version: please focus on readability and understandability of the results section. Part-time: API in the core + swarm stats + credit-earning stats. ToDo: compact the writing, remove double explanations. It's possible to cut 30 pages without loss of depth! Clarify: 4.2.3 Swarm incitation.
"Short torrent 26%(3690)": please rename/remove and clarify this concept. 3 strategies with comparable results (rarest-first, sequential and random). Discuss in either 1 section or 3, but 2 sections feel strange.
"6.2 Choosing the best swarm". Is that prospecting, swarm rotation or removal ? "Therefore, the seize up condition on the swarm occurred." please clarify.
ToDo: test the whole credit mining engine for stability and functionality. Parse a bunch of RSS-based torrents and test the yield.
<item>
<title>My Linux Distro</title>
<link>
https://Helper-torrents.org/download.php/666000666/Ubuntu-Arch-Debian-16.10.torrent?torrent_pass=73d84e8686383954642a6f172ef
</link>
<pubDate>Wed, 25 Jan 2017 12:00:00 +0000</pubDate>
<description>696 MB; Software/ISO/PUBLIC</description>
</item>
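For the RSS-based test above, extracting the .torrent links is plain stdlib work. A sketch, assuming feed items look like the example (element names `title`, `link`, `description` as shown):

```python
import xml.etree.ElementTree as ET

def parse_feed_items(rss_text):
    """Return (title, torrent_url, description) for every <item> in
    an RSS document such as the example above."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", "").strip()
        link = item.findtext("link", "").strip()
        description = item.findtext("description", "").strip()
        items.append((title, link, description))
    return items
```

Each extracted link can then be fed to the boosting source as a .torrent download candidate.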
Come up with names: early prospecting stage -> probing stage; live prospecting stage -> mining stage
Swarm incitement -> ~instigate or~ stimulate?
Restructuring chapter 6, now:
6.1 Validating the credit mining system
6.1.1 What the policies choose
6.1.2 Obtained gain by the selection
6.2 Comparing obtained gain with prior work (compared with Mihai's work)
6.3 Predownload hit experiment
6.4 Sustaining user experience on downloading
6.5 Swarm performance with credit mining
6.5.1 Varying the number of credit miners
6.5.2 The effect of swarm stimulation
Removed pie chart, and use stacked bar graph instead.
Changed a lot in thesis.pdf, reduced from 83 pages to 69 with more content. Commit id : https://github.com/ardhipoetra/msc-thesis-creditmining/commit/7f02c62374839403f3255b808352cf55e420b0b6
Come up with title :
Credits and blockchains in Bittorrent: designing prospecting and investment functions
Comments:
"The prospecting module can be divided into two main stages, named the probing and the mining stage."
Please do not invert the meaning of terms. Mining cannot be part of prospecting. The former is about bringing in the gold, the latter about finding gold.
"Algorithm 3: Finding rare pieces procedures"
If this is faster than all known algorithms then keep it, otherwise remove it.
"Algorithm 4: Peer translation algorithm"
Remove, and explain in max. 4 lines.
"The number of swarms for a single source should not be limited."
Unclear.
The selection needs to be run periodically because swarm information is constantly changing. At some point, a previously selected swarm may no longer be beneficial to mine, for both the miners and the community. Eventually, it has to be substituted by another swarm. The swarm selection process is controlled by the applied policy. The policy contains rules to sort which swarms to start and stop.
Although it is possible for users to create their own policy, three policies have been defined in the preliminary work [26]. Those are based on randomness, swarm age, and seeder ratio. The Random policy is mostly used as the baseline of the experiment. The Swarm Age policy is claimed to be better than the Random policy. It selects a swarm based on its age. Newly published swarms are known to have higher demand than other types of swarm [10]. With a lot of peers in an early swarm, the chance to gain more credit increases. The Seeder Ratio policy selects the swarm that has a lower number of seeders relative to all the peers participating in that swarm. This policy is specialized to help undersupplied swarms.
4.2.2 Scoring Policy
In this thesis, we intend to extend those policies so that they are highly customizable and balance gaining credit with helping undersupplied swarms. We propose a policy named the scoring policy to tackle this issue. The scoring policy uses the seeder ratio policy as its base. It offers a method to quantify a swarm's prospects and to reduce possibly identical results from two or more swarms. It can be customized with its score multipliers. The idea behind this policy is to give a higher score to a swarm which has a lower seeder ratio, more peers, lower piece availability, and much activity detected in the swarm. A higher score means a higher priority to be mined.
The scoring policy consists of several features. First is the seeder ratio. As mentioned before, it is useful to measure whether a swarm is undersupplied or not. This is particularly useful for helping a swarm. A lower seeder ratio should get a higher priority in mining. Second is the number of peers in a swarm. In small swarms, the seeder ratio may not be relevant because there is no significant difference between one swarm and another. In a case where the ratio of seeders to peers is almost equal for all swarms, it is useful to target large swarms instead of small ones, as there are comparably more options for giving our pieces away. The third aspect is the file availability of the swarm. In a flashcrowd swarm, a significant number of peers do not yet have a large portion of the file. Compared to another swarm in which most of the leechers have almost finished their download, mining the flashcrowd swarm will give higher credit and is more beneficial to the community altogether. The last element that we consider is the activity of a swarm. It is preferable to mine a swarm in which some of the peers have already downloaded or uploaded data from or to us. Moreover, one with higher speed is preferred. The worst case happens when no peer is interested in the pieces we already have. On the other hand, if a peer has already interacted with us before, there is a chance we will be chosen again in the future. This feature incentivizes swarms which have interacted with the miners previously.
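Numerically, the scoring idea above can be sketched like this. The exact weighting (the saturating peer-count term, the constant 100) is my illustration; only the direction of each feature (low seeder ratio, many peers, low availability, recent activity all raise the score) follows the text:

```python
def swarm_score(seeders, leechers, availability, recent_activity,
                multipliers=(1.0, 1.0, 1.0, 1.0)):
    """Higher score = higher mining priority. `availability` is the
    fraction of pieces already present in the swarm (0..1);
    `recent_activity` flags prior interaction with the miner."""
    m_ratio, m_peers, m_avail, m_activity = multipliers
    peers = seeders + leechers
    if peers == 0:
        return 0.0
    seeder_ratio = seeders / peers
    score = m_ratio * (1.0 - seeder_ratio)              # undersupplied swarms first
    score += m_peers * peers / (peers + 100.0)          # tie-breaker: larger swarms
    score += m_avail * (1.0 - min(availability, 1.0))   # piece shortage
    score += m_activity * (1.0 if recent_activity else 0.0)  # tit-for-tat
    return score
```

The multipliers are the customization hook: setting them differently trades off credit gain against helping undersupplied swarms.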
Scalability experiment. 1000 swarms, each with 2 seeders; 1% has 1 seeder. Measure how fast credit mining can find those swarms. A downloader is not necessary (not a yield experiment).
Response:
With millions of swarms, how can swarm blacklisting scale? (Section 3.2.3)
Swarm blacklisting just maintains the infohash index. With millions of swarms, it depends on the system's available memory (which doesn't need to be very large, though).
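For scale intuition: storing only the raw 20-byte infohashes, a million-entry blacklist is on the order of 20 MB plus container overhead. A compact sketch (hypothetical class, not the Tribler implementation) using a sorted list with binary search:

```python
import bisect

class InfohashBlacklist:
    """Blacklist that keeps 20-byte infohashes sorted and answers
    membership queries in O(log n) via binary search."""
    def __init__(self):
        self._hashes = []

    def add(self, infohash):
        i = bisect.bisect_left(self._hashes, infohash)
        if i == len(self._hashes) or self._hashes[i] != infohash:
            self._hashes.insert(i, infohash)  # keep sorted, skip duplicates

    def __contains__(self, infohash):
        i = bisect.bisect_left(self._hashes, infohash)
        return i < len(self._hashes) and self._hashes[i] == infohash

    def __len__(self):
        return len(self._hashes)
```

A plain Python set would also work; the sorted-list variant mainly avoids per-entry hash-table overhead when memory is the concern.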
"The prospecting module can be divided into two main stages, named the probing and mining stage." Please do not invert the meaning of terms. Mining cannot be part of prospecting. The former is about bringing in the gold, the latter about finding gold. Prospecting, probing, pre-downloading, mining: readers will get confused, just use prospecting and mining.
The prospecting module was changed into the investing algorithm/module. Probing (also predownloading) and mining were changed into prospecting and mining. Reason: it's weird to have a 'prospecting stage' inside a 'prospecting algorithm'. Also, as you said, 'mining' should not be part of 'prospecting'.
"Algorithm 3: Finding rare pieces procedures": if this is faster than all known algorithms then keep it, otherwise remove it.
removed
Algorithm 4 Peer translation algorithm, remove and explain in max. 4 lines.
It is hard to explain without showing the algorithm because it has many if-else conditions. I cut the explanation to be more concise instead.
For instance, please remove all duplicate explanations and compact the text below to 1/3 of current size.
changed into :
Three policies have been defined in the preliminary work [26]: the Random policy, the Swarm Age policy, and the Seeder Ratio policy. The Seeder Ratio policy in particular is specialized to help undersupplied swarms and gave the best credit gain of the three [26].
4.2.2 Scoring Policy
The scoring policy is a proposed method to quantify a swarm's prospects and to reduce possibly identical results from two or more swarms. It uses the seeder ratio policy as its base. It can be customized with its score multipliers.
The scoring policy consists of several features. First is the seeder ratio, which is defined as the ratio of seeders to the total number of peers. Second is the availability of a swarm. It represents the piece shortage, which indicates the potential to gain more credit in the longer term. Third is the number of peers, used as a tie-breaker. It is useful to target large swarms instead of small ones, as there are comparably more options for giving our pieces away. Lastly, there is the swarm's recent activity. Inspired by tit-for-tat, the policy prefers a swarm whose peers have previously interacted with the miners.
Do not add jitter to measurement outcomes to prevent measurement points from overlapping.
Change the plot in 6.2 into a density plot. Drawback: we lose which swarms were selected at a particular time. Key point: credit mining selects underseeded swarms.
A simple validation experiment..... Please design a minimal yield validation experiment, with just 1 file size.
Result:
I still need to run the scalability test, but for now all your feedback has been processed. Any thoughts, @synctext? The updated thesis.pdf is already in the repository.
Update on the scalability experiment: I decided to add a downloader to measure the prospecting accuracy as well. With 1000 swarms (each with 3 peers), the miners couldn't get some torrent metadata from the channel. The downloader also sometimes couldn't find the swarm that it wanted to download.
Reducing to 100 swarms made everything okay. I'm trying 500 swarms now.
The goal is still the same: how fast can the credit miner prospect and then find underseeded swarms.
UPDATE: 500 swarms gives me the same behavior. Moreover, even when the downloaders have finished their downloads, only ~250 out of 500 swarms were discovered by the miner. I will focus on the 1% and 5% of 100 swarms instead.
Comments:
Chapter 3 explains term............. First page of Chapter 4 does not build on this knowledge, goes back to "limited and unverified information of the newly discovered swarm". Page 25 "trying to get as much information as possible of the swarm". Chapter 4: directly explain, for instance, the cardinal problem................
The first two paragraphs in that particular part:
Investing in a swarm is one of the key elements of the credit mining system. The investing module can be divided into two main stages, named the prospecting and the mining stage. After a new swarm has been discovered, the credit mining system estimates its return-on-investment potential in the prospecting stage. Then, in the mining stage, the system selectively joins some of the high-prospect swarms to gain credits. The combination of both stages forms the whole investing algorithm.
4.1 Prospecting stage
The cardinal problem in knowing a swarm's potential is obtaining reliable swarm-size and piece-availability information. This information is essential to predict future demand and to estimate the return on investment. Given a large number of swarms, joining all of them is costly. Therefore, we introduce the prospecting algorithm: a means of downloading a few pieces to roughly determine peer and piece information. The prospecting cost per swarm is limited, allowing us to estimate and compare the return on investment of a few thousand swarms easily.
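In libtorrent terms, "download a few pieces" maps to piece priorities (0 = skip, 7 = highest on libtorrent's 0-7 scale). A sketch of the priority vector and the resulting cost cap per prospected swarm (function names are mine):

```python
def prospect_priorities(num_pieces, probe_pieces=4):
    """Priority vector that fetches only the first `probe_pieces`
    pieces at top priority and skips everything else."""
    return [7 if i < probe_pieces else 0 for i in range(num_pieces)]

def prospect_cost(piece_length, probe_pieces=4):
    """Upper bound on bytes downloaded per prospected swarm."""
    return piece_length * probe_pieces
```

With a bounded per-swarm cost like this, the total prospecting budget for a few thousand swarms stays predictable.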
Figure 5.3 unreadable. Just 5.3b without the directory popup, and "add boosting source" popup displayed in lower right corner.
Removed 5.3a, moved the popup to the lower middle (to cover some of the channels' names).
Chapter 6: we now evaluate the performance.......
Adapted.
6.2 : channel is published...........
Added. I couldn't annotate the graph without overlapping annotations because they are too close to each other. I will put the explanation, and its reasoning, in the text instead. Moreover, this experiment didn't use a tracker or DHT.
6.3: explain goal.......
Key in prospecting is the ability to find a swarm that is likely to give a high investment return. In the following section, we assess how well our prospecting algorithm can discard low-potential swarms on the Internet. Furthermore, we also measure how fast and how accurately the prospecting finds the high-potential swarms.
6.3: .... determine used bandwidth (exactly how many messages)
Unfortunately this is not possible; the logs I have were not recording the used bandwidth and the number of sent messages. I've put this in the future work, though.
6.4 what is the goal in the first lines. "In the following experiments, the mining stage...........
In the following experiments, the mining stage, as part of our investing algorithm, is evaluated. We focus on the scoring policy that is used in swarm selection. We wish to show that this policy selects the undersupplied swarms correctly. Furthermore, the mining performance resulting from this selection will be evaluated. We will confirm that our policy results in a positive upload gain and ratio.
Figure 6.7 discrete number of peers...
Fixed.
Figure 6.7 plot discrete
change into:
enlarge Figure 6.4 and 6.5
Done. For figure 6.5, I changed it a bit to this:
Chapter 3, 1st and 2nd paragraphs are now rewritten as:
In this thesis, we introduce a "Credit mining system", an automatic investment framework that considers multiple properties of the swarms. With the credit mining system, a user can both gain credit and improve swarm performance with a limited bandwidth allocation, without any intervention necessary.
Figure 6.11. Make it human readable. Mining on/off
Move the legend outside the box, so that the vertical lines and boxes are explained (slightly more complex, but needed for clarity):
Mention the experiment time in 6.5. Clarify the experiment in 6.6.
Done
@synctext, I updated the polished version (appendix included) in my repository : https://github.com/ardhipoetra/msc-thesis-creditmining/blob/master/thesis.pdf
Probably the final version before last cleanup.
OK. Thesis work by @ardhipoetra is now finished. New student @EinNarr is assigned to this project.
Focus for thesis (draft): blockchain-based credits for video streaming using embedded devices. Using Kodi-to-Kodi networking without any servers, plus the TrustChain work. Create a Youtube-like experience without servers, using the private-tracker oversupply effect.
First task: create unit tests for all components of the existing credit mining code. Obtain decent code coverage and learn the code and architecture inside out. Steps for the next months: improve the naive algorithms... Note that relaying is more expensive than seeding. How to balance relaying with seeding? "Optimal resource usage scheduling".
current status:
Please do not start over from scratch with credit mining. This thesis builds upon: Kodi code, Ardi code, community-size code.
Alternative storyline beyond credit mining: https://repository.tudelft.nl/islandora/object/uuid%3Aa1b443c7-8b37-4263-ae96-d38bc8b8f397 "Decentralized Autonomous Entity", [ref]
An example of how to create Tribler sessions in the tests: https://github.com/Tribler/tribler/blob/devel/Tribler/Test/Community/Market/Integration/test_market_base.py#L168
Another example of multiple Tribler session communicating with each other (with libtorrent enabled): https://github.com/Tribler/tribler/blob/devel/Tribler/Test/Community/Tunnel/FullSession/test_tunnel_base.py
Please focus on the Nosetest credit mining work. As simple as possible: create 1 random swarm, seed it, create a channel with that single swarm, start credit mining, now 2 seeders, happy!
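The shape of that "happy path" test, with everything stubbed out (StubSwarm and StubCreditMiner are hypothetical stand-ins, not the Tribler classes; the real test would spin up sessions as in the linked examples):

```python
import unittest

class StubSwarm:
    """Stand-in for a live swarm: only tracks its seeder count."""
    def __init__(self, infohash, seeders):
        self.infohash = infohash
        self.seeders = seeders

class StubCreditMiner:
    """Stand-in miner: seeds every swarm in its source channel,
    adding one seeder per swarm."""
    def mine(self, channel):
        for swarm in channel:
            swarm.seeders += 1

class TestSingleSwarmMining(unittest.TestCase):
    def test_one_swarm_gains_a_seeder(self):
        swarm = StubSwarm(b"\xab" * 20, seeders=1)  # 1 random swarm, 1 seeder
        channel = [swarm]                           # channel with that swarm
        StubCreditMiner().mine(channel)             # start credit mining
        self.assertEqual(swarm.seeders, 2)          # now 2 seeders, happy!
```

Swapping the stubs for real Tribler sessions (as in test_tunnel_base.py) turns this into the actual integration test.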
Thesis storyline: run within Android Kodi; embedded credit mining.
Thesis outline options:
@devos50 Hi, could you please kindly give me a peer review of the unit test I wrote? https://github.com/EinNarr/tribler/blob/creditmining_slimming/Tribler/Test/Core/CreditMining/test_boostingmanager.py
Next steps are finishing the unit tests and rebasing, then conducting the key thesis experiment described above and generating experimental-results chapter content.
Mildly optimistic progress. The 1-seeder 1-swarm experiment could be operational in 1 week. Then the 10-ish swarm version. Finally the cheapo-device test and thesis graph plotting. Plus open Kodi binding issues.
Worked on enabling one laptop with multiple instances and multiple state directories. Fastest route to the 1-seeder 1-swarm experiment? Use multiple Bittorrent client instances, not multiple Triblers. Either fix multiple instances or put the channel inside the database (or create your own channel), then start mining.
Include technical depth in the credit mining thesis! Related thesis work: the $256 million bubble, the Filecoin research roadmap 2017, also #3097 and the Ycombinator comments. Test with 10 new servers?
Get a working performance analysis environment. Goal: the ability to make pretty graphs locally with both credit mining and multichain (two different branches). Task: make an Ubuntu channel or another large validated legal collection, with actual "from the wild" downloads. This environment will be expanded over the coming months.
Part of #23. ToDo: read the 1968 "Tragedy of the Commons" paper.
Strategy: till the end of February, focus on credit mining; only then include relaying bandwidth, to avoid too much complexity at the start. Next sprint: co-investor detection?