Big goal :
Benefit :
Current state of the art :
Some definitions :
What is credit :
Specific Task (no specific order) :
Problem/Limitation/Doubt :
Related work :
This is my current progress https://github.com/ardhipoetra/tribler/commits/credit_mining
Concrete ToDo:
Next step: torrent_checker.py integration.
Initial GUI on Tribler home screen:
Credit mining screen: 1000+ discovered channels, overall credit mining dashboard, channel status, channel content status.
Thesis storyline:
Next sprint: accurate investment information and torrent_checker.py integration. Track: discovered swarms, downloaded torrents, DHT lookup failures, failed .torrent downloads, bandwidth used for checking, swarms connected to, dead swarms, seeder/leecher info determined.
Next steps: understand swarm seeder/leecher checking, and get code running that bypasses the central tracker. There is also a problem checking TCP trackers, as the tunnels only support UDP.
Goal: create a good scrape mechanism using DHT and UDP in existing messy Tribler code.
Some very old code is still using manual checking in our own Python code: https://github.com/Tribler/tribler/blob/v6.0/Tribler/Core/DecentralizedTracking/mainlineDHTChecker.py
Now we use Libtorrent, which has its own DHT, and another DHT library: pymdht.
The more recent versions of libtorrent have exposed more stuff to the python bindings. It would be nice if someone checked if the needed DHT stuff is available now. That would allow us to drop pymdht entirely.
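As a stopgap while the libtorrent DHT bindings are being evaluated, the UDP scrape part needs no library at all: BEP 15 packets are plain structs. A minimal sketch (helper names are mine, and the connect handshake that obtains a real connection_id is omitted):

```python
import struct

ACTION_SCRAPE = 2  # BEP 15 action code for scrape

def build_scrape_request(connection_id, transaction_id, infohashes):
    """BEP 15 scrape request: 16-byte header followed by one 20-byte
    infohash per swarm (at most 74 fit in a single datagram)."""
    if len(infohashes) > 74:
        raise ValueError("BEP 15 allows at most 74 infohashes per scrape")
    packet = struct.pack(">QII", connection_id, ACTION_SCRAPE, transaction_id)
    for infohash in infohashes:
        if len(infohash) != 20:
            raise ValueError("infohash must be 20 bytes")
        packet += infohash
    return packet

def parse_scrape_response(data, n_swarms):
    """Scrape response: action + transaction_id header, then per swarm
    the (seeders, completed, leechers) counts as big-endian 32-bit ints."""
    action, transaction_id = struct.unpack_from(">II", data, 0)
    stats = [struct.unpack_from(">III", data, 8 + 12 * i) for i in range(n_swarms)]
    return action, transaction_id, stats
```

Sending the packet over a UDP socket (or through the UDP-only tunnels) and matching responses by transaction_id is left out here.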
Completed:
Tasks for coming 3 weeks:
Thesis material credit
Leftover work:
Additional thesis material: cost of swarm checking in bytes, yield of swarm investments, co-investor factor, relay cost, archiving cost with disk-space limits.
working code, see issues above. Create a legal channel with 500+ legal swarms. Repeat the Mihai experiments. Show in Gumby on Jenkins with nice R plots.
Elric has 300 swarms and we have legal swarms from archive.org https://blog.archive.org/2012/08/07/over-1000000-torrents-of-downloadable-books-music-and-movies/ https://archive.org/details/moviesandfilms
Another legal source: http://terasaur.org/browse/category/video
Focus for 2 weeks: finish Problem Description and Related work (in 2 chapters or 1):
ToDo:
Thesis latex repository : https://github.com/ardhipoetra/msc-thesis-creditmining
Create:
Open PR #2398, #2351
October steps: experiments with prospecting: select a channel, DHT responsiveness, time to download the first 4 pieces, discovered swarm size (PEX, seed/leech stats). Design & add features to the core API: prospect_swarm(SHA1)? Plot for 10000 swarms: DHT success, 4-piece time, swarm size stats.
Problem description revision :
2.5: how to prospect a good investment; how to detect an investment and what its properties are
2.6: when to delete a downloaded swarm and replace it with a new investment, i.e. when the projected yield outweighs the cost of the download
Thesis feedback :
ToDo: pull requests ! Prospecting experiment
It has been processed in https://github.com/ardhipoetra/msc-thesis-creditmining/commit/04ce285dbc0aa69bb6702e9c0d827431dff5a2e1.
For prospecting details (how it works), I want to put that on 'design' chapter.
Downloading 4 pieces of 10k+ torrents in 12 hours, with 500 torrent connections at a time and a 1-hour timeout, results in the following:
Gumby repository, with current experiments : https://github.com/ardhipoetra/gumby/tree/credit_mining_exp/experiments/credit_mining
Idea for final experiment (after discussing prospecting) :
Next sprint:
With progress: create thesis committee
idea for 1st experiment of thesis.
This first experiment is designed to evaluate the basic operation, correctness, responsiveness, and efficiency of our work. It is specifically crafted to be simple and easy to understand and debug.
1 channel, add a torrent, add 1 credit miner. Add a second torrent and measure how fast it is mined.
@ardhipoetra @pimveldhuisen @Captain-Coder Funny related work, a duplicate of the whole Tribler idea: "Steem: An incentivized, blockchain-based social media platform". However, it is implemented as a simple Reddit clone, where your karma equals website ownership and can be converted into cash.
Related to credit mining: BitTorrent and uTorrent launched "altruistic mode", which is very similar to credit mining. The concept is exactly the same as the libtorrent share mode that we use.
Latest direct thesis .pdf download link: https://github.com/ardhipoetra/msc-thesis-creditmining/blob/master/thesis.pdf
Open PR in Gumby: https://github.com/Tribler/gumby/pull/282
Tribler7 status: ToDo GUI, stability?
Current thesis version, 15Dec'16, with a 13-page introduction section, explaining freeriding, credits, private communities, and oversupply. thesis (43).pdf
ToDo: first paragraph of chapter 2, please add "autonomous operation". The goal of this work is to create a European Youtube-type system, without any server or central authority. Similar to Youtube in usability and ease of use, and to Bittorrent for robust streaming, but without the explicit file management and explicit seeding settings. The goal of this research is to devise an automatic caching layer which ensures content availability both for years-old content and for content created mere seconds ago. Current 2.1 and 2.2 have too much overlap with Section 1. Please identify and delete all overlap. Move all related work and citations to a dedicated section. All stuff like "Leon et al. uses BitTorrent protocol to increase user download speed and at the same time reduce datacenters load." Section 2.4.1 needs to move to either 1 or 3, too much engineering detail for 2.
Section 2.1 "Challenges in creating credit mechanisms": explain that all prior work never works at scale. Lots of ideas, none proven to work. Section 2.2: add to 2.5. Storage capacity is limited and therefore we require a mechanism to optimize its usage, etc.
Add to section 2 the problem of "Co-Investors", when everybody sees the same swarms and does not see others are already boosting them. Thus the need for diversity and awareness of other credit miners.
Please rename:
Section 3: "Credit Mining System Design"; 3.2: "Credit Mining Architecture"
Section 4: "Prospecting Algorithm"
New section: "Credit mining implementation and setup"
4.3 "Credit mining experiment": please rename into "Experimental setup"
6.1: "Live prospecting experiments"
The following experiment is specifically designed to be simple and to validate all core components and algorithms of our credit mining research. All conditions are controlled and do not rely on external elements such as trackers or DHT nodes. Our first validation experiment tests the basic credit mining swarm-selection algorithm. We create a dozen swarms, each with a different number of seeders. A single credit miner then starts prospecting these swarms and should select the most underseeded swarm, yielding the best return. Our next validation experiment repeats the dozen-swarm experiment with multiple credit miners. We show that all prospecting opportunities are found. This proves our algorithm is both effective and efficient.
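The selection step of this validation experiment fits in a few lines. A sketch with hypothetical names (not the actual Tribler code), picking the swarm with the lowest seeder fraction and breaking ties toward larger swarms:

```python
def most_underseeded(swarms):
    """swarms: dict mapping infohash -> (seeders, leechers).
    Returns the infohash with the lowest seeder fraction; ties are
    broken in favour of the larger swarm (more peers to serve)."""
    def selection_key(item):
        _, (seeders, leechers) = item
        peers = seeders + leechers
        ratio = seeders / peers if peers else 1.0
        return (ratio, -peers)
    return min(swarms.items(), key=selection_key)[0]
```

With a dozen such swarms, the single miner should deterministically land on the one most in need of seeders.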
With the following experiment we determine the costs, trade-offs and performance around the observation period of a swarm and the amount of discovered peers.
Section 2.4.1 needs to move to either 1 or 3, too much engineering detail for 2.
Moved to design chapter
Add to section 2 the problem of "Co-Investors", when everybody sees the same swarms and does not see others are already boosting them. Thus the need for diversity and awareness of other credit miners.
Added to future works as it is not possible right now by using current libtorrent.
Section 2.4.1 needs to move to either 1 or 3, too much engineering detail for 2.
Section 2.1 "Challenges in creating credit mechanisms": explain that all prior work never works at scale. Lots of ideas, none proven to work. Section 2.2: add to 2.5. Storage capacity is limited and therefore we require a mechanism to optimize its usage, etc.
Please rename: Section 3 "Credit Mining System Design", 3.2 "Credit Mining Architecture", Section 4 "Prospecting Algorithm". New section: "Credit mining implementation and setup". 4.3 "Credit mining experiment": please rename into "Experimental setup". 6.1: "Live prospecting experiments".
The structure is now like this:
2. Problem Description (7 pages)
2.1 Challenges in creating credit mechanisms
2.1.1 Gain return and performance by spending credit
2.2 Prior Credit Mining Research
2.3 Prospecting good investment (Research Question 1)
2.3.1 Credit mining as investment tool
2.4 Substituting investment cache (Research Question 2)
3. Credit Mining System Design (11 pages)
3.1 Libtorrent
3.1.1 Priority Management
3.1.2 Share Mode
3.1.3 Peer Discovery
3.2 Credit Mining Architecture
3.2.1 Mining Sources
3.2.2 Resource Optimization
4. Prospecting Algorithm (7 pages)
4.1 Early prospecting stage
4.2 Live prospecting
4.2.1 Swarm Selection
4.2.2 Scoring Policy
4.2.3 Swarm incitation
5. Credit mining implementation and setup
Current 2.1 and 2.2 have too much overlap with Section 1. Please identify and delete all overlap. Move all related work and citations to a dedicated section. All stuff like "Leon et al. uses BitTorrent protocol to increase user download speed and at the same time reduce datacenters load."
I deleted some of the overlap, but the Leon one apparently was not one of them.
I'm thinking of removing the secure currency section (± 2 pages, chapter 1) along with multichain, LIRA, and TorCoin. The reason is that there is no implementation or experiment regarding those secure currency alternatives.
good point. possible Subsection:
--- Currency versus secure karma points --- Bitcoin has proven to be secure. Explain the different goals and requirements for an incentive system versus a currency.
83 pages: thesis (45).pdf
"5.3.4 Experiment on user activity": focus on setup; this chapter does not yet discuss graphs.
"5.4 Finding best parameters": more text for the experimental section. Please add details: 13000 torrents are downloaded for 1 hour each in batches of 500. This is a problem; try adding 1 per 7 seconds.
59% (8176) are OK when only downloading the first 4 pieces, a mere 5% (668) for rarest-first. Investigate what goes wrong with those 7508 swarms.
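The batch problem mentioned above ("13000 torrents in batches of 500, try adding 1 per 7 seconds") is essentially about burstiness. A throwaway sketch to compare the two admission schemes (the 10-second window is an arbitrary illustration, not from the thesis):

```python
def burstiness(start_times, window=10.0):
    """Peak number of prospecting downloads started within any
    `window`-second span, i.e. how hard we hammer trackers/DHT."""
    times = sorted(start_times)
    peak = 0
    j = 0
    for i, t in enumerate(times):
        while times[j] < t - window:   # drop starts outside the window
            j += 1
        peak = max(peak, i - j + 1)
    return peak

batch_starts = [0.0] * 500                      # one batch of 500 at once
paced_starts = [i * 7.0 for i in range(500)]    # one admission per 7 seconds
```

The batch scheme fires 500 simultaneous starts; paced admission keeps at most a couple of fresh starts in any short window, spreading the tracker and DHT load.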
Explain everything at a deeper level, for instance:
"6.2 Comparison vs old", Performance compared to prior work
Experimental section:
Comments on this thesis version: please focus on readability and understandability of the results section. Part-time: API in the core + swarm stats + credit-earning stats. ToDo: compact the writing, remove double explanations. It's possible to cut 30 pages without loss of depth! Clarify: 4.2.3 Swarm incitation.
"Short torrent 26%(3690)": please rename/remove and clarify this concept. 3 strategies with comparable results (rarest-first, sequential and random). Discuss in either 1 section or 3, but 2 sections feel strange.
"6.2 Choosing the best swarm". Is that prospecting, swarm rotation or removal ? "Therefore, the seize up condition on the swarm occurred." please clarify.
ToDo: test the whole credit mining engine for stability and functionality. Parse a bunch of RSS-based torrents and test the yield.
<item>
<title>My Linux Distro</title>
<link>
https://Helper-torrents.org/download.php/666000666/Ubuntu-Arch-Debian-16.10.torrent?torrent_pass=73d84e8686383954642a6f172ef
</link>
<pubDate>Wed, 25 Jan 2017 12:00:00 +0000</pubDate>
<description>696 MB; Software/ISO/PUBLIC</description>
</item>
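For the RSS-based test above, extracting the .torrent links is plain stdlib work. A sketch, assuming feed items look like the example (element names `title`, `link`, `description` as shown):

```python
import xml.etree.ElementTree as ET

def parse_feed_items(rss_text):
    """Return (title, torrent_url, description) for every <item> in
    an RSS document such as the example above."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", "").strip()
        link = item.findtext("link", "").strip()
        description = item.findtext("description", "").strip()
        items.append((title, link, description))
    return items
```

Each extracted link can then be fed to the boosting source as a .torrent download candidate.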
Come up with names: early prospecting stage -> probing stage; live prospecting stage -> mining stage
Swarm incitement -> ~instigate or~ stimulate?
Restructuring chapter 6, now:
6.1 Validating the credit mining system
6.1.1 What the policies choose
6.1.2 Obtained gain by the selection
6.2 Comparing obtained gain with prior work (compared with Mihai's work)
6.3 Predownload hit experiment
6.4 Sustaining user experience on downloading
6.5 Swarm performance with credit mining
6.5.1 Varying the number of credit miners
6.5.2 The effect of swarm stimulation
Removed pie chart, and use stacked bar graph instead.
Changed a lot in thesis.pdf, reduced from 83 pages to 69 with more content. Commit id : https://github.com/ardhipoetra/msc-thesis-creditmining/commit/7f02c62374839403f3255b808352cf55e420b0b6
Come up with title :
Credits and blockchains in Bittorrent: designing prospecting and investment functions
Comments:
"The prospecting module can be divided into two main stages, named the probing and the mining stage."
Please do not invert the meaning of terms. Mining cannot be part of prospecting. The former is about bringing in the gold, the latter about finding gold.
"Algorithm 3: Finding rare pieces procedures"
If this is faster than all known algorithms then keep it, otherwise remove it.
"Algorithm 4: Peer translation algorithm"
Remove, and explain in max. 4 lines.
"The number of swarms for a single source should not be limited."
Unclear.
The selection needs to be run periodically because swarm information is constantly changing. At some point, a previously selected swarm may no longer be beneficial to mine, for both the miners and the community. Eventually, it has to be substituted by another swarm. The swarm selection process is controlled by the applied policy. The policy contains rules to sort which swarms to start and stop.
Although it is possible for users to create their own policy, three policies have been defined in the preliminary work [26]. Those are based on randomness, swarm age, and seeder ratio. The Random policy is mostly used as the baseline of the experiment. The Swarm Age policy is claimed to be better than the Random policy. It selects a swarm based on its age. Newly published swarms are known to have higher demand than other types of swarm [10]. With a lot of peers in an early swarm, the chance to gain more credit increases. The Seeder Ratio policy selects the swarm that has a lower number of seeders relative to all the peers participating in that swarm. This policy is specialized to help undersupplied swarms.
4.2.2 Scoring Policy
In this thesis, we intend to extend those policies so that they are highly customizable and balance gaining credit with helping undersupplied swarms. We propose a policy named the scoring policy to tackle this issue. The scoring policy uses the seeder ratio policy as its base. It offers a method to quantify a swarm's prospects and to reduce possibly identical results from two or more swarms. It can be customized with its score multipliers. The idea behind this policy is to give a higher score to a swarm which has a lower seeder ratio, more peers, lower piece availability, and much activity detected in the swarm. A higher score means a higher priority to be mined.
The scoring policy consists of several features. First is the seeder ratio. As mentioned before, it is useful to measure whether a swarm is undersupplied or not. This is particularly useful for helping a swarm. A lower seeder ratio should get a higher priority in mining. Second is the number of peers in a swarm. In small swarms, the seeder ratio may not be relevant because there is no significant difference between one swarm and another. In a case where the ratio of seeders to peers is almost equal for all swarms, it is useful to target large swarms instead of small ones, as there are comparably more options for giving our pieces away. The third aspect is the file availability of the swarm. In a flashcrowd swarm, a significant number of peers do not yet have a large portion of the file. Compared to another swarm in which most of the leechers have almost finished their download, mining the flashcrowd swarm will give higher credit and is more beneficial to the community altogether. The last element that we consider is the activity of a swarm. It is preferable to mine a swarm in which some of the peers have already downloaded or uploaded data from or to us. Moreover, one with higher speed is preferred. The worst case happens when no peer is interested in the pieces we already have. On the other hand, if a peer has already interacted with us before, there is a chance we will be chosen again in the future. This feature incentivizes swarms which have interacted with the miners previously.
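Numerically, the scoring idea above can be sketched like this. The exact weighting (the saturating peer-count term, the constant 100) is my illustration; only the direction of each feature (low seeder ratio, many peers, low availability, recent activity all raise the score) follows the text:

```python
def swarm_score(seeders, leechers, availability, recent_activity,
                multipliers=(1.0, 1.0, 1.0, 1.0)):
    """Higher score = higher mining priority. `availability` is the
    fraction of pieces already present in the swarm (0..1);
    `recent_activity` flags prior interaction with the miner."""
    m_ratio, m_peers, m_avail, m_activity = multipliers
    peers = seeders + leechers
    if peers == 0:
        return 0.0
    seeder_ratio = seeders / peers
    score = m_ratio * (1.0 - seeder_ratio)              # undersupplied swarms first
    score += m_peers * peers / (peers + 100.0)          # tie-breaker: larger swarms
    score += m_avail * (1.0 - min(availability, 1.0))   # piece shortage
    score += m_activity * (1.0 if recent_activity else 0.0)  # tit-for-tat
    return score
```

The multipliers are the customization hook: setting them differently trades off credit gain against helping undersupplied swarms.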
Scalability experiment. 1000 swarms, each with 2 seeders; 1% has 1 seeder. Measure how fast credit mining can find those swarms. A downloader is not necessary (not a yield experiment).
Response:
With millions of swarms, how can swarm blacklisting scale? (Section 3.2.3)
Swarm blacklisting just maintains the infohash index. With millions of swarms, it depends on the system's available memory (which doesn't need to be very large, though).
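For scale intuition: storing only the raw 20-byte infohashes, a million-entry blacklist is on the order of 20 MB plus container overhead. A compact sketch (hypothetical class, not the Tribler implementation) using a sorted list with binary search:

```python
import bisect

class InfohashBlacklist:
    """Blacklist that keeps 20-byte infohashes sorted and answers
    membership queries in O(log n) via binary search."""
    def __init__(self):
        self._hashes = []

    def add(self, infohash):
        i = bisect.bisect_left(self._hashes, infohash)
        if i == len(self._hashes) or self._hashes[i] != infohash:
            self._hashes.insert(i, infohash)  # keep sorted, skip duplicates

    def __contains__(self, infohash):
        i = bisect.bisect_left(self._hashes, infohash)
        return i < len(self._hashes) and self._hashes[i] == infohash

    def __len__(self):
        return len(self._hashes)
```

A plain Python set would also work; the sorted-list variant mainly avoids per-entry hash-table overhead when memory is the concern.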
"The prospecting module can be divided into two main stages, named the probing and mining stage." Please do not invert the meaning of terms. Mining cannot be part of prospecting. The former is about bringing in the gold, the latter about finding gold. Prospecting, probing, pre-downloading, mining: readers will get confused, just use prospecting and mining.
The prospecting module was changed into the investing algorithm/module. Probing (also predownloading) and mining were changed into prospecting and mining. Reason: it's weird to have a 'prospecting stage' inside a 'prospecting algorithm'. Also, as you said, 'mining' should not be part of 'prospecting'.
"Algorithm 3: Finding rare pieces procedures": if this is faster than all known algorithms then keep it, otherwise remove it.
removed
Algorithm 4 Peer translation algorithm, remove and explain in max. 4 lines.
It is hard to explain without showing the algorithm because it has many if-else conditions. I cut the explanation to be more concise instead.
For instance, please remove all duplicate explanations and compact the text below to 1/3 of current size.
changed into :
Three policies have been defined in the preliminary work [26]: the Random policy, the Swarm Age policy, and the Seeder Ratio policy. The Seeder Ratio policy in particular is specialized to help undersupplied swarms and gave the best credit gain of the three [26].
4.2.2 Scoring Policy
The scoring policy is a proposed method to quantify a swarm's prospects and to reduce possibly identical results from two or more swarms. It uses the seeder ratio policy as its base. It can be customized with its score multipliers.
The scoring policy consists of several features. First is the seeder ratio, which is defined as the ratio of seeders to the total number of peers. Second is the availability of a swarm. It represents the piece shortage, which indicates the potential to gain more credit in the longer term. Third is the number of peers, used as a tie-breaker. It is useful to target large swarms instead of small ones, as there are comparably more options for giving our pieces away. Lastly, there is the swarm's recent activity. Inspired by tit-for-tat, the policy prefers a swarm whose peers have previously interacted with the miners.
Do not add jitter to measurement outcomes to prevent measurement points from overlapping.
Change the plot in 6.2 into a density plot. Drawback: we lose which swarms were selected at a particular time. Key point: credit mining selects underseeded swarms.
A simple validation experiment..... Please design a minimal yield validation experiment, with just 1 file size.
Result:
I still need to run the scalability test, but for now all your feedback has been processed. Any thoughts, @synctext? The updated thesis.pdf is already in the repository.
Update on the scalability experiment: I decided to add a downloader to measure the prospecting accuracy as well. With 1000 swarms (each with 3 peers), the miners couldn't get some torrent metadata from the channel. The downloader also sometimes couldn't find the swarm that it wanted to download.
Reducing to 100 swarms made everything okay. I'm trying 500 swarms now.
The goal is still the same: how fast can the credit miner prospect and then find underseeded swarms.
UPDATE: 500 swarms gives me the same behavior. Moreover, even when the downloaders have finished their downloads, only ~250 out of 500 swarms were discovered by the miner. I will focus on the 1% and 5% of 100 swarms instead.
Comments:
Chapter 3 explains term............. First page of Chapter 4 does not build on this knowledge, goes back to "limited and unverified information of the newly discovered swarm". Page 25 "trying to get as much information as possible of the swarm". Chapter 4: directly explain, for instance, the cardinal problem................
The first two paragraphs in that particular part:
Investing in a swarm is one of the key elements of the credit mining system. The investing module can be divided into two main stages, named the prospecting and the mining stage. After a new swarm has been discovered, the credit mining system estimates its return-on-investment potential in the prospecting stage. Then, in the mining stage, the system selectively joins some of the high-prospect swarms to gain credits. The combination of both stages forms the whole investing algorithm.
4.1 Prospecting stage
The cardinal problem in knowing a swarm's potential is obtaining reliable swarm-size and piece-availability information. This information is essential to predict future demand and to estimate the return on investment. Given a large number of swarms, joining all of them is costly. Therefore, we introduce the prospecting algorithm: a means of downloading a few pieces to roughly determine peer and piece information. The prospecting cost per swarm is limited, allowing us to estimate and compare the return on investment of a few thousand swarms easily.
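In libtorrent terms, "download a few pieces" maps to piece priorities (0 = skip, 7 = highest on libtorrent's 0-7 scale). A sketch of the priority vector and the resulting cost cap per prospected swarm (function names are mine):

```python
def prospect_priorities(num_pieces, probe_pieces=4):
    """Priority vector that fetches only the first `probe_pieces`
    pieces at top priority and skips everything else."""
    return [7 if i < probe_pieces else 0 for i in range(num_pieces)]

def prospect_cost(piece_length, probe_pieces=4):
    """Upper bound on bytes downloaded per prospected swarm."""
    return piece_length * probe_pieces
```

With a bounded per-swarm cost like this, the total prospecting budget for a few thousand swarms stays predictable.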
Figure 5.3 unreadable. Just 5.3b without the directory popup, and "add boosting source" popup displayed in lower right corner.
Removed 5.3a, moved the popup to the lower middle (to cover some of the channels' names).
Chapter 6: we now evaluate the performance.......
Adapted.
6.2 : channel is published...........
Added. I couldn't annotate the graph without overlapping annotations because they are too close to each other. I will put the explanation, and its reasoning, in the text instead. Moreover, this experiment didn't use a tracker or DHT.
6.3: explain goal.......
Key in prospecting is the ability to find a swarm that is likely to give a high investment return. In the following section, we assess how well our prospecting algorithm can discard low-potential swarms on the Internet. Furthermore, we also measure how fast and how accurately the prospecting finds the high-potential swarms.
6.3: .... determine used bandwidth (exactly how many messages)
Unfortunately this is not possible; the logs I have were not recording the used bandwidth and the number of sent messages. I've put this in the future work, though.
6.4 what is the goal in the first lines. "In the following experiments, the mining stage...........
In the following experiments, the mining stage, as part of our investing algorithm, is evaluated. We focus on the scoring policy that is used in swarm selection. We wish to show that this policy selects the undersupplied swarms correctly. Furthermore, the mining performance resulting from this selection will be evaluated. We will confirm that our policy results in a positive upload gain and ratio.
Figure 6.7 discrete number of peers...
Fixed.
Figure 6.7 plot discrete
change into:
enlarge Figure 6.4 and 6.5
Done. For figure 6.5, I changed it a bit to this:
Chapter 3, 1st and 2nd paragraphs are now rewritten as:
In this thesis, we introduce a "Credit mining system", an automatic investment framework that considers multiple properties of the swarms. With the credit mining system, a user can both gain credit and improve swarm performance with a limited bandwidth allocation, without any intervention necessary.
Figure 6.11. Make it human readable. Mining on/off
Move the legend outside the box, so that the vertical lines and boxes are explained (slightly more complex, but needed for clarity):
Mention the experiment time in 6.5. Clarify the experiment in 6.6.
Done
@synctext, I updated the polished version (appendix included) in my repository : https://github.com/ardhipoetra/msc-thesis-creditmining/blob/master/thesis.pdf
Probably the final version before last cleanup.
OK. Thesis work by @ardhipoetra is now finished. New student @EinNarr is assigned to this project.
Focus for thesis (draft): blockchain-based credits for video streaming using embedded devices. Using Kodi-to-Kodi networking without any servers, plus the TrustChain work. Create a Youtube-like experience without servers, using the private-tracker oversupply effect.
First task: create unit tests for all components of the existing credit mining code. Obtain decent code coverage and learn the code and architecture inside out. Steps for the next months: improve the naive algorithms... Note that relaying is more expensive than seeding. How to balance relaying with seeding? "Optimal resource usage scheduling".
current status:
Please do not start over from scratch with credit mining. This thesis builds upon: Kodi code, Ardi code, community-size code.
Alternative storyline beyond credit mining: https://repository.tudelft.nl/islandora/object/uuid%3Aa1b443c7-8b37-4263-ae96-d38bc8b8f397 "Decentralized Autonomous Entity", [ref]
An example of how to create Tribler sessions in the tests: https://github.com/Tribler/tribler/blob/devel/Tribler/Test/Community/Market/Integration/test_market_base.py#L168
Another example of multiple Tribler session communicating with each other (with libtorrent enabled): https://github.com/Tribler/tribler/blob/devel/Tribler/Test/Community/Tunnel/FullSession/test_tunnel_base.py
Please focus on the Nosetest credit mining work. As simple as possible: create 1 random swarm, seed it, create a channel with that single swarm, start credit mining, now 2 seeders, happy!
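The shape of that "happy path" test, with everything stubbed out (StubSwarm and StubCreditMiner are hypothetical stand-ins, not the Tribler classes; the real test would spin up sessions as in the linked examples):

```python
import unittest

class StubSwarm:
    """Stand-in for a live swarm: only tracks its seeder count."""
    def __init__(self, infohash, seeders):
        self.infohash = infohash
        self.seeders = seeders

class StubCreditMiner:
    """Stand-in miner: seeds every swarm in its source channel,
    adding one seeder per swarm."""
    def mine(self, channel):
        for swarm in channel:
            swarm.seeders += 1

class TestSingleSwarmMining(unittest.TestCase):
    def test_one_swarm_gains_a_seeder(self):
        swarm = StubSwarm(b"\xab" * 20, seeders=1)  # 1 random swarm, 1 seeder
        channel = [swarm]                           # channel with that swarm
        StubCreditMiner().mine(channel)             # start credit mining
        self.assertEqual(swarm.seeders, 2)          # now 2 seeders, happy!
```

Swapping the stubs for real Tribler sessions (as in test_tunnel_base.py) turns this into the actual integration test.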
Thesis storyline: run within Android Kodi; embedded credit mining.
Thesis outline options:
@devos50 Hi, could you please kindly give me a peer review of the unit test I wrote? https://github.com/EinNarr/tribler/blob/creditmining_slimming/Tribler/Test/Core/CreditMining/test_boostingmanager.py
Next steps are finishing the unit tests and rebasing, then conducting the key thesis experiment described above and generating experimental-results chapter content.
Mildly optimistic progress. The 1-seeder 1-swarm experiment could be operational in 1 week. Then the 10-ish swarm version. Finally the cheapo-device test and thesis graph plotting. Plus open Kodi binding issues.
Worked on enabling one laptop with multiple instances and multiple state directories. Fastest route to the 1-seeder 1-swarm experiment? Use multiple Bittorrent client instances, not multiple Triblers. Either fix multiple instances or put the channel inside the database (or create your own channel), then start mining.
Include technical depth in the credit mining thesis! Related thesis work: the $256 million bubble, the Filecoin research roadmap 2017, also #3097 and the Ycombinator comments. Test with 10 new servers?
Get a working performance analysis environment. Goal: the ability to make pretty graphs locally with both credit mining and multichain (two different branches). Task: make an Ubuntu channel or another large validated legal collection, with actual "from the wild" downloads. This environment will be expanded over the coming months.
Part of #23. ToDo: read the 1968 "Tragedy of the Commons" paper.
Strategy: till the end of February, focus on credit mining; only then include relaying bandwidth, to avoid too much complexity at the start. Next sprint: co-investor detection?