Operational credit mining branch, Gumby scenario and multichain

synctext commented 8 years ago

Get a working performance analysis environment. Goal: ability to make pretty graphs locally with both credit mining and multichain (two different branches). Task: make an Ubuntu channel or other large validated legal collection, with actual "from the wild" downloads. This environment will be expanded over coming months.

part of #23. ToDo: read "tragedy of the commons" paper, 1968 problem.

Strategy: Till end of Feb focus on credit mining, then include relaying bandwidth to avoid too much complexity at the start. Next sprint: co-investor detection?

EinNarr commented 7 years ago

The simplest 1 seeder 1 swarm experiment is working well. start_seeding_in_share_mode However, things don't go well when extending it to 10 torrents in a same channel. The design should allow user to specify a max number of torrent being seeded at the same time. There are multiple logical mistake in the code causing this problem:

The torrent pool, from which the torrents to be enabled for the next iteration, do not contain any of the torrents currently being mined. This means that although we specify the max number, all active torrents are all exempted, making this max number the max number of torrents added to the boosting manager each iteration, but not what it was supposed to be. This was fixed already during writing the unit test.
When a torrent is decided to be started by the policy, it also requires that it doesn't exist in the session. This is working for the 1st run. But if the torrent is somehow stopped by policy but later decided to be re-enabled, the line to start the download is simply skipped.
In the function to start a torrent in boosting_manager. If session.download_exists() returns true for a hash code, it regards this hash as "being downloaded", and raise a "Torrent already being downloaded" error and return. However a paused torrent would also return true in session.download_exists() and need to be restarted.

Additionally, there are 2 sets of default configurations for credit_mining. One set in global default configuration file, the other in the initialization of BoostingSettings class. These are of different values, making it a bit confusing.

Bugs are fixed and now it works well in 1 channel 10 swarms test with random policy and maximum 5 active swarms. screenshot from 2017-10-17 14-06-04

The scrape_tracker function is disabled for now. It is basically calling Torrent_Checker to check the swarm size by requesting the tracker. However in the implementation of Torrent_Checker, it tries to create a new task in the libtorrent session, which will cause an unhandled error when this torrent is already added to download. Also, the function of this tracker scraping action is not clear to me. We have already got pre_download function to determine the swarm size(however it is also either buggy or not reliable), why we still need to scrape the track after the torrent is already being downloaded? And it should not be working on DHT. Both pre_download and scrape_track are making Credit Mining over complex. Ideally it should directly take the swarm size information from Tribler core instead of registering a lot of callback chains and looping calls to get it by itself. Let's Mr. Ma's swarm size function is quite a savior for making Credit Mining great again.

synctext commented 7 years ago

Assume: just as popular as Bitcoin mining and as much talked about as FileCoin.

Next meeting: what are the thesis experiments (Kodi-to-Kodi, credit mining servers, or superior policies)?

Current issues with the code:

pre-download is already in credit mining, rename to "swarm size check" and clean-up
remove complex callback chain
fix unstable and "crash" of code !
How is the handling of dead torrents?
uses a wild unreliable tracker for discovery in local-run single-machine experiments
uses Libtorrent share mode, that requires multiple leechers to functions. DOCS: " Share mode indicates that we are not interested in downloading the torrent, but merely want to improve our share ratio (i.e. increase it)."
use Libtorrent in normal non-share mode, thus active caching of numerous popular files (rarest first, requires storage management).
What happens with "1" setting of _share_modetarget

EinNarr commented 7 years ago

Log 24/10/2017: I just had a discussion on seeding polices with @synctext . @synctest suggested that we should try to download and seed several pieces for all the torrents, and pick the best performing ones. With no measurement on the real performance as for this policy, by impression, I would say it is not good idea. It will take extremely long time to user, especially a private user to get a full traversal through all the torrents. Even for the "enterprise users", since the "testing time" for each torrent need to be quite short, the result will have much noise caused by connection problems. Also, since a single traversal will take so long time, it will not be timeliness. Also having a long swarm selection interval is against Mihai's experiment result.

My idea of implementing the new policy is that, we do more measurements offline, without adding more trouble to the users and Tribler code itself. My plan of getting the policy is as follows:

Collect a number of torrents.
Do the download and seeding test on every single torrent, for a certain amount of time, with certain limit on upload and download bandwidth.(not in Tribler)
Record the data of these tests, including the speed, total amount of data uploaded and downloaded, swarm information that we think could have impact on the value of the swarm.
Feed the data we collected to a model (currently I am thinking of ANN), magically get the formula of determining the value of the swarm, without thinking of how much weight would each factor in a swarm wound contribute to its value.
Put that formula into Creditmining policy, without adding too many lines to our 150k-lines-of-code Tribler.

synctext commented 7 years ago

OK. Please focus on a realistic and simple credit mining setting: manage a channel with 1000 swarms.

Please focus on maximizing the upload for 1000 swarms and not think at all about 10k swarm channels in this thesis work. One new policy to evaluate first: no-model-policy. Test for 24 hours for performance. Approach:

Cycle through 1000 swarms and only seed X at a time (optimize parameter X using experiments) {replacement strategy}
Piece strategy
- Download the first 1-3 pieces of each swarm, keep track of dead swarms.
- Download using standard rarest-first policy
Try uploading in each swarm for 300 seconds (parameter Y=300seconds). {duration strategy}
After 300 seconds, remove all swarm without upload and 10% worst uploading swarms (parameter Z=10%)
Or continuous removal policy: every 10 seconds check all swarms seeded for over 300 seconds and remove worst performing on. {removal strategy}

Methodology: estimate 3 good values for parameters X,Y, Z. Test performance for each parameter in an separate experiment.

Other policies to compare:

Random select
Highest-leechers first
seeder leecher ratio
aging fresh-first policy
Ardhi scoring function with estimated weights

synctext commented 7 years ago

New related work to describe: Orchid: a new surveillance-free layer on top of the existing Internet

First implementation ready of no-model-policy, still randomly crashes. Working with @xoriole for dedicated Jenkins task to execute the test from the credit-mining Git branch. Create own channel, fill with 1000 random swarms. Possible approach: first do single measurement, then code cleanup and do thesis measurements. Problem: we can remove ineffective policies and cleaner code with a single policy.

ToDo: post seeding in time of 10 hand-picked swarms using no-model-policy from laptop

synctext commented 7 years ago

Has spent a week on getting a test channel of 2000 torrent files (not merely magnet links). Might do extensive cleanup of all policies which pollute the architecture with pre-downloading and other unneeded complexity? Would be easier to fix and get stable? ToDo:

get the cardinal thesis experiment of credit mining operational
create the essential figure for thesis: upload yield
write first thesis chapters: performance analysis + credit mining algorithms
- model-based algorithms
- no-model approach, simple and effective
- hybrid might be even better

synctext commented 6 years ago

Progress update: works! downloading many GBytes and uploading few MBytes. Please push your latest branch today...

synctext commented 6 years ago

Try out a new branch with even more extreme idea then no-model-policy called "zero-thinking-policy". It leaves everything to Libtorrent. Key scientific challenge is doing performance analysis with the wealth of possible parameter settings. All tested in the wild. Try with 250 swarms at first. Understand dead-swarm problem. Try to preserve start_download() of Tribler API, trivial to add anonymous downloads.

seeding a million torrents with bittorrent

libtorrent supports setting separate limits for different announce methods. That is, making the x top 
torrents announce to trackers, the y top torrents announce to the DHT, the z top torrents announce to
 local peer discovery. Using this feature, all torrents could always be running, just not necessarily 
announce.
Instead of scraping torrents to find their rank, just announce to them instead.
This would mean all torrents always allows peer connections. All torrents are not necessarily announcing 
to their trackers every 30 minutes, but do announce every now and then, round-robin with all the other 
torrents that aren’t announcing regularly.
Never stopping a torrent might significantly improve availability and longevity of content in bittorrent 
networks.

plus the default simple seeding rotation setting

Auto managed seeding torrents are rotated, so that all of them are allocated a fair amount of seeding. 
Torrents with fewer completed seed cycles are prioritized for seeding. A seed cycle is completed when a
 torrent meets either the share ratio limit (uploaded bytes / downloaded bytes), the share time ratio (time 
seeding / time downloaing) or seed time limit (time seeded).
The relevant settings to control these limits are share_ratio_limit, seed_time_ratio_limit and 
seed_time_limit in session_settings.

synctext commented 6 years ago

After this thesis project: disk management (download to max, selectively replace)

synctext commented 6 years ago

ongoing big cleanup. One week left is estimated.
no push of new code to the central repo for the past 15 days
remove archive mode
plan: try with 250 top swarms
promise: no scraping
use unit tests during development
repeating thesis outline draft
- [ ] Experiments and performance analysis
- [ ] Validation experiment. As simple as possible: create 1 random swarm, seed, create channel with that single swarm, start credit mining, now 2 seeders, happy!
- [ ] Moderate experiment: create a channel with 10-ish swarms. Still simple low and high seeders (1 or 2). Selects underseeded and then eventually all 10 swarms.
- [ ] Advanced experiment: use all storage of a cheapo sub-100Euro device and show that it works.
- [ ] DONE

EinNarr commented 6 years ago

The new code pushed. https://github.com/EinNarr/tribler/tree/CreditMining_operation

Main change log: Removed the scoring policy. Removed the pre-download function Better OOP code style Simplified callback chains. Replace the "looping registered looping calls" with a state callback Replace the mysterious torrent dictionary with a dictionary of torrent objects Change the way policy is defined to make it more modular

@egbertbouman @devos50 Could you please help me a bit review the code when you are free?

EinNarr commented 6 years ago

What is not yet done with the code: The code is not 100% bug free and there is still debugging loggers within. The unit test seriously outdated The policy is not yet efficient If time is enough, a simple graphic indicator for CM and a flag to differ and protect the mining task with normal download task.

Concern: Tribler session reports upload/download speed with the "raw total speed" instead of only payload speed. Would payload speed be more accurate in the cases?

synctext commented 6 years ago

good to see your progress! please focus on the 3 thesis experiments and stability of the code.

EinNarr commented 6 years ago

Tested with top 100 torrents (99 actually). The test lasted around 1260 seconds, with 48 MB payload upload and 38.5 GB download. 127.5 MB upload and 39.7 GB upload in total. The problem of the popular torrents is that nearly all of them are seriously overseeded (times more seeders than leechers). Hardly any torrent is offering a good potential. I did 2 tests during the new year vacation on the 2000 "random"(all torrents from a certain period of time) torrents I used before. It uploaded 246.8 MB payload in 2220 seconds with 39 GB download. Which is not sweet enough, but at least better. Seems not obvious difference from popular torrents. The performance is better than the previous test. It might thank to the "directly discard non-active torrent" feature is now implemented.

synctext commented 6 years ago

Please finish your simple first 2 experiments first and create several pages of your thesis chapter "experiments and performance analysis" Simple local experiments without external wild swarms. Simple validation experiments to show the reader that such simple experiments have the easy to understand outcome. Moderate experiment: create a channel with 10-ish swarms. Still simple low and high seeders (1 or 2). Selects underseeded and then eventually all 10 swarms. Concrete final experiment: channel with 25 only underseeded swarms in a single channel and plot the upload effectiveness.

synctext commented 6 years ago

what is a good experiment to validate credit mining? current status: worked on wild-swarm experiment. Stable for 5 days, then stopped logging, error reporting, and general functioning.

Micro-economy publications, presented recently at the Delft Blockchain Launch event:

Next meeting: guaranteed upload-in-time graphs; posted to this issue.

dead swarm experiment + tracker + 2-pages of thesis text. 4th thesis experiment: create 100 dead swarms and 5 single seeder swarms. Repeat, demonstrate, andevaluate your dead-swarm removal mechanism.
Other experiment: 10-hand picked wild swarms without any download removal for 1 hour.
Final 3rd step: the simple validating non-wild experiments for thesis: go from 1 seeder to 2 seeders.
Start thesis from experimental chapter. Then work backwards. thesis storyline: freeriding and our micro-economy to fix it.

synctext commented 6 years ago

25 hand-picked wild swarms experiment completed for several hours. log file: timestamp, up/down amount. Please stop all coding! Make upload/performance graphs, write thesis.

EinNarr commented 6 years ago

Thesis: https://www.overleaf.com/11564983rgtdjxxkzxjk#/43740667/ Still under construction, far from complete.

Some figures from the experiment: CPU usage, always 0.0% cpu Memory usage, keeps increasing, probably because of increasing tasks: ram memory Upload vs payload upload, looks good on popular torrents: upload_total download vs upload, still not satisfying, around 1:1. up_and_down_payload

synctext commented 6 years ago

Great progress with the graphs!

https://en.wikipedia.org/wiki/Tragedy_of_the_commons First chapter possible storyline: freeriding prevention by creating a token system, full token economy. This thesis present a key building block. Please do not fill your thesis with CMS everywhere, just call it credit mining (will not cost 1 extra page, but really boost readability). Please polish your thesis, focus on experiments sections, and present the simple validation experiments... Leave the final experiment data open, you now have graphs. My advise is to really focus on finishing all 3 simple experiments and have 6-ish polished thesis pages in Experiments chapter.

backup the past two months!

synctext commented 6 years ago

untitled (1).pdf

We run this experiment on the test machine for roughly 53 hours.

Just show a plot where there are 0 seeders, 1 seeder, added to channel at time X, start credit mine, becomes 2 seeders after Z seconds. Plot the 5+ events in time (X-axis) and number of seeders, (Y-axis). Total experiment should only be a few seconds. No wild swarm! Do a hard-coded add_peer() if needed. Timeline picture example (with lot of colors):

We fetched 7 different torrents from a tracker on the Internet, of them 3 are newly uploaded and under-seeded torrents, others are either balanced or over-seeded.

Please create a clean and simple 2nd experiment. No wild swarms, see descriptions of 18 Jan and others. Just seed local libtorrent instance, for instance, and hard-code IPv4.

Experiment: testing the upload and active leechers in credit mining policy. Evaluation period is currently 5 minutes. Again we have 10 swarms in a channel. Each are seeded only by 1 seeder (Libtorrent trick). Experiment starts, introduces leechers which download merely 10MByte. Half of the swarms are downloaded by nobody, half of the swarms are downloaded. Show that unpopular swarms are kicked by your policy. Problem: there is a seeder+credit mining seeder; leecher should download something from credit miner.

Other: dead swarm and wild swarm experiments. Again, please ignore the total performance of credit mining in wild swarm.

synctext commented 6 years ago

Experiments:

1 seeder becomes 2 seeders
10 swarms, with either 1 or 2 seeders
10 swarms with 1 seeder, and some leechers (3rd experiment)
- Question today: limit bandwidth of leecher? Sure, no problem. Then leecher will download for 30 second or something and credit mining policy prefers thos more active swarms (1 seeder, 0 leechers versus 1 seeder 1 leecher swarms)
- At this point it does not matter what the performance is, we verify if the behavior works as expected. We then see that credit mining policy may not be intelligent enough to see the difference between swarms. Show&explain.
- All bandwidth might come from the 1 seeder. Perhaps no bandwidth from the other seeder, the credit miner. No problem, just show the result.
- Feel free to both limit the seeder upload and leecher upload and have an unconstrained credit mining seeder.

synctext commented 6 years ago

Your thesis work is nearly complete, how to get the graphs quickly...

You are very much invited to take shortcuts for the validation experiments, like setting bandwidth limits. Another one is showing a different policy. If you're policy is not suitable for them, you can simply implement a trivial policy of preferring big swarms. Reacting to total swarm size. Then in experiment 4, you intoduce your more complex policy and compare them both. We can talk more on Monday.

synctext commented 6 years ago

conducted experiment:

4 libtorrent instances; joining all the same 1 swarm. 3 seeders, 1 leecher.
Leecher starts from 0 progress, all limited to 3 MB/s down, 1MB/s up.
Using default 6881-6883 UDP/TCP port. Not configured forced UDP or TCP usage.
strange behavior, 1 seeder gets port 15000 something.
connect_peer() does not seem to function.
config option connect-to-local-peers?
problem with leecher seeing all seeders. It only downloads from 2 seeders. 1 seeder not showing up in the peer-info-list
suggested solution: ignore! just have 1 seeder and 1 credit mining peer.
add a local tracker as a possible quick fix. Simple Python only tracker
2nd option alter ports and fixate to UDP/TCP.
check your alerts, like https://www.libtorrent.org/reference-Alerts.html#peer_alert
because it seems to be a problem to experiment with local swarms, conduct validation experiment with simple wild swarm experiments.

EinNarr commented 6 years ago

join The curve and bar chart of times each swarm is joined by Credit Mining during the 30 minutes test(71 minutes in fact, since the result is too ideal, to better show the curve, I cut out the first 30 minutes) select The curve and bar chart of times each swarm is selected by Credit Mining.

These experiments are done with 100KB/s upload speed limit and 200KB/s download speed limit on the fake seeder/leecher side.

swarm_1-2: 3 seeders 3 leechers swarm_3-5: 3 seeders 1 leecher swarm_6-8: 1 seeder 3 leechers swarm_9-10:0 seeder 3 leecher

synctext commented 6 years ago

Remarks:

Critical setting: settings_pack::allow_multiple_connections_per_ip
Above graph is a more advanced experiment-3 with also dead swarms.
- swarm evaluation period is set at 1 minute
- Each swarm is 800MByte, so no completed downloads by leechers
- Three swarms joined 30 times, thus 90 joins, other 13 or 14 joins, thus 7 x 13 =91; 181-ish times
- checking consistency: 30 minutes = just 30 join decisions per experiment; versus 181 seen in plot.
- upper graph: select 3 at random and 3 by performance every minute.
- lower graph: just 3 swarms by performance; not intuitive results! Why select swarm 6 and not 7?
- Idea: please try selecting 2 swarms and 4 swarms. What happens then?
- @EinNarr can not predict the results of such above experiments
- second plot: lines overlap, but it could be that 4 swarms are selected at time 0.
repeating from above: As simple as possible: create 1 random swarm, seed, create channel with that single swarm, start credit mining, now 2 seeders, happy!.
- create simple timeline graph of this Experiment-1
- each of this 5 activities takes certain amount of millisecond. Log, measure, plot, done!
please push to Github your latest policy, experiment scripts, and graph plotting code
plan: both create experiment plots and experiment.tex thesis writing
the wild experiment of 1200 minutes in thesis.pdf deletes all ongoing downloads every 3-5 minutes and starts again. This simple policy is used to prevent the disk from quickly filling up. This also hampers upload performance.
- wild experiment setting: 3 best-performance slots, 3 random slots; these 3 random slots may encounter regular deletes
- please plot the switching swarms, keeping swarms and delete events
- always delete policy, possible alternative to explore: delete largest
- possible policy: random
- delete lowest yield swarm

EinNarr commented 6 years ago

Thesis_65%.pdf This is the half finished thesis with most of the problem description, system design explanation. Half of the experiment. and leak of introduction.

synctext commented 6 years ago

Revisit Kodi?
- Nexus 5 with Kodi screenshots
- Final grand Thesis section: "performance analysis on Android/Kodi/Embedded-hardware"
Scientific performance experiments we discussed
- dead swarm experiment
- policy efficiency experiment
- [copied from above] the wild experiment of 1200 minutes in thesis.pdf deletes all ongoing downloads every 3-5 minutes and starts again. This simple policy is used to prevent the disk from quickly filling up. This also hampers upload performance.
- wild experiment setting: 3 best-performance slots, 3 random slots; these 3 random slots may encounter regular deletes
- repeatability experiment: same 6 slot setting, run for 1hour for 24 times, compare results and efficiency differences. Same swarm set.
- parameter sensitivity: what settings work best? Total slots, random slots, etc. Run for 1 hour each time, and compare

synctext commented 6 years ago

untitled (5).pdf

Next step, please focus fully on experiment description of your final thesis chapter.

Simple experiments with 1 leecher, for instance, which explain your mechanism.

EinNarr commented 6 years ago

Thesis update. Thesis_0614.pdf

synctext commented 6 years ago

For an interesting intro perspective what if Bittorrent had a token? We talked a lot about the concept of simplicity. Both experiments and algorithms need to be simple, otherwise it's impossible to get a Youtube-level service operational for 2 billion people. I'm not aware of any theoretical grounding for the need for simplicity. However, complexity often does not scale, it breaks. Thus the best-practice in the field of self-organising systems is: simplicity. Simple experiments in a thesis can be easily understood and should scale. Meaning, it should even work with the insane usage-level of Youtube-like service: 2 billion monthly active users. </rant>

Figure 9: Differences in propagation speed in an overlay consisting of 1000 nodes.
Demers 1 and 11 show the theoretical performance of a random overlay based on
epidemic theory, Emulation 1 and 11 show the actual performance of our overlay.

Dispersy bundle synchronisation for the spreading speed of broadcasts by an epidemic protocol (Figure 9).

Please add the trivial validation experiment we talked about. It has such ridiculous simplicity it's difficult to make a graph out of it. Validation experiment. Shows (a) create 1 channel with two swarms; with 1 seeder each and (b) credit mining is started and a random swam is joined, resulting in 1 swarm with 2 seeders img_20180614_105642

The second experiment: moderate experiment: create 1 channel with 10-ish swarms. Each swarm has either 1 seeder or 2 seeders. Thus still a most-trivial-possible experiment with simple low and high seeders (1 or 2). Five swarms with one seeder and five swarms with two seeders. The policy should invest the credit mining disk space into underseeded swarms. img_20180614_111631

Discussing of your policy Likewise, we select a certain of amount of swarms to investigate according their prior performance and several other swarms randomly to constantly keep the vitality of Credit Mining.

Just be more direct and to the point: our vitality policy is based on very simple principle of a feedback loop driven by income earning ability of a swarm (e.g. upload performance). The vitality policy: randomly selects a number of swarms (parameter X), monitor the income from this swarm for a limited amount of time, periodically evaluate and compare the income earned from each swarm, remove the lowest earning swarms (if not credit mining maximum amount of swarms yet, parameter Y), and repeats this cycle endlessly.

egbertbouman commented 6 years ago

@EinNarr Just talked to Johan about your experiments, and you're very much invited to compare multiple policies in your thesis. The torrents in your experiment can just be synthetic. No need to use wild swarms. This avoids your problems with many overseeded swarms.

EinNarr commented 6 years ago

thesis.pdf

synctext commented 6 years ago

Explore different parameter settings & experiments:

invest for 2 min or 60 min in a swarm?
investigate 25 or 250 swarms at once?
Smart Download Mode:
- Restrict to 25 MByte or 10-ish pieces
- Switch from download mode to upload mode in Libtorrent
- 250 swarms of 25 MByte take
Fake download location that Arvid made (for us? :-)
Bittorrent Simulation Framework
- you have no data on-disk
- you can still seed
- send arbitrary data
- Download arbitrary data, turn off hash-check
- All original Libttorrent engine, with some patching.
compare two policies with minor differences or which are very different
document in thesis, HAPPY; DONE!

devos50 commented 6 years ago

One of the Gumby scenario files: https://github.com/ardhipoetra/gumby/blob/0f860dcad94a59324b1a80d946977b4344731fb3/experiments/tribler/channel_download.scenario

You can check out this PR: https://github.com/Tribler/gumby/pull/282

EinNarr commented 6 years ago

Roadmap to graduation still remains unclear. Starting from refining the test frame work which had already been used in the 10 torrents 1 channel experiment.

EinNarr commented 6 years ago

Experiments done outside Tribler till now.

Experiment on 100 most popular torrents By "investigated", it means the torrents who finished the initial 25mb investment.
Experiment on 30 most recent torrents.
Experiment involving "fair-comparison" (compare (upload-download)/time instead of pure upload-download)
Keep introducing new torrents to the session, set "promotion" to 3 levels (25m, 250m, 1G) instead of only one in previous policies.

EinNarr commented 6 years ago

Some thoughts about the 4th policy. This is the share ratio vs. number of torrents graph figure_10-7 figure_10-8 The X-axis is upload/download ratio, Y-axis is number of torrents. The curve is the number of torrents have same or better ratio than its X-coordinate. Bar chart is the number of torrent have upload/download ratio no worse than its X-coordinate, but worse than the next ratio on the X-axis. The blue bar is for all the torrent, and the orange is for the "promoted(limit to 250MB) torrents. There should have been a three color on bar chart for “super promoted(limit to 1GB)", but as far as I observed, there is no torrent with more than 120MB download.

It could be that no torrent is ever super-promoted, or only small torrents less than 120MB are super-promoted. Details will be determined after the newest data is revealed tomorrow.

synctext commented 6 years ago

Very impressive thesis material!

synctext commented 6 years ago

Please focus on your thesis writing and storyline now. Stop working on graphs and data processing. Proposed storyline.

We present in this chapter the experiments we conducted around credit mining investments with the following scientific topics:

1. Discovery of opportunities
1. Do a single investment
1. Delete bad investments
1. Increase high yield investments
1. Retire exhausted investments

EinNarr commented 6 years ago

Another latest version of PDF and an interesting percentage change of total download of each category. credit-mining.pdf figure_17

synctext commented 6 years ago

The Big Picture is not understood by thesis committee
- universal cashing (boosting/acceleration) mechanism
- generic system: just give it 1000 swarms, it will invest in the best ones
- does not need any info from swarms, just 20Byte ID, takes magically care of the rest
- NOT JUST A SIMULATION (this was deeply misunderstood)
Policy fundamentals:
- discovery of opportunity
- initial investment
- increase investment
- dead investment
- exhausted investment (not covered in thesis, but we know it exists)
Architecture picture
Related work, like: http://members.unine.ch/pascal.felber/publications/PAM-04.pdf
My contributions
Swarm Definition
Reproducible science
- more info
- date of experiment
- Repeat experiment, obviously different outcome
What is the desired outcome of these experiments?
document engineering effort
- Lines of Code per class/source file
- three implementations, 2 covered in thesis: 1st prototype, 2nd prototype.
add picture of the conducted greedy experiment + discuss.

synctext commented 6 years ago

Plus this remark: "the link to his code added to the thesis (in the spirit of reproducibility)."

EinNarr commented 6 years ago

cmthesis.pdf

synctext commented 6 years ago

Thesis is still not clear

Thesis is not in good shape yet sadly. Detailed remarks:

mentions engineering details everywhere, instead of presenting the overview clearly, like time of experiment: 8:31pm on 17 July.
Explain building blocks better of policies
Section "Definition declaration", rename to "terms and definitions"
Pic? https://goo.gl/images/vpFH71
Fig 1.2 : explain all pieces or remove picture
Fig 1.3 move to Section 1.4 Token Economy ?
goal of token economy in Tribler is to create a self-sustaining Youtube-like ecosystem which is completely server-free. Or more generic, improve upon tot-for-tat collaborative mechanisms.
"With 5199 lines of code inserted and 1961 lines deleted(including duplicates), we design and implement a system called Credit Mining, relying on existing features of Tribler", more like, 'we designed, implemented, deployed, and evaluated a real-world credit mining system'.
3.2.1 detect dead swarm
"Intel Core i7-7820HK" do not mention such details of first page, explain the general character of your experiment.
"3.3 Graphical user interface", KODI, the work in this thesis aims to be used in a television-like context. We ported our software to the market-leading low-cost embedded TV platform called Kodi.
first sentence of the final experimental chapter contain a strange intro line, stating negative results: "the policy we implements does not provide a positive enough yield."

EinNarr commented 6 years ago

untitled (6).pdf

synctext commented 6 years ago

big picture is unclear, it's not about adding more details.
Please replace all text on page 23, Chapter 4 intro with better text.
first lines of experimental section need to explain what you did and why.
"In the previous chapter we explained how we aim to solve the freeriding problem with a token in Bittorrent-like systems. In this chapter we deploy our algorithm, using real-world Bittorrent swarms to validate our assumption and measure performance. We first test our proposed algorithm in a controlled environment to validate our implemented mechanism."
Re-write first lines of each chapter, after introduction.
'Ever since the popularization of P2P file-sharing systems, people have been fighting an endless war against free-riders'
Avoid jargon in each first sentences. Like Chapter 2: Collaborative systems like P2P file sharing critically depend on the voluntary donations of bandwidth by their users. However, many people do not contribute or upload in such system, the freeriders. This problem is central in this thesis. For years scientists have tried to express cooperative behavior in such communities with a token, coin or karma points. Uploading to other peers will be rewarded with bandwidth tokens and downloading will cost these tokens. We focus on the problem of automatically earning credits by contributing bandwidth to a community. [new section with more details].

przemyslaw-pawelczak commented 6 years ago

Summary of suggested changes after a discussion between Przemyslaw Pawelczak and Bohao Zhang on 28 September 2018, 10:30 TU Delft

Content corrections:

Please add into "Related work" (Section 2.3) information about other fields (Banking? Car leasing? Army?) with a similar problem of free-riding and how it is solved in these fields
I know that Johan suggested to add an extra paragraph in Section 2.3. on the discussion on policy fundamentals: discovery of opportunity, initial investment, increase investment, dead investment, exhausted investment. A short paragraph on each of these points would be great
Please expand architecture picture in Figure 3.1 as it is too simplistic as of now. This figure should provide information about (a) software components [which function does what in all the blocks you list in Figure 3.1], (b) where each of these blocks run [on a specific client?, on many clients? On a swarm? How many of them are they? etc.], (c) what information exactly is exchanged between these blocks [TCP packets/UDP packets?, packets containing what?, on what sockets?, etc.] (d) how each of the boxes is split into sub-components (e.g. TrustChain is a huge component and needs to be explained in a more detail). The more details - the better!
In Section 4.1 provide information about the underlying peer-to-peer network (how many clients, how geographically distributed, how many were seeders, how many free riders originally, network connectivity, spatial distribution of a network [how often clients were in-active/offline], etc.) where you tested your CreditMining system

Text structure corrections:

Divide "Future Work" (Section 6.2) in subsections such that one subsection is one topic of future work
Make sub-sections in Experimental parts of the thesis: experiment goal, experiment setup, experiment results
Move implementation information you provide in Page 10 to a new subsection which should be placed after Subsection 3.3

Editorial corrections:

Replace Figure 5.9 with bar plot and make it smaller
Move all figures to the top of the page
Make Figures smaller if they occupy a lot of space per page (e.g. Figure 3.4)
Make capitalisation of all titles/subtitles/subsubtitles/... uniform
Add full stops at the end of the sentence in each figure caption
Correct Page 8 by removing lots of white space between figure and text

synctext commented 5 years ago

Related work on Delft University credit mining:

Ethereum Swarm project "Nodes provide signed receipts for stored chunks which they are allowed to charge arbitrary amounts for." "If a promise is not kept and a chunk is not found in the swarm anyone can report the loss by putting up a challenge. The response to a challenge is a refutation. Validity of the challenge as well as its refutation need to be easily verifyable by the contract."
Filecoin incentives, see #3097
Sia, Storj, Maidsafe, IPFS, etc.

xoriole commented 5 years ago

A live screenshot of token mining running multi-level policy credit_mining

qstokkink commented 5 years ago

After years of effort, we finally have an operational credit mining strategy. Now it actually generates tokens as well 😃

Tribler / tribler