Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0
4.8k stars 444 forks source link

A State-based CRDT without Infinite Growth #2571

Closed synctext closed 3 years ago

synctext commented 7 years ago

Goal: upload and download byte accounting, detect freeriders, and refuse service.

Requirement: privacy is a cradinal requirement, only count bytes with a certain accuracy and never leak what is being downloaded or relayed. Side channel attack danger. 20161013_100634

synctext commented 7 years ago

Thesis (2).pdf

Feedback: expand 2-page intro with indirect reciprocity, tragedy of commons, leftist-stuff is OK in foreword, rest needs to be boring & scientifically grounded.

Chapter 1, ThesisReady: tragedy, iterative PD, freeriding, single-point-of-failure, tribler attempts at finding mechanisms to build trust

synctext commented 7 years ago

Thesis (3).pdf Feedback:

Storyline: tragedy, iterative PD, freeriding, single-point-of-failure, tribler attempts at finding mechanisms to build trust

synctext commented 7 years ago

Result of quick update session. Stop deleting bad text for one cycle. Create brainfart version of:

synctext commented 7 years ago

new .pdf @Captain-Coder ?

Captain-Coder commented 7 years ago

Please see here for the most recent version.

synctext commented 7 years ago

Comments on recent thesis version. thesis (46).pdf

"There are several elephants in the room when translating this high-level concept to an actual implementation."

Brainstorm ToDo list:

Captain-Coder commented 7 years ago

The most interresting experiment we could do, is to verify that the multichain can actually be used to reject free riders. The experiment I want to should thus show that with multichain disabled free riding is possible, especially over multiple download sessions. When multichain is enabled the experiment should show that those that could still free-ride before will now be rejected.

For example: Start 100 nodes in 4 groups: 40 nodes will download and seed upto ratio 1.9, 20 free ride low (ratio 0.8), 20 free ride medium (ratio 0.5) and 20 free ride heavy (ratio 0.0). Each node has a 200KB/s maximum upload throughput and an unlimited donwload throughput. Sequentially perform 15 downloads of 20MB. 4 nodes will be picked randomly (uniform distribution) to serve as seeds for each download. This should mimic periodic content releases over time. Plot download speed over time for each node, colour by node grouping. If multichain works as advertized, we should see the download throughput of the 3 free rider groups converging towards their respective sharing ratios or lower. Importantly we should see some of the 15 downloads no longer completing. At the same time the other nodes should see a modest increase in throughput. If multichain is disabled (or does not work as advertized) we should see all downloads completing, and all nodes getting roughly equal throughput.

This would need tribler to meddle with the workings of libtorrent, which might require some engineering effort. Alternatively it should be possible to run this experiment through hidden tunnels, which are easy to control based on multichain scoring.

synctext commented 7 years ago

First, small scale validation experiment. Easy to understand and check manually. Community-level emulation, 100 nodes. Good Ewout idea: string of 15 downloads.

Hard threshold policy for rejecting relay requests. Please focus on evaluating and testing a hard-coded 3 GByte freeriding limit. 20170109_174027 1 Sadly it is likely that only positive reinforcement policies will scale and resilience against simple whitewashing attacks.

synctext commented 7 years ago

"The following deliberately simple experiment demonstrates our access control decisions. Requests by freeriders are rejected."

synctext commented 7 years ago

Key objective minimal-validation-experiment: 1 downloader from 1 seeder. Repeated downloads, then gets rejected by the tunnel community proxy. (e.g. 3 GByte freeriding limit, 250MB file downloads, blocked after 12 times) Only tunnel stuff has the reject ability, minimal requirement is 2 proxies.

synctext commented 7 years ago

Requirement: crawl neighbors (with Quinten). Key objective 2 full-validation-experiment: Community-level emulation, 100-ish nodes. Key objective 3 minimal-validation-experiment: parity of contributions: seeder, proxies, downloader

Scientific question: placement of reject policy (during rendezvous proxy setup?)

Very future work (do not work on this in 2017 please!): priority-based queuing mechanism, thus give away bandwidth to highest ratio/upload peer and pre-empt low-level contributors.

synctext commented 7 years ago

Gumby experiments required re-work. Tribler tries to re-use tunnels after first-time usage. But that goes wrong. Second download sees no fresh tunnels are needed, but libtorrent or something can't really use them. Ongoing investigation.

Plus Gumby uses the mainline public DHT network for all tunnel activities. Ongoing tribler fix and Gumby PR.

How close to operational code? Few weeks..

The current true halves tries to somewhat do a light check on previous hash. When getting a signature request Tribler now always requests a few blocks and fires up a deferred in 5 seconds. Only if the integrity of Trustchain is correct we put our signature down. Also we moved beyond the sign-anything policy, byte accounting is checked (rounded to 16KB blocks or so+fuzzy factor).

devos50 commented 7 years ago

@Captain-Coder where is this fuzzy factor exactly? I haven't encountered it so far in the code base or did I miss something?

Captain-Coder commented 7 years ago

@devos50 Yeah seems you're right. I'm quite sure PimV and me agreed to put it in. But apperently we never did.

It definately should be in there though, i'm already seeing tons of "not enough outstanding credit" errors in my experiment, even though there is no malicious node and everything works via localhost. I suspect the counters in (hidden_)tunnel_community.py are nog counting the same thing going out as comming in. I'll probably need to fix this for my experiments.

synctext commented 7 years ago

ToDo: 3 pages of master experiment thesis material.

synctext commented 7 years ago

Update: DHT isolation PR needs rebase.

Excellent bug hunting, anonymous DHT in Tribler results in identity spamming. Our small exit node community gets abuse blocked by most DHT implementations. We mix introduction IP:ports with exit node IP:ports (exits get DHT blocked, introduction points run their own DHT instance).

synctext commented 7 years ago

Update: Tribler patch was needed for current Arch Linux support of Tribler, Libtorrent 1.1.1 series and Socks API.

We moved to Tx block inside transaction of Trustchain. This binary blob is put straight into the Tribler SQL table Triblerchain_blocks. Aggregation queries are now impossible from SQL (e.g. requires character extraction). Somewhat working code of reject policy based on MBytes. We have true halves, plus implemented policy to ignore all MByte on Trustchain if it is not counter-signed. Ongoing work:scoring function in MBytes. Semi working Gumby with 2 seeders, relays, and 4-ish downloaders. First reject policy results in 1-2 weeks. 4 hours of runtime.

Real experimental results: full devel hidden seeding code with DHT isolation (== remove bootstrap nodes). Experiment duration is determined by file size above trustchain counting limit and below 10min tunnel lifetime x amount of swarm downloads. O yeah, working Raspberry pi box, 50 KBytes/sec anon downloads.

synctext commented 6 years ago

Thesis material also include global re-factoring and end-to-end deployment testing+fixes.

There are both outgoing and incoming tunnel_extend requests. Both trigger the calculation of scoring function. Major refactoring task of the Tx block and SQL usage. Python type inheritance of Trustchain to both Triblerchain and Tradechain. Our approach is merely code sharing not data sharing or data duplication. Tx block is parsed and optionally split into fields. Dictionary of Tradechain can't be split into fixed fields, thus remains a blob. Not fancy, no realistic alternatives, but required.

Raspberry pi, build script of Libtorrent and Tribler core (no GUI). ToDo: cut-off experiment, 1-ish week.

synctext commented 6 years ago

Ongoing DHT isolation PR since June 8. Now writing thesis text, experiment chapter.

Warning: at End of December the phd students of Tribler team will work full-time on fixing the credit mining+payout through tunnels. Key feature for Tribler project, earn credits by donating disk and bandwidth...

synctext commented 6 years ago

Add to thesis our Tribler publication "Paying the Guard: an Entry-Guard-based Payment System for Tor". Quote from draft thesis:

A naïve scheme could be to compensate relays for their bandwidth usage, both up and down where the
downloader at the end of the proxy chain has to provide this compensation. Such a scheme is doomed to fail for the following reasons.
[...]
Everyone has to wait for the payment request to eventually
reach the downloader and for the payment to propagate back along the circuit. 

Your attacker model assumes people deviate from the protocol. Human behavioral studies (including ours) has showed this to be incorrect, the overall majority is honest or follows the default. Let's call this the opportunistic token economy...

devos50 commented 6 years ago

The first step towards proxy/exit node payouts is done: a very basic experiment to perform hidden seeding and downloading (one hop, one circuit, one downloader and one seeder). See https://jenkins.tribler.org/job/pers/job/hidden_seeding_devos50/.

329854c102d7f6a3bd836152446f95f0a1d17faf_total_down

329854c102d7f6a3bd836152446f95f0a1d17faf_total_up

(annotations in the graphs above are bugged for some reason)

Next step: integration of TriblerChain + implementation of payouts.

UPDATE: TriblerChain is running correctly, next to the hidden tunnel community. Left: implementation of payouts.

synctext commented 6 years ago

solid milestone. Perhaps we want to expand this Jenkins job to annotate the various .PNG file with start,stop, etc time markers. Like unit tests and Allchannels. Work towards a few article+thesis inclusion graphs.

devos50 commented 6 years ago

@synctext the graphs above have annotations (the green/blue/red vertical lines) but the text is not visible for some reason. I suspect this has something to do with the way the graphs get rendered on our bbq server. I noticed that annotation texts are quite unreliable in R and they display if I render them on my own machine. While not top priority, we should look into this at some point.

devos50 commented 6 years ago

It seems that building a hidden seeding circuit is sometimes failing (with probability around 30% I think). I will try to debug this.

devos50 commented 6 years ago

To make it a little easier for myself to debug this kind of code, I've created a small script to visualise which tunnels are being built. The solid line indicates an 'ordinary' data circuit, the dotted line is an encrypted e2e tunnel between a downloader and seeder, and the dotdash line is an introduction point tunnel. The numbers on the lines indicate the amount of bytes transferred over this link.

circuits

In the figure above, two nodes acts as introduction point for the hidden seeder (peer 3 and 8). It's generated automatically on Jenkins after running https://jenkins.tribler.org/job/pers/job/hidden_seeding_devos50/63/ 👍 . It's fascinating to see how many circuits are built for even a single hidden seed/download.

devos50 commented 6 years ago

Related work for anonymous payouts:

devos50 commented 6 years ago

I've been thinking about a very basic design for the payouts, taking the related work and the 'paying the guard' paper in mind. I argue that an important requirement is to keep the unlinkability between downloader and seeder. To keep focus, payouts could be implemented only on e2e-encrypted circuits between downloaders and seeders for now.

Imagine a e2e tunnel between a downloader (D) and seeder (S), using two intermediate relays (one for the downloader, R1, and one for the seeder, R2). Every minute, the seeder initiates a so-called 'payment request' by sending a payment-request message over the e2e tunnel to the downloader. This message contains the amount of hops on the side of the seeder (in our example, this is 1) and the total amount of seeded bytes. Note that this payment request is also initiated when destroying a circuit, possibly due to inactivity or age. When the downloader receives this payment request, he will issue IOU certificates to the relays. Assume that a total of 100MB is seeded to the downloader. The downloader will issue an IOU to the first relay for a total amount of 300MB. This relay will in turn issue an IOU to the next relay, R2, for 200MB. Finally, R2 will issue an IOU to the seeder for 100MB. For the first iteration, seeders get payed for an equal amount as relays. Note that we assume that relays, seeders and downloaders in the scheme behave honest and accurately issue subsequent IOUs to the next relay/seeder. We might be able to detect fraudulent behaviour at one point, by forging something like a proof-of-bandwidth. I'm not sure how to do such a thing but this might be helpful to build more trust in the network.

When these IOUs are settled between users, everyone is able to track circuits and bytes sent and as a consequence, will be able to link downloader and seeder. The key solution is given in the Paying the Guard paper: aggregated and deferred payments, settled at predefined time intervals. When A owns 10MB bandwidth credits to B and B owns 20MB to C, A can sign a transaction for 10MB to C and B does the same to C.

I will work on a basic implementation of this scheme tomorrow.

Captain-Coder commented 6 years ago

@devos50 , I imagine @synctext put you on this track to accelerate/stimulate my progress? Or because its becomming a blocker? Anyway my thesis drafts (even those from long ago) contain some analysis of the systems you are proposing.

Imo, any direct payout scheme based on multiplying bytes by anything other than 1 will expose the seeder and downloader. And they are fragile. Please don't implement this. Even if @synctext wants it :P.

Because of live edges (based on any sort of trust scoring mechanism), a clique will devlop around the exits and clients will be forced to use longer tunnels than one hop to get service. So any scheme that relies on 1 hop fixed tunnels wil fail. Badly.

I agree that aggregated + deferred payments is the way to go for now. I was also investigating an option of obfusticated transactions on a chain. Where the block is verifiable but not directly tracable to the counterparty. Another option is to only work with payments of 1GB and thus first buy credit from someone, and thus keep the true account "off the books". This scheme has added bennefits in the case of TriblerChain. But can we please discuss all this on monday in the new building?

Also I already have gumby experiments, I might not have checked those in though -.- . We should probably discuss to avoid doing double work.

Cool graphics btw, is that live or generated after the experiment? Based on the triblerchain or some custom output from the hiddenservices community?

devos50 commented 6 years ago

@Captain-Coder I've been assigned to the implementation of hidden payouts by @synctext but not to accelerate or stimulate your thesis (although this might be a nice side effect :p). We also mentioned that we will start working on credit mining/payouts in a earlier comment. Seeder/relay payouts is a critical feature we need for an upcoming article about operational coin/credit mining and trading. I know your thesis material contains some comments on this matter and I've read it already.

I agree that direct payouts is not the way to go, without fancy cryptographic techniques or other obfuscations. For now, we want a basic implementation and something that's not too complicated, even if it's not the most optimal solution for now. After reading some related work (see my previous comments), I realised that incentivising relays is far from trivial since it directly influences anonymity. I think TorCoin (together with TorPath) is an elegant approach, although they still rely on a centralised bank and directory servers which is a tradeoff we don't want to make.

Your proposal about obfuscated transactions sounds similar to confidential transactions in Bitcoins where they use homomorphic commitment schemes to hide the exact amount that is being transacted between users (see https://elementsproject.org/elements/confidential-transactions/investigation.html). Although a nice technique, their zero-knowledge range proof is computationally expensive (although there's already a proposal to speed this up).

It took me quite some time to get the hidden seeding experiment up and running. I've checked prior to creating the experiment whether there wasn't already something available. I had to fix quite some stuff actually before it worked, but now we have a base experiment for hidden seeding which we can expand on and improve incrementally.

The graph is created from data in the TunnelCommunity class, especially from the circuits and relay_from_to attributes. I've written some code to 'reconstruct' a tunnel and I'm plotting this with an R script (using the igraph package). See https://github.com/devos50/gumby/blob/payouts/experiments/tunnels/parse_tunnel_statistics.py#L98 for this code.

For now I will continue with the implementation of payouts and we can have a discussion on Monday. If you have any ideas how to improve my design or if you have your own additional ideas, please let me know!

synctext commented 6 years ago

We need to define the basics or will run into trouble during publication. Creating a creating a token economy and onion relay system is hard. What is the attack model?

Every minute, the seeder initiates a so-called 'payment request'

MvP... What is the mechanism design motivation behind this? Please just start accounting for 10-minute tunnel byte transfers upon closure, should be sufficient. No need to go beyond the Pim-Method before we have all other pieces complete.

For now, we want a basic implementation and something that's not too complicated, even if it's not the most optimal solution for now.

Very much in agreement. Keep it simple, deploy something within a month! The Tribler lab will become irrelevant if SIA/IFPS/FileCoin/MadiSafe/STORJ start to dominate the field. A deployed mechanism with only a weak attack model but operational micro-economy and decentral market is progress. Then iterate until OK.

any direct payout scheme based on multiplying bytes by anything other than 1 will expose the seeder and downloader.

Obviously true. This is known for 15 years already, please read On the Economics of Anonymity. Once we have an operational deployed system we can, for instance, make all payouts multiples of 250MByte. Periodic settlement. Lot of techniques can be applied.

When the downloader receives this payment request, he will issue IOU certificates to the relays.

Is this IOU something new or just a TrustChain record we sign and hide for a while? For instance, weekly reveal of all account balances.

devos50 commented 6 years ago

We need to define the basics or will run into trouble during publication. Creating a creating a token economy and onion relay system is hard. What is the attack model?

Good point, we have to define that as soon as we can.

MvP... What is the mechanism design motivation behind this?

I've already discarded the idea of initiating payment requests periodically. I realized it's a lot easier to do the payment when closing a tunnel so that's what I've implemented right now.

Is this IOU something new or just a TrustChain record we sign and hide for a while?

Essentially just a dual-signed Trustchain record that hasn't been made public.

Thanks for the paper, seems to be very related to what we are going to do and I will read it as soon as possible 👍

synctext commented 6 years ago

Just writing down my thoughts.. What is our long-term strategy for Sybil attack prevention? It started with TrustChain design in 2012, public discussion in 2013 at #31.

Just to explore and define our design space further... Create a trusted anonimity set to craft trustworthy mixnet? Trusted anon set using our ancient 2009 deployed technology of a gossip-based decentral social network? Is it really secure to ask our users to scan passports on our identity App for trustworthy self-sovereign non-Sybil rule? Or get help from completely anonymous strangers with multi-year reputations, using #2805 incremental PageRank? Put our faith in fancy crypto party tricks (e.g. beyond onions, going homomorphic) ?

Missing anything? @devos50 @qstokkink @xoriole

Essentially just a dual-signed Trustchain record that hasn't been made public.

Glad to hear such simplicity.

devos50 commented 6 years ago

First basic implementation of payouts is done (code: https://github.com/devos50/tribler/commit/abe777bb934e79c7e088e859d49712fb1cbceb36)! 👍 1 downloader (peer 5), 1 seeder (peer 4) and an experiment where at most 10MB will be sent over a tunnel before it is destroyed and a new one is established. The seeder and downloader exchange a 15MB file over an e2e-encrypted circuit. So at least one circuit has to be broken. When the circuit is destroyed, the downloader pays the relays and the seeder.

The Trustchain balances are as follows after the experiment (https://jenkins.tribler.org/job/pers/job/hidden_seeding_devos50/94/artifact/output/trustchain_balances.csv):

peer,total_up,total_down,balance
10,30597204,20398136,10199068
4,10199068,0,10199068
5,20398136,40796272,-20398136

The circuit looks something like this: 4 -> 5 -> 10 -> 5. It seems that peer 5 also acts as a relay, besides being downloader, in the e2e-circuit which can happen. Also, seeders and relays are paid equally for now. Note that the current implementation is not secure at all and leaks privacy which is undesirable.

Next steps: scaling up the experiment with more nodes and relays? Investigating how to make the payouts more anonymous?

@synctext I think the most effective prevention of the Sybil attack is by utilizing long-lived identities but this directly opposes anonymity. Also, defense against the Sybil attack is a broad concept since currently, there might be multiple attack vectors in various layers of our technology stack (Dispersy, Trustchain, tunnels). Let's discuss these issues somewhere next week?

devos50 commented 6 years ago

New implementation of payouts, based on IPv8 tunnels: https://github.com/Tribler/tribler/pull/3492

synctext commented 6 years ago

Thesis-experiment-chapter.pdf Quick Feedback:

synctext commented 6 years ago

Mental notes:

synctext commented 6 years ago

Thesis progress warning: due to the slow pace it's possible this work will become irrelevant.

devos50 commented 5 years ago

Some first initial results. I've been logging all rejection events as a tuple of (time, balance). Over a period of four days, I've gathered and analyzed 298.035 rejection events.

The following ECDF shows the cumulative distribution of token balance in GB on the horizontal axis, and the ECDF value on the vertical one:

circuit_rejects_basic

As we see, more than 50% of the rejections are with a balance lower than 100GB, which is pretty decent I guess. Barely any users with a positive balance are rejected, which is also a good result.

I also plotted the hourly load (number of rejection events in a specific hour):

hourly_load_1

We observe that the load is somewhat uniformly spread. This suggests that there is barely any variance in the system load. We should increase our infrastructure capacity to get insights into the system under different amounts of loads. My hypothesis is that during a higher load, more people with lower token balances will be rejected. However, I'm not able to verify this behaviour due to the lack of capacity :(

Captain-Coder commented 5 years ago

A recent version of my thesis. thesis.pdf

devos50 commented 5 years ago

I attempted to make a new graph and show how free-riders are rejected from the exit nodes. This experiment is based on 7723 rejections on a single machine over the period of 5 days. With our increased capacity, we are currently not in an overloaded state anymore so I would expect a different result than my previous plot. It seems that there are more users with very low balances are rejected when the load is high, but this does not hold for users with balances just below zero.

reject_load

And the hourly load again:

hours_distribution

I highly doubt the correctness of the second plot. To be continued...

synctext commented 5 years ago

General direction: ?vulnerabilities in bandwidth trading?

Token economics and privacy enhancement:

synctext commented 5 years ago

Solid progress with Gumby experiment scripts. As discussed:

Captain-Coder commented 5 years ago

Fraud experiment results freerider-experiment-trustchain-score

Red is the exit, the blue and purple peers cheat. Up until the moment that they cheat everything is fair. And this is the impact of their cheating on their downloads

freerider-experiment-download-progress

devos50 commented 3 years ago

General agreements:

Tentative chapter deadlines:

First week: literature reading, deep-dive into our lab research + other related literature. Also see here and here. Deliverable: 1 page findings of your literature survey.

Captain-Coder commented 3 years ago

Having a little trouble letting go of previous work, that's the first hurdle to overcome now. Read papers from our lab from the last period that i missed. Have some ideas as to which direction to go.

2020 Week 41.txt

devos50 commented 3 years ago

Meeting notes 12-10:

Expectations for next week:

Next steps (after next week): Select one of the identified directions, start looking for external material and start writing a state-of-the-art chapter.

Captain-Coder commented 3 years ago

2020 Week 42.txt

devos50 commented 3 years ago

Meeting notes 19-10:

Next steps:

Captain-Coder commented 3 years ago

Spent a lot of time to study and properly understand the crdt definition, and the mathematical basis for them. Turns out the definition is not all that exiting. The applications are nice though.

Discoverd that I should also discuss Distributed systems, synchronization, the cap theorem and Operational Transformation as backstory of crdts. Not typed all of that up yet, but got a few key points down.

Got the latex template (this is still the current one of the department, its from 2005?), and spent some time learning how to use bibtex.

thesis.pdf

devos50 commented 3 years ago

Meeting notes 26-10:

For next week, we should decide if this is the definitive thesis direction.