Tribler / tribler

Privacy enhanced BitTorrent client with P2P content discovery
https://www.tribler.org
GNU General Public License v3.0

Trusted peer discovery and improved NAT puncturing #2623

Closed synctext closed 6 years ago

synctext commented 7 years ago

Currently all Dispersy communities have their own isolated walker. This is not very efficient. This issue aims to build upon the ongoing multichain work and create a trusted peer discovery mechanism with high-performance NAT puncturing.

Background reading (general):

Trusted peer discovery:

Technical docs:

YourDaddyIsHere commented 7 years ago

@synctext Simulated network on a local machine, without the port limit

It is almost done. I give the active walker a fake network endpoint, so from the active walker's point of view it is connected to a real network (in other words, I deceive the walker into believing it is walking in a real network). So I don't need to change the logic of the walker.

For the simulated network, we generate only a list of node ids, and store each node id, private key, and fake IP and port in a database.

Every time the real walker (i.e. the active walker) wants to take a step, it sends the message to a fake address. The simulated network translates the address to a node id and then generates the links and multichain blocks of that node using a deterministic random seed generated beforehand. In other words, links are generated "on the fly": no node instance or link instance is stored in memory. I store the node-id/(ip, port) lookup table in a database to save memory, but I can also move that into memory.
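The on-the-fly generation described above could look roughly like the sketch below. It is only an illustration, not the code used in the experiment: the constants, the fake address scheme, and the names `fake_address`, `build_lookup`, `neighbours_of` and `handle_step` are all assumptions, and for brevity the links are directed rather than symmetric.

```python
import hashlib
import random

NETWORK_SEED = 2017   # hypothetical seed fixed before the experiment
NODE_COUNT = 1000     # hypothetical network size
DEGREE = 5            # hypothetical number of links per node


def fake_address(node_id):
    """Map a node id to a fake (ip, port); unique for node ids below 2**24."""
    ip = "10.%d.%d.%d" % ((node_id >> 16) & 255, (node_id >> 8) & 255, node_id & 255)
    return (ip, 6421)  # the port is arbitrary in the simulation


def build_lookup(node_count):
    """The only table kept in memory: fake address -> node id."""
    return {fake_address(i): i for i in range(node_count)}


def neighbours_of(node_id, node_count=NODE_COUNT, degree=DEGREE, seed=NETWORK_SEED):
    """Regenerate the links of one node from (seed, node_id); nothing is stored."""
    digest = hashlib.sha256(("%d:%d" % (seed, node_id)).encode("ascii")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    picks = rng.sample(range(node_count), degree + 1)
    # Drop a possible self-link, then keep exactly `degree` neighbours.
    return [n for n in picks if n != node_id][:degree]


def handle_step(address, lookup):
    """What the simulated network does when the walker contacts a fake address."""
    node_id = lookup[address]
    return [fake_address(n) for n in neighbours_of(node_id)]
```

Because the per-node generator is seeded only by the global seed and the node id, asking the same node twice gives the same links, which is what lets the simulation skip storing them.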

I am running the experiment and will upload some figures later.

Since today is 3 July and the next meeting is on 18 July (still 15 days to go...), can we have a 5-10 minute drop-in meeting before that? (If you like, we can meet after 5:30 PM on any day.)

YourDaddyIsHere commented 7 years ago

Number of honest/evil nodes the walker meets: figure_1

Just as expected: 40% of the nodes in the network are honest and 60% are evil, so in the long run the nodes met by the walker consist of 40% honest nodes and 60% evil nodes.

YourDaddyIsHere commented 7 years ago

@synctext 1. The database is removed; the node id and address lookup table is in memory now. 2. I can now specify how many attack edges I want.

synctext commented 7 years ago

solid progress! Cool experiment: how much contact with evil nodes for certain walk parameters and attack strength.

X-axis: number of attack edges, from just a few to twice as many attack edges as honest nodes...

Y-axis: percentage of good vs. evil nodes discovered, from 0% evil to 100% evil

Several line colors for the random walk: black 40% reset back to home (alpha), blue 30%, green 20%, red 10%. So red gets random walks across trust edges of length 10 on average, far into the evil sybil region.

Interesting and easy to program?

synctext commented 7 years ago

each dot in graph is an experiment. Say several dozen points to see trends?
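A minimal sketch of such a sweep, assuming a toy topology: an honest region and a sybil region, each a ring plus random internal links, joined by a chosen number of attack edges. The function names, graph model and sizes are assumptions made for illustration and are much smaller than the networks in this thread.

```python
import random


def build_graph(n_honest, n_evil, degree, attack_edges, rng):
    """Nodes 0..n_honest-1 are honest, the rest are evil. Returns adjacency lists."""
    n = n_honest + n_evil
    adj = {i: set() for i in range(n)}

    def connect(a, b):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    for i in range(n_honest):
        connect(i, (i + 1) % n_honest)           # ring keeps the region connected
        for _ in range(degree):
            connect(i, rng.randrange(n_honest))  # random honest-honest links
    for i in range(n_honest, n):
        connect(i, n_honest + (i - n_honest + 1) % n_evil)
        for _ in range(degree):
            connect(i, rng.randrange(n_honest, n))
    for _ in range(attack_edges):                # honest-evil links
        connect(rng.randrange(n_honest), rng.randrange(n_honest, n))
    return {node: sorted(peers) for node, peers in adj.items()}


def evil_fraction(adj, n_honest, home, reset_prob, steps, rng):
    """Random walk with teleport home; fraction of discovered nodes that are evil."""
    current = home
    discovered = set()
    for _ in range(steps):
        if rng.random() < reset_prob:
            current = home                       # teleport back home
        current = rng.choice(adj[current])
        discovered.add(current)
    discovered.discard(home)
    if not discovered:
        return 0.0
    return sum(1 for node in discovered if node >= n_honest) / len(discovered)


if __name__ == "__main__":
    rng = random.Random(1)
    for attack_edges in (10, 100, 1000):         # X-axis
        adj = build_graph(400, 600, 4, attack_edges, rng)
        for reset_prob in (0.4, 0.3, 0.2, 0.1):  # one line per reset probability
            y = evil_fraction(adj, 400, 0, reset_prob, 5000, rng)
            print(attack_edges, reset_prob, round(y, 3))
```

Each printed row corresponds to one dot in the proposed graph; repeating a row with different seeds would show the spread per dot.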

YourDaddyIsHere commented 7 years ago

@synctext OK, running the experiment. I will contact you once I have the results and graphs.

YourDaddyIsHere commented 7 years ago

@synctext figure_1

I tried reset probabilities of 10%, 20%, 30%, and 40%. Only 40% shows a significant effect on preventing visits to evil neighbors (and only in scenarios without too many attack edges).

So I also tried a reset probability of 100% (which is the current policy of the Dispersy walker). And... yes, a high reset probability can prevent visiting evil neighbors, at the cost of low neighbor discovery efficiency: with the same number of steps, a walker with a high reset probability discovers fewer blocks.

synctext commented 7 years ago

The above experiment always has 400k honest nodes and 600k evil nodes for each run. Fascinating, you don't discover all honest nodes anymore if there is such an overwhelming amount of attack edges.

Next step: thesis chapter Problem Description + intro.

Problem Description

sidenote. Majority attacks are real for network maintenance and protocol upgrades.

YourDaddyIsHere commented 7 years ago

@synctext Since there is over a month before our next meeting, I think I can write more than the Problem Description and Introduction. The core of the thesis is protection against certain evil behavior by using transitive trust, so we can tell the story like this:

  1. A brief introduction to the current Dispersy walker protocol (already finished a few months ago at your request).

  2. Analyze the weaknesses of the current protocol.

  3. Design some specific attack scenarios that exploit those weaknesses, for example DDoS using introduction-response messages, poisoning (introducing toxic neighbors to you), etc. Since I have finished the virtual network, I can test those attacks and draw some cool graphs (e.g. a load balancing graph in the DDoS scenario). In other words, this thesis should target some specific attack scenarios. If the problem is too broad, this thesis won't be doable.

  4. All of those specific attacks can be mitigated by preventing the walker from walking into the evil region. A random walker with a probability of teleporting home is a good way to do this. Every time the walker teleports home, it randomly picks one of its "trusted neighbors" to visit in the next step (i.e. it only trusts neighbors introduced by its "trusted neighbors"; that is what we call "transitive trust").
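The teleport rule in step 4 can be sketched as a single decision function. This is an assumption-laden illustration, not the thesis code: `next_peer`, its arguments, and the way introductions are passed in are all hypothetical.

```python
import random


def next_peer(introduced, trusted_neighbours, reset_prob, rng):
    """Choose the peer to visit next.

    introduced: the peer introduced by the last visited peer, or None at start.
    trusted_neighbours: non-empty list of peers the walker trusts directly.
    Returns (peer, teleported).
    """
    if introduced is None or rng.random() < reset_prob:
        # Teleport home and restart the walk from a random trusted neighbour.
        return rng.choice(trusted_neighbours), True
    # Otherwise follow the introduction, extending the chain of transitive trust.
    return introduced, False
```

With `reset_prob=1.0` the walker only ever visits its trusted neighbours; with lower values it follows longer chains of introductions before returning home.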

As the results of the experiment I just finished show, the teleport-home algorithm can reduce the number of evil nodes the walker visits. So, it works, story ends...

Does this story make sense?

By the way, I wish you a good vacation.

synctext commented 7 years ago

makes sense, good stuff!

YourDaddyIsHere commented 7 years ago

@synctext Here is the first draft. It is not finished yet, but it already sketches the story: thesis-report.pdf

synctext commented 7 years ago

Comments:

synctext commented 7 years ago

Epic Quinten walker code in IPv8 output

qstokkink commented 7 years ago

@synctext that walker is actually already live in TrustChain on devel (see https://github.com/Tribler/tribler/blob/devel/Tribler/community/trustchain/community.py#L289).

Side note: because I wanted the IPv8 mechanism generic and decoupled it is also much more complicated. In fact, I consider the edge based walking to be the most complex code in IPv8.

YourDaddyIsHere commented 7 years ago

@synctext Ah, that is epic work. It is a fundamental change in walker strategy.

But the story of my thesis is about improving the "current Dispersy walker" (the original one, without live edges, which takes a fully random walk). Can I still follow my story line by treating the original walker as the current walker? (Because of Quinten's work it is now a "historical" walker, no longer the "current" one.) If I say the original walker no longer exists, it will undermine the whole story line of my thesis...

Since all my experiments are finished and use the original walker as the baseline, and the story line of the thesis report is also about improving the original walker, can I still use my current code and follow my current story line? Such a big change (adding IPv8 and the new features to my clean-slate walker) would take too much time, and I would need to redo all the experiments, which is also time-consuming. I am really in a hurry to graduate, I am running out of budget...

My work now mainly follows the work of Pim Veldhuisen, adding improvements that address its limitations.

My story line is now:

In the experiment, I keep the edge between my walker and each trusted peer alive, which prevents the NAT hole from closing. (A trusted peer is a peer that has blocks with us, or a peer directly trusted by one of our trusted peers.) So, instead of letting trusted nodes time out, we can keep them available for a longer time.

The teleport-home walker follows this strategy: it visits a neighbor A, A introduces B to me, and then my walker teleports home with some probability, and otherwise visits B, and so on. That is the same as in Pim Veldhuisen's simulation.

I also test another walker, which takes a random walk but gives trusted peers a higher probability of being picked. I test the two new walkers using the original walker (no live edges, fully random walk) as the baseline.
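A biased pick of this kind can be sketched with the standard library's weighted sampling. The weight value and the name `pick_weighted` are assumptions; the thread does not say how much more likely a trusted peer is.

```python
import random


def pick_weighted(peers, trusted, trust_weight, rng):
    """Pick one peer; trusted peers count `trust_weight` times, others once."""
    weights = [trust_weight if peer in trusted else 1.0 for peer in peers]
    return rng.choices(peers, weights=weights, k=1)[0]
```

For example, with two candidates and `trust_weight=9.0`, the trusted one is picked about nine times out of ten, while untrusted peers still get visited occasionally.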

The improvement compared with Pim Veldhuisen's work is this: Pim Veldhuisen gives all peers an infinite life span. Hence a high-reputation peer will always stay in his top-10 peer list, which causes a load balancing issue. And keeping a peer alive forever means we cannot clean sybils from our peer list using timeouts. So I give trusted peers a finite life span (but still 10 times longer than that of a normal peer). This way trusted peers stay available for longer, sybils are still cleaned out by timeouts, and a high-reputation peer does not have a global impact over the whole duration of the experiment.
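The finite life span rule can be sketched as a small peer table with expiry times. The factor of 10 comes from the comment above; the base lifetime of 60 seconds and the `PeerTable` class are assumptions for illustration only.

```python
NORMAL_LIFETIME = 60.0  # seconds; hypothetical base lifetime of a normal peer
TRUSTED_FACTOR = 10     # trusted peers live 10 times longer, as described above


class PeerTable:
    """Tracks when each known peer expires."""

    def __init__(self):
        self._expiry = {}

    def add(self, peer, trusted, now):
        """Insert or refresh a peer; trusted peers get the longer life span."""
        lifetime = NORMAL_LIFETIME * (TRUSTED_FACTOR if trusted else 1)
        self._expiry[peer] = now + lifetime

    def purge(self, now):
        """Drop every peer whose life span has ended and return those peers."""
        expired = sorted(p for p, t in self._expiry.items() if t <= now)
        for peer in expired:
            del self._expiry[peer]
        return expired

    def alive(self):
        return sorted(self._expiry)
```

Because even trusted entries eventually expire unless refreshed, a sybil that once slipped into the trusted set is cleaned out too, only later than a normal peer.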

And... as you know, the experiments are done and the results are good. I have changed the simulated network to 30% honest peers and 70% evil peers, and the results are still good. But adding the new features from Quinten's work would cost too much time... I am really running out of budget...

synctext commented 7 years ago

storyline is still: what walker works best in an evil majority environment..

YourDaddyIsHere commented 7 years ago

OK, I will follow the current story line and try to finish the new version of the report before next week.

synctext commented 7 years ago

latest thesis report URL: https://github.com/YourDaddyIsHere/MSc-Thesis/blob/master/thesis-report.pdf

YourDaddyIsHere commented 7 years ago

@synctext That is not the latest... I forgot to push the latest one to the repository these past days.

I pushed it a few minutes ago, so now the repository has the latest one: Thesis Report

I am drawing some graphs for better illustration and will keep updating it until the next meeting on 6 September.

synctext commented 7 years ago

commenting.. thesis-report (3).pdf

YourDaddyIsHere commented 7 years ago

@synctext latest version report

1000 crawl per second limit 1

By the way, because the days reserved for the defense are October 23 to October 25, can we decide on the committee members in our next meeting (21 September)? There are only 4 weeks to go...

synctext commented 7 years ago

solid progress, good results

devos50 commented 7 years ago

@YourDaddyIsHere note that Figure 3.6 in your thesis shows a block graph, used as input for the Temporal Pagerank algorithm, not NetFlow.

synctext commented 7 years ago

Comments on this thesis version:

YourDaddyIsHere commented 7 years ago

@devos50 Oh, thank you, that is a mistake in the caption. In the previous paragraph I said Figure 3.6 is for Temporal Pagerank.

YourDaddyIsHere commented 7 years ago

@synctext We did not schedule a next meeting last time. Should we have a meeting before the deadline for handing in the final report? Since the defense is on 20 October, the deadline for handing in the final report is around 13 October. I am available every day in the coming weeks.

YourDaddyIsHere commented 7 years ago

@synctext Since the defense is on 20 October, the deadline for handing in the report is around 12 October. After that, I will have 1 week to prepare the defense presentation. Could we schedule a meeting between 15 and 19 October? I need some suggestions on my presentation. Otherwise I won't know whether it is good or crappy...

YourDaddyIsHere commented 7 years ago

The latest version of my thesis is in this repository.

I will update it multiple times every day until midnight of 12 October (Wednesday this week).

YourDaddyIsHere commented 7 years ago

@synctext OK, it is almost the final version now. I have feedback from both committee members and have revised the thesis report according to their suggestions. I think we should have a talk tomorrow (11 October); then I have one day left to move to the final version.

YourDaddyIsHere commented 7 years ago

@synctext I made some changes a few minutes ago. The current slides: thesis defense further simplified.pptx

synctext commented 7 years ago

Too many slides for a 30 minute presentation, Max 40min. First 8 slides, remove half. Just present solution 1, "in my thesis I looked at a smarter solution". In general slide 1-25 could be made more in-depth and scientific. Quick detailed comments:

YourDaddyIsHere commented 7 years ago

@devos50 @qstokkink The defense is tomorrow (20 October) at 10:00 - 12:00 in the morning. The room is HB.03.230 katwijkzaal

qstokkink commented 7 years ago

@YourDaddyIsHere I can't make it as I'm on holiday tomorrow, but good luck with your defense!

synctext commented 6 years ago

Final master thesis: Peer Discovery With Transitive Trust in Distributed System

synctext commented 1 year ago

Related work from modelling side using cellular automaton paradigm. Network Automata: Coupling structure and function in real-world networks. Our angle is not network topology, but the connectivity and trust integration. ToDo: Cellular Automata and game theory integration (e.g. Meritrank and AAMAS paper).