Trusted peer discovery and improved NAT puncturing

synctext commented 8 years ago

Currently all Dispersy communities have their own isolated walker. This is not very efficient. This issue aims to build upon the ongoing multichain work and create a trusted peer discovery mechanism with high-performance NAT puncturing.

Background reading (general):

Trusted peer discovery:

Technical docs:

YourDaddyIsHere commented 7 years ago

@synctext simulated network on a local machine, without port limit

It is almost done. I give the active walker a fake network endpoint, so from the active walker's point of view it is connecting to a real network. (In other words, I am deceiving the walker into believing that it is walking in a real network.) So I don't need to change the logic of the walker.

For the simulated network, we generate only a list of node IDs, and store the node ID, private key, and fake IP and port in a database.

Every time the real walker (i.e. the active walker) wants to take a step, it sends the message to a fake address. The simulated network translates the address to a node ID and then generates the links and multichain blocks of this node using a deterministic random seed generated beforehand. (In other words, links are generated "on the fly"; no node instance or link instance is stored in memory. I store the node-ID/(IP, port) lookup table in the database to save memory, but I can also move that into memory.)
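A minimal sketch of how such on-the-fly link generation could look, assuming a per-node random generator derived from a global seed and the node ID. All names and constants here (`GLOBAL_SEED`, `fake_address`, `neighbors_of`, `handle_request`, the 10.x.x.x address scheme) are hypothetical and not taken from the actual simulator; the generated links are directed, and multichain blocks are left out.

```python
import hashlib
import random

NUM_NODES = 1000        # assumed size of the simulated network
LINKS_PER_NODE = 5      # assumed number of links generated per node
GLOBAL_SEED = 42        # assumed seed, fixed before the experiment starts


def fake_address(node_id):
    """Map a node ID to a unique fake (ip, port) pair inside 10.0.0.0/8."""
    host = node_id // 60000
    port = 1024 + node_id % 60000
    ip = "10.%d.%d.%d" % ((host >> 16) & 255, (host >> 8) & 255, host & 255)
    return (ip, port)


# In-memory lookup table from fake address back to node ID.
ADDRESS_TO_ID = {fake_address(i): i for i in range(NUM_NODES)}


def neighbors_of(node_id):
    """Regenerate the same links for a node on every call; nothing is stored."""
    material = ("%d:%d" % (GLOBAL_SEED, node_id)).encode()
    digest = hashlib.sha256(material).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    # Draw one spare candidate so that the node itself can be dropped.
    candidates = rng.sample(range(NUM_NODES), LINKS_PER_NODE + 1)
    return [n for n in candidates if n != node_id][:LINKS_PER_NODE]


def handle_request(address):
    """Translate a fake address to a node ID and answer with neighbor addresses."""
    node_id = ADDRESS_TO_ID[address]
    return [fake_address(n) for n in neighbors_of(node_id)]
```

Because the generator is seeded from the node ID, two requests to the same fake address return the same neighbors, so the walker sees a stable network even though no link is kept in memory.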

I am doing the experiment and will upload some figures later.

Since today is 3 July and the next meeting is on 18 July (still 15 days to go...), can we have a 5-10 minute drop-in meeting before that? (If you like, we can meet after 5:30 PM on any day.)

YourDaddyIsHere commented 7 years ago

Number of honest/evil nodes the walker meets: figure_1

Just as expected: 40% of the nodes in the network are honest and 60% are evil, hence in the long run the nodes met by the walker consist of 40% honest nodes and 60% evil nodes.

YourDaddyIsHere commented 7 years ago

@synctext 1. The database is removed; the node ID and address lookup table is in memory now. 2. I can now specify how many attack edges I want.

synctext commented 7 years ago

solid progress! Cool experiment: how much contact with evil nodes for certain walk parameters and attack strength.

X-axis: amount of attack edges, from just a few to twice as many attack edges as honest nodes...

Y-axis: percentage good vs. evil nodes discovered, 0% evil to 100% evil

several line colors for random walk: black 40% reset back to home (alpha), blue 30%, green 20%, red 10%. So red gets random walks across trust edges of length 10, far into the evil sybil region.

Interesting and easy to program?
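A rough sketch of how this experiment could be programmed, assuming a toy graph with an honest region, an evil region and a configurable number of attack edges. The graph model (a ring plus random chords per region), the scaled-down sizes (400 honest, 600 evil) and all function names are assumptions for illustration, not the simulator used in this thread.

```python
import random


def build_graph(n_honest, n_evil, n_attack_edges, extra_links, rng):
    """Nodes 0..n_honest-1 are honest, the rest are evil (Sybil) nodes."""
    n = n_honest + n_evil
    adj = {i: set() for i in range(n)}

    def connect(a, b):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    def fill_region(start, stop):
        size = stop - start
        for i in range(start, stop):
            connect(i, start + (i - start + 1) % size)  # ring keeps the region connected
            for _ in range(extra_links):
                connect(i, rng.randrange(start, stop))  # random chords inside the region

    fill_region(0, n_honest)
    fill_region(n_honest, n)
    for _ in range(n_attack_edges):                     # honest <-> evil attack edges
        connect(rng.randrange(n_honest), rng.randrange(n_honest, n))
    return adj


def evil_fraction(adj, n_honest, home, alpha, steps, rng):
    """Fraction of visited nodes that are evil, for reset probability alpha."""
    current, evil_visits = home, 0
    for _ in range(steps):
        if rng.random() < alpha:
            current = home                              # teleport home
        current = rng.choice(sorted(adj[current]))      # step to a random neighbor
        if current >= n_honest:
            evil_visits += 1
    return evil_visits / steps


def sweep(alphas, attack_edge_counts, seed=0):
    """One experiment (one dot in the graph) per (alpha, attack edges) pair."""
    results = {}
    for edges in attack_edge_counts:
        adj = build_graph(400, 600, edges, 2, random.Random(seed))
        for alpha in alphas:
            rng = random.Random(seed + 1)
            results[(alpha, edges)] = evil_fraction(adj, 400, 0, alpha, 5000, rng)
    return results
```

Calling `sweep([0.4, 0.3, 0.2, 0.1], range(0, 801, 100))` would give the points for the four colored lines; each value is the Y-axis percentage (as a fraction) for one X-axis position.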

synctext commented 7 years ago

each dot in graph is an experiment. Say several dozen points to see trends?

YourDaddyIsHere commented 7 years ago

@synctext OK, running the experiment; I will contact you once I have the results and graphs.

YourDaddyIsHere commented 7 years ago

@synctext figure_1

I tried reset probabilities of 10%, 20%, 30% and 40%. Only 40% shows a significant effect on preventing visits to evil neighbors (and only in the scenario in which there are not too many attack edges).

So I also tried a reset probability of 100% (which is the current policy of the Dispersy walker). And... yes, a high reset probability can prevent visiting evil neighbors (at the cost of low neighbor discovery efficiency: with the same number of steps, a walker with a high reset probability discovers fewer blocks).

synctext commented 7 years ago

The above experiment always has 400k honest nodes and 600k evil nodes for each run. Fascinating, you don't discover all honest nodes anymore if there is such an overwhelming amount of attack edges.

Next step: thesis chapter Problem Description + intro.

Problem Description

The key vulnerability of open Distributed Systems is the creation of malicious nodes to overwhelm the entire system. Protection against this Sybil attack is key to designing robust systems. This thesis is a first attempt to solve the difficult problem of preserving the integrity of the system when the majority of the network is evil.
Key to our solution is transitive trust. We assume a BarterCast-like accounting mechanism. This is pure single-peer self-reporting. Currently we have self-reporting by two peers which agree on a TrustChain record. Plus, the chain can be checked.
We assume the standard attack model of an honest and an evil region. Fake nodes are easy to create; attack edges which connect to honest nodes are expensive.
We focus on the essential networking primitive within distributed systems of peer discovery.
Cardinal research question: discover honest nodes, avoid evilness.
Challenge is to use the transitive trust to avoid evilness.

sidenote. Majority attacks are real for network maintenance and protocol upgrades.

YourDaddyIsHere commented 7 years ago

@synctext Since there is over one month before our next meeting, I think I can write more than the Problem Description and Introduction. The core of the thesis is protection against some evil behavior by using transitive trust; we can tell the story like this:

A brief introduction to the current Dispersy walker protocol (already finished a few months ago at your request).
Analyze the weaknesses of the current protocol.
Design some specific attack scenarios that exploit those weaknesses, for example DDoS using introduction-response messages, poisoning (introducing toxic neighbors to you), etc. Since I have finished the virtual network, I can test those attacks and draw some cool graphs (e.g. a load balancing graph in a DDoS scenario). In other words, this thesis should target some specific attack scenarios. If the problem is too broad, this thesis won't be doable.
All of those specific attacks can be mitigated by preventing the walker from walking into the evil region. A random walker with a probability to teleport home is a good way. Every time the walker teleports home, it randomly picks one of its "trusted neighbors" to visit in the next step (i.e. it only trusts neighbors introduced by its "trusted neighbors"; that is what we call "transitive trust").

As the results of the experiment I just finished show, the teleport home algorithm can reduce the number of evil nodes the walker visits. So, it works, story ends...

Does this story make sense?

By the way, I wish you a good vacation.

synctext commented 7 years ago

makes sense, good stuff!

YourDaddyIsHere commented 7 years ago

@synctext Here is the first draft, not yet finished but can still tell the sketch of the story: thesis-report.pdf

synctext commented 7 years ago

Comments:

needs more science; like: We focus on advancing the state-of-the-art in a realistic scenario for creating trust. Creating trust has proven to be difficult, especially within distributed systems setting, without a central controlling server or single controlling organisational entity. We focus on the unsolved problem of creating trust when the majority within the system consists of attackers. To make this scenario still somewhat feasible we assume a reputation function which can estimate the likelihood of an entity being an attacker. Within this thesis we focus on creating trust when honest entities are surrounded by an evil majority. For simplicity we fix our challenging scenarios to 70% attackers and 30% honest entities.
abstract is too informal
First line of first thesis chapter: "Peer to Peer System is one of the architecture of Distributed System, according to [5], it can be further categorized as ”pure” Peer to Peer or ”hybrid” Peer to Peer." The reader is now confused by the four different concepts you introduce.
chapter one: provide formal definition of peer discovery.
hunting videos too informal.
explain trustchain in detail (as many master thesis works do)
explain Dispersy (message distribution to groups and persistent storage) and walker (NAT-puncture, peer discovery, network address discovery) in full detail
all results are critically dependent on the input graph of TrustChain connections. this needs explanation and graphs showing the nature of the input data.
Use possible a "related work chapter" to explain Tribler, walker, Dispersy,Sybil attack(+picture). Put all explanations in 1 chapter, not spread out across first three.
Chapter Design of Transitive-Trust Walker. Contains more engineering details of work others did than your own work. Less byte positions, more what, why, how of your transitive trust walker design principles.
4.7 Reputation System: start with transitive trust / PageRank / PimRank
"steps required to exhaust 95% of peers": what is the scientific goal of this experiment?

synctext commented 7 years ago

Epic Quinten walker code in IPv8 output

qstokkink commented 7 years ago

@synctext that walker is actually already live in TrustChain on devel (see https://github.com/Tribler/tribler/blob/devel/Tribler/community/trustchain/community.py#L289).

Side note: because I wanted the IPv8 mechanism generic and decoupled it is also much more complicated. In fact, I consider the edge based walking to be the most complex code in IPv8.

YourDaddyIsHere commented 7 years ago

@synctext Ah, that is epic work. It is a fundamental change in walker strategy.

But the story of my thesis is adding improvements to the "current Dispersy walker" (the original one, without live edges, taking a fully random walk). Can I still follow my story line by treating the original walker as the current walker (it is now a "historical" walker, no longer a "current walker", because of Quinten's work)? Because if I say the original walker does not exist any more, it will undermine the whole story line of my thesis...

Since all my experiments are finished and use the original walker as the baseline, and the story line of the thesis report is also adding improvements to the original walker, can I still use my current code and follow my current story line? Such a big change (adding IPv8 and the other new stuff to my clean slate walker) would consume too much time, and I would need to redo all experiments, which is also time consuming. I am really in a hurry to graduate; I am running out of budget...

My work now mainly follows the work of Pim Veldhuisen, adding improvements that address its limitations.

My story line is now:

In the experiment: I keep the edge between my walker and a trusted peer (a peer that has blocks with us, or a peer directly trusted by our trusted peers) alive, preventing the NAT hole from closing. So, instead of letting trusted nodes time out, we can keep them available for a longer time.

The teleport home walker also follows this strategy: visit a neighbor A; A introduces B to me; my walker then teleports home with some probability, and otherwise visits B, and so on. That is the same as in Pim Veldhuisen's simulation.

I also test another walker which takes a random walk but gives trusted peers a higher probability. I test the two new walkers using the original walker (no live edges, fully random walk) as the baseline.
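The biased walker mentioned above can be sketched as a weighted random choice. This is only an illustration: the function name `pick_next_peer` and the weight of 4 for trusted peers are assumptions, not values from the thesis code.

```python
import random


def pick_next_peer(candidates, trusted, trust_weight=4.0, rng=random):
    """Pick one candidate; a trusted peer is trust_weight times as likely as a normal one.

    candidates: non-empty list of peer identifiers
    trusted:    set of peer identifiers considered trusted
    """
    weights = [trust_weight if peer in trusted else 1.0 for peer in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With `trust_weight=1.0` this degrades to the fully random walk used as the baseline, which makes the two walkers easy to compare in the same experiment.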

The improvement compared with Pim Veldhuisen's work is this: Pim Veldhuisen gives all peers an infinite life span. Hence a high-reputation peer will always stay in the top 10 peer list, which causes a load balancing issue. And keeping a peer alive forever means we cannot clean Sybils from our peer list using timeouts. So I give trusted peers a finite life span (but still 10 times longer than that of a normal peer). Hence we can keep trusted peers available for a longer time and still clean Sybils by timeout, and with a finite life span a high-reputation peer will not have global impact for the whole duration of the experiment.
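A small sketch of the life span rule, assuming a peer table that stores an expiry time per peer. The class name `PeerTable` and the 60 second base life span are hypothetical; only the factor of 10 comes from the description above.

```python
NORMAL_LIFESPAN = 60.0                    # seconds; assumed value for illustration
TRUSTED_LIFESPAN = 10 * NORMAL_LIFESPAN   # trusted peers live 10 times longer


class PeerTable:
    """Keeps peers until their (finite) life span has ended."""

    def __init__(self):
        self._expiry = {}   # peer identifier -> absolute expiry time

    def add(self, peer, trusted, now):
        lifespan = TRUSTED_LIFESPAN if trusted else NORMAL_LIFESPAN
        self._expiry[peer] = now + lifespan

    def clean(self, now):
        """Drop every peer whose life span has ended; return the removed peers."""
        expired = [p for p, t in self._expiry.items() if t <= now]
        for p in expired:
            del self._expiry[p]
        return expired

    def alive(self, now):
        self.clean(now)
        return sorted(self._expiry)
```

Because even a trusted entry eventually expires, a Sybil that was wrongly marked as trusted is still removed by the timeout, only later than a normal peer.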

And... as you know, the experiments are done and the results are good. I have changed the simulated network to 30% honest peers and 70% evil peers, and the results are still good. But adding the new features from Quinten's work would cost too much time... I am really running out of budget...

synctext commented 7 years ago

storyline is still: what walker works best in an evil majority environment..

YourDaddyIsHere commented 7 years ago

OK, I will follow the current story line and try to finish the new version of the report before next week.

synctext commented 7 years ago

latest thesis report URL: https://github.com/YourDaddyIsHere/MSc-Thesis/blob/master/thesis-report.pdf

YourDaddyIsHere commented 7 years ago

@synctext That is not the latest... I forgot to push the latest one to the repository these days.

I pushed the latest one a few minutes ago, so now it is the latest one: Thesis Report

I am drawing some graphs for better illustration, and will keep updating it until the next meeting on 6 Sep.

synctext commented 7 years ago

commenting.. thesis-report (3).pdf

awesome intro. please provide guidance for the reader first. Like "the core topic of this thesis might sound rather technical and obscure; however, we can easily explain the concept of peer discovery with the following story."
"In this article, the task to find out the address of such nodes is called ”peer discovery”." Replace "article" with "thesis".
Chapter 2: "While the current Tribler Walker already has basic functionalities in place, it is not fully satisfying". Opening line should be more scientific, like: "Creating trust in the online world has been proven difficult. Wikipedia entries written by "AuthoritativeProfessor" might still contain falsehoods. This thesis work aims to create trust within a very challenging environment -- one without any central authority and without any infrastructure we can rely on".
Move all Dispersy engineering details out of chapter 2
3.1 Storing Historic Behaviors -- move to problem description
you can explain the problem of Sybil attacks without any engineering details. One possible angle is explaining about fake trump followers
instead of Multichain, please quote our Trustchain Journal article
possibly call Chapter 3 peer discovery and trust
please don't call it "focused walking"; that is not understandable for experts. Use "biased walk".
design requirement: load balancing (+ remove the clean state remark).
One easy to read illustration to show 1 real walker 300k honest peers and 700k evil ones
"Figure 5.2: Sybil Prevention Validation" nobody understands this. More like: Experiments with various walker strategies for increasing attack strengths.
before Figure 5.2 do an easy "peer discovery experiment" Conduct 10k walks for the 4 walker settings. 100k attack edges. Show number of steps taken on the X-axis, then plot on the Y-axis the increasing number of unique "peers discovered". Same figure and same experiment, but now with evil vs. honest ratio as Y-axis.
Section 5.3 and others: explain in first sentence why you do this experiment
Move section 5.3 as the first experiment: small network discovery (1 figure Fig 5.3 - 5.6)
time as Y-axis?
please use the Pim Veldhuisen figure notation for load balancing (a curve, sort node-ID by load)
creative final experiment: try to attack your walker. Denial of Service or flooding it with numerous incoming walks. Show how damaging such an attack is. Simply be honest if it takes down the walker, "unsolved problem". You know which neighbors a target victim is connected to. Cost of an attack in Euro: 10TByte costs 30 Euro. Both crawl-request and intro-request.
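The attack cost in Euro from the last comment can be estimated with simple arithmetic. The sketch below takes the quoted price (10 TByte costs 30 Euro) and assumes decimal terabytes; the request rate, request size and duration are free parameters, and the function name is hypothetical.

```python
EURO_PER_TBYTE = 30.0 / 10.0   # from the comment above: 10 TByte costs 30 Euro
BYTES_PER_TBYTE = 10 ** 12     # decimal terabyte; an assumption


def attack_cost_euro(requests_per_second, request_bytes, duration_seconds):
    """Bandwidth cost for the attacker of a sustained request flood."""
    total_bytes = requests_per_second * request_bytes * duration_seconds
    return total_bytes / BYTES_PER_TBYTE * EURO_PER_TBYTE
```

For example, `attack_cost_euro(1000, 1000, 10 ** 6)` covers 1000 requests per second of 1000 bytes each for about 11.6 days, which is 1 TByte of traffic and therefore 3 Euro at the quoted price.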

YourDaddyIsHere commented 7 years ago

@synctext latest version report

All engineering details have been moved to Chapter 3; Chapter 2 no longer has any engineering details. I chose to follow the story told in Chapter 1 to introduce the concepts of peer discovery, trust and potential attacks at a high level, moving smoothly from the story to the formal model of the peer discovery strategy.
I use the terms "Multichain" and "TrustChain" interchangeably, according to the context, because some of the references use the term "Multichain" and replacing every "Multichain" with "TrustChain" might confuse the reader.
The load balance graphs have been changed to Pim Veldhuisen's style: the y-axis is the visit count, the x-axis is the node ID.
For the experiment validating the Sybil resilience, I added a graph where the x-axis is time and the y-axis is the number of discovered peers; that should give the reader a first impression of what is going on.
For the unsolved problem, I did this experiment: I deployed the victim and the attacker on two machines with exactly the same resources. The victim is a standard walker; the attacker is an attack script that keeps sending introduction-requests or crawl-requests to the victim. According to the results, using introduction-requests as the weapon in a DDoS costs the attacker more resources than the victim. But using crawl-requests costs the victim much more resources while costing the attacker few. (That holds true even when the victim reduces the number of blocks to return to 1.)

1000 crawl per second limit 1

By the way, because the reserved days for the defense are October 23 to October 25, can we figure out the committee members in our next meeting (21 Sep)? There are only 4 weeks to go...

synctext commented 7 years ago

solid progress, good results

devos50 commented 7 years ago

@YourDaddyIsHere note that Figure 3.6 in your thesis shows a block graph, used as input for the Temporal Pagerank algorithm, not NetFlow.

synctext commented 7 years ago

Comments on this thesis version:

Making good progress!
improve general readability and polish
the address book should be over 10 kilometers thick (assuming 20 addresses per page and a 1000 page book is 10cm) include a little extra info
the Ethernet, ARP, DHT, Bittorrent, Tribler explanation is a mere 2 pages and is unreadable for non-experts.
"anywhere.But" plus "anyway.Given": in multiple locations the sentence spacing is lost.
Already use language for experts in Chapter 2. No non-scientific examples needed. Use an example figure/graph with an attacker model and Alice and Bob. "First, Jack is young boy lives in a small village, he is the a participant in the introduction based peer discovery mention above. By using such method, he can in theory find anyone in the world." Additionally: "In our daily life, people already develop defense mechanism to misleading attack, people will not buy a flight ticket to the other side of this planet".
Fake account, Sybil or one of the bot controlled by Trumph’s team (also no need to make it this big)?
Only Trustchain please: "we will use the two terms interchangeably according to the context".
"In this chapter, we will introduce Dispersy in details as well as the related work for the thesis." You also explain Bitcoin and Trustchain.
"The new Walker should not create serious load balancing problems." Minor imbalance is OK :-)
"Figure 4.1: Walker Architecture": please make more compact, big empty illustration. Use terms like Message parsing? Crawling versus peer discovery versus trust record discovery?
Figure "two steps and teleport home then start a new random path": please clarify, like put in "you" and make the (a) (b) step 4 hops. Then more dramatic reset home.
mention open source and Github somewhere
Figure 5.2: Random Walker Peer Discovery. Axis numbers not readable.
Remove text lines above figures (duplicates caption text).
Figure 5.5: Random Walker Load Balance please explain/check why a 50% reset walker discovers more than the random walker. Discovers 25-ish fewer peers.
Graph of your re-visit again theory !
6.3 Unsolved Problem. 5.6 DDoS experimentation and vulnerability
Be clear: even when a single request only returns a single block it creates a DDoS vulnerability! Mitigation ideas: unique numbers, signed request, proof-of-work puzzle

YourDaddyIsHere commented 7 years ago

@devos50 Oh, thank you, that is a mistake in the caption. In the previous paragraph I said Figure 3.6 is for Temporal PageRank.

YourDaddyIsHere commented 7 years ago

@synctext We did not schedule a next meeting last time. Should we have a meeting before the deadline for handing in the final report? Since the defense is on 20 October, the deadline for handing in the final report is around 13 October. I have time every day in the coming weeks.

YourDaddyIsHere commented 7 years ago

@synctext Since the defense is on 20 October, the deadline for handing in the report is around 12 October. After that, I will have 1 week to prepare the defense presentation. Could we schedule a meeting between 15 and 19 October? I need some suggestions on my presentation. Otherwise I won't know whether it is good or crappy...

YourDaddyIsHere commented 7 years ago

The latest version of my thesis is in this repository.

I will update it multiple times every day until midnight of 12 October (Wednesday this week) @

YourDaddyIsHere commented 7 years ago

@synctext OK, it is almost the final version now. I have feedback from both committee members and have revised the thesis report according to their suggestions. I think we should have a talk tomorrow (11 October); then I have one day left to move to the final version.

YourDaddyIsHere commented 7 years ago

@synctext I made some changes a few minutes ago. The current slides: thesis defense further simplified.pptx

synctext commented 7 years ago

Too many slides for a 30 minute presentation, Max 40min. First 8 slides, remove half. Just present solution 1, "in my thesis I looked at a smarter solution". In general slide 1-25 could be made more in-depth and scientific. Quick detailed comments:

have an early slide called research question. Define thesis research question in a single sentence.
slide 15 has too small letters to be readable
slide 11, first mechanism called "outrun in numbers", not very scientific. second mechanism has no name.
slide 16, current dispersy peer discovery mechanism (more scientific). Random connect overlay, no bias, no Sybil attack defense.
slide 18, blockchain introduction; enlarge picture please.
slide 23, Sybil attack model, [CITATION] SybilGuard paper
slide 26 Distributed personalised Pagerank, [CITATION] teleport feature parameter Alpha
slide 43, remove.
ToAdd: is your research or your software ready for real-world usage?

YourDaddyIsHere commented 7 years ago

@devos50 @qstokkink The defense is tomorrow (20 October) at 10:00 - 12:00 in the morning. The room is HB.03.230 katwijkzaal

qstokkink commented 7 years ago

@YourDaddyIsHere I can't make it as I'm on holiday tomorrow, but good luck with your defense!

synctext commented 6 years ago

Final master thesis: Peer Discovery With Transitive Trust in Distributed System

synctext commented 1 year ago

Related work from modelling side using cellular automaton paradigm. Network Automata: Coupling structure and function in real-world networks. Our angle is not network topology, but the connectivity and trust integration. ToDo: Cellular Automata and game theory integration (e.g. Meritrank and AAMAS paper).

Tribler / tribler

Trusted peer discovery and improved NAT puncturing #2623