ethereum / portal-network-specs

Official repository for specifications for the Portal Network
291 stars 80 forks source link

The distance metric is of ring geometry instead of XOR #90

Open jinfwhuang opened 2 years ago

jinfwhuang commented 2 years ago

The overlay network's routing table use a custom distance function that is of "ring geometry" instead of XOR. discv5's lookup mechanism should not work unless the routing table itself is Kademlia-like, i.e. each k-th bucket has appropriate coverage of a range in the node space. If the only change to the algorithm is swapping out the distance metric, the Kademlia "lookup" algorithm does not work.

If we are also modifying the FINDNODES algorithm to use one that works with the distance measure that is essentially distance in a ring, should we make it explicit and specify the algorithm as well? For example, are we expecting the routing table to be more like chord or symphony.

pipermerriam commented 2 years ago

To the best of my understanding, the Kademlia lookup function still works, with the one caveat that unlike XOR, the ring-geometry approach allows for there to be two nodes that are "closest" to a specific location where-as XOR guarantees that there will always be one node that is closest.

Can you explain more on what it is that you see that seems to break the lookup?

KonradStaniec commented 2 years ago

Just one data point to the discussion.

I remember having similar doubts about it, but convinced my self that it is ok to change it by running kademlia simulation (http://peersim.sourceforge.net/) with distance function changed to the one in state network spec.

The results of simulator were similar to kademlia with xor distance function. One of the things measured was number of average hops in lookups and it also seemed to have standard logarithmic complexity. (although I did not dig that deep into that sim, and it was probably written as part of someones thesis so there could be some weird assumptions in it)

So after running the sim and thinking about it, I have settled on position that it will most probably work but possibly we will encounter some edge cases at some point.

jinfwhuang commented 2 years ago

Sure. Let see if we can find a counterexample. The key intuition for this example is that XOR direction is unidirectional but ring distance is not.

Let's use a n=7 bit id space, node_id \in [0, 128). Let these be the neighbor relationships. node_id neighbors
0 3, 70
3 0
70 0, 74, 64
74 75, 70
64 63, 70

Let say we start at 0 and wants to get to 63. Recall k-bucket rule is such that for each i \in [0, n), we keep k many nodes with distance [2^i, 2^(i+1)) to itself. If there are missing nodes, we collapse the some buckets. Let k = 1

Assume ring distance. 0's k buckets are: i=1, {3} and i=2-5, {70}. Notice that k bucket for i = 6 is not well defined. This would require a modification of the routing table. Let say that that this is a trivial fix and we decide to ignore k bucket with i = n -1. 70's k bucket are: i=1-2, {74} and i=3-5 {0}. The node querying path is 0 -> 70 -> 74, and 70 is the best answer returned to 0. The key insight is that {74, 64} are put into the same bucket, and it happens that 74 is kept because it is a longer live node.

Now with XOR distance. 70's k buckets now become: i=2 {64}, i=3 {74}, and i=4-6 {0}. Clearly, the only path possible is 0 -> 74 -> 64 -> 63.

With this example, one could argue that if we are using ring distance on Kademlia protocol, we need to keep 2 bucket, one for each direction. That would certainly fix this counterexample. But that is another modification to the routing table. I am not sure if there are other counterexamples that might break that fix.

Another hand waving argument about being careful to change the XOR distance is that Kademlia protocol is specifically constructed on top of the XOR metric, and it came after chord, which uses a ring metric.

pipermerriam commented 2 years ago

Ok, I think I see where this is going, and I think you are correct. The lack of unidirectionality means that queries for data at distance X don't always bring you closer since there are two directions and the node may return data from the other direction which takes you further away.

I think in practice this doesn't end up being a problem in the vast majority of cases, but I concur that there are situations where this is a problem.

This leads me towards looking for alternate distance functions that preserve the unidirectionality thing.

jinfwhuang commented 2 years ago

I would say that the choice of metric and algorithm should be analyzed together. That pretty much sum up the basic design of any DHT.

We could keep "ring" distance. I am not an expert of DHT. But the two variety that works with ring distance is Chord and Symphony.

How important is it to have the property of having "adjacent trie node" to be close to each other in the content_id space? I can see some advantages. For example, if an account data is broken up into many leaf data and that anyone accessing that account would be likely want to get access to all of those adjacent node. I would like to hear from you to get a sense on how important is this?

pipermerriam commented 2 years ago

How important is it to have the property of having "adjacent trie node" to be close to each other in the content_id space?

I believe it is a hard requirement but I don't actually have the hard data to back that up and it's possible that I'm wrong. We need to maintain adjacency to reduce the proof overhead of nodes storing the state data. This is a project someone could take on.

Given a trie of data:

Running a small experience to quantify the difference in proof overhead here would be valuable since it will tell us definitively whether we need to preserve locality or whether the overhead incurred by dropping this requirement is manageable.

jinfwhuang commented 2 years ago

I agree that a small experiment could shed some light on how important is to have adjacent content_id to be stored locally on the network.

XOR distance does not necessarily mean that data with adjacent content_id will be randomly put everywhere on the network. Data should still be relatively localized because content_id differing by less significant bits will be contained in the same subtree.

If ring-geometry is required, a CHORD DHT could work. I couldn't think of any blockers for using a a different DHT. Kademlia is known to be more efficient with its routing tables updates and also lookups. The decision would come down to a tradeoff about what is more important.