discv5: choose findnode parameter

fjl commented 5 years ago

In discovery v4, FINDNODE queries for nodes close to a certain public key.
In the discovery v5 draft, we made it query for nodes in a given bucket instead.
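A minimal Python sketch of the two query styles. It assumes node IDs are plain integers (8-bit here for readability; real IDs are 256-bit) and numbers buckets by "log distance", the bit length of the XOR of two IDs. The helper names are hypothetical and not taken from any implementation:

```python
def log_distance(a: int, b: int) -> int:
    """Bucket index as log distance: bit length of a XOR b (0 when a == b)."""
    return (a ^ b).bit_length()


def findnode_by_key(table: list[int], target: int, k: int) -> list[int]:
    """v4 style: the k known nodes closest to target by XOR distance."""
    return sorted(table, key=lambda n: n ^ target)[:k]


def findnode_by_bucket(table: list[int], self_id: int, distance: int) -> list[int]:
    """Draft-v5 style: known nodes at one log distance from the responder."""
    return [n for n in table if log_distance(self_id, n) == distance]


if __name__ == "__main__":
    responder = 0b01001101
    table = [0b01001100, 0b01000001, 0b11110000, 0b10000001]
    print(findnode_by_key(table, target=0b01100111, k=2))    # [65, 76]
    print(findnode_by_bucket(table, responder, distance=8))  # [240, 129]
```

The by-key responder always returns up to k nodes. The by-bucket responder returns an empty list whenever it knows no node at the requested distance.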

While implementing discv5, I noticed that querying by bucket doesn't work at all for small networks (i.e. ones with 10 nodes or fewer) because FINDNODE will just receive an empty response for most queries.
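To see the effect, the following sketch counts the distances at which a responder has anything to return. It assumes 256-bit random IDs, a 10-node network, and a responder that knows all nine other nodes. At most nine of the 256 possible distances can be non-empty, so a request for any other distance gets an empty reply:

```python
import random

ID_BITS = 256


def occupied_distances(self_id: int, others: list[int]) -> set[int]:
    """Log distances (bit length of the XOR) at which at least one node is known."""
    return {(self_id ^ n).bit_length() for n in others}


def main() -> None:
    rng = random.Random(1)  # fixed seed so the run is repeatable
    ids = [rng.getrandbits(ID_BITS) for _ in range(10)]
    responder, others = ids[0], ids[1:]
    occupied = occupied_distances(responder, others)
    print("non-empty distances:", sorted(occupied))
    print("distances with an empty reply:", ID_BITS - len(occupied))


if __name__ == "__main__":
    main()
```

Because a random ID differs from the responder's in its top bit with probability 1/2, the non-empty distances usually sit at or just below 256.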

Should we extend FINDNODE to also return nodes from neighboring buckets or switch back to querying by hash?

zsfelfoldi commented 5 years ago

I thought the issue through, and requesting buckets may be a good idea after all if we request a few buckets instead of just one. Example:

node A address: 01001101
target address: 01100111
A xor target: 00101010

If you want to get all nodes N where (N xor target) < (A xor target), then you need to request the buckets where the corresponding bits of (A xor target) are 1: a node in bucket i agrees with A above bit i and differs at bit i, so it is closer to the target than A exactly when bit i of (A xor target) is 1. In this example that means buckets 3, 5 and 7, if bucket 1 is the lowest bucket, holding the nodes whose MSB differs from A's. If you request all of these buckets, you effectively give away your target address and in return get exactly the same results as querying by target. The lowest bucket provides the potentially most valuable results (closest to target), so if there are a lot of nodes in the DHT you probably only need the lowest (or 2-3 lowest) buckets, and you also retain most of the info about your target. If the DHT is small and you don't get enough results, you can still request higher buckets.
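The bucket selection in this example can be checked mechanically. This is a minimal sketch that assumes 8-bit IDs and the comment's numbering: bucket 1 holds nodes whose most significant bit differs from A's, and bucket i holds those whose first differing bit is bit i. The function names are hypothetical:

```python
def buckets_to_request(self_id: int, target: int, bits: int) -> list[int]:
    """Buckets (numbered from 1 at the MSB) holding only nodes closer to target than self_id.

    A node in bucket i agrees with self_id above bit i and differs at bit i, so
    its XOR distance to target first differs from self_id's at bit i. It is
    closer exactly when bit i of (self_id ^ target) is 1. The result is sorted
    lowest bucket first, i.e. potentially closest results first.
    """
    d = self_id ^ target
    return [i for i in range(1, bits + 1) if (d >> (bits - i)) & 1]


def bucket_of(self_id: int, node: int, bits: int) -> int:
    """Bucket index of node relative to self_id in the same MSB-first numbering."""
    return bits - (self_id ^ node).bit_length() + 1


if __name__ == "__main__":
    a, target = 0b01001101, 0b01100111
    print(buckets_to_request(a, target, 8))  # [3, 5, 7]
    # Exhaustive check over all 8-bit IDs: a node is closer to target than A
    # exactly when it falls into one of the requested buckets.
    wanted = set(buckets_to_request(a, target, 8))
    assert all(((n ^ target) < (a ^ target)) == (bucket_of(a, n, 8) in wanted)
               for n in range(256) if n != a)
```

Requesting only the first bucket in the list reveals less about the target than requesting all three, which is the trade-off described above.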

zsfelfoldi commented 5 years ago

It is worth considering allowing multiple buckets to be specified in the findNode packet. If there are too many results, the lower buckets should take precedence. If the responder's highest bucket is lower than one of the requested ones (i.e. its table does not split that deep), the relevant entries can be filtered out of the contents of that highest bucket.
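One way a responder could handle such a multi-bucket request is sketched below. It assumes a depth-limited table stored as a dict from bucket index to node IDs, using the same MSB-first numbering as the example above, where the highest bucket also holds every node that would belong to deeper buckets. `answer_findnode` and `limit` are hypothetical names, and the limit value is arbitrary:

```python
def bucket_of(self_id: int, node: int, bits: int) -> int:
    """1-based position, counted from the MSB, of the first bit where node differs from self_id."""
    return bits - (self_id ^ node).bit_length() + 1


def answer_findnode(self_id: int, table: dict[int, list[int]],
                    requested: list[int], bits: int, limit: int) -> list[int]:
    """Collect nodes from the requested buckets, lower buckets first, up to limit."""
    if not table:
        return []
    highest = max(table)
    result: list[int] = []
    for i in sorted(set(requested)):  # lower buckets take precedence
        if i < highest:
            candidates = table.get(i, [])
        else:
            # The highest bucket also holds all deeper nodes: filter out bucket i.
            candidates = [n for n in table[highest] if bucket_of(self_id, n, bits) == i]
        for n in candidates:
            if len(result) == limit:
                return result
            result.append(n)
    return result


if __name__ == "__main__":
    me = 0b01001101
    table = {
        1: [0b11000000, 0b10101010],              # MSB differs from me
        2: [0b00000001],                          # first difference at bit 2
        3: [0b01100000, 0b01011111, 0b01001111],  # catch-all: bit 3 and deeper
    }
    print(answer_findnode(me, table, [7, 3, 1], bits=8, limit=3))   # [192, 170, 96]
    print(answer_findnode(me, table, [7, 3, 1], bits=8, limit=10))  # [192, 170, 96, 79]
```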

FrankSzendzielarz commented 5 years ago

My current thinking is that this is an 'almost' non-issue. If the network size is so small that there is a non-trivial chance all other nodes are outside of the requested bucket, you can just request the next bucket, incurring a minor network overhead under rare circumstances.
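The fallback described here could look like the sketch below. `send_findnode` is a hypothetical stand-in for one FINDNODE round trip. The order in which buckets are tried is left to the caller, because the comment does not fix which bucket counts as "the next" one:

```python
from typing import Callable, Sequence


def query_with_fallback(send_findnode: Callable[[int], list[int]],
                        buckets: Sequence[int], max_requests: int = 3) -> list[int]:
    """Request buckets in the given order and stop at the first non-empty reply.

    Every empty reply costs one extra round trip; max_requests caps that overhead.
    """
    for bucket in list(buckets)[:max_requests]:
        nodes = send_findnode(bucket)
        if nodes:
            return nodes
    return []


if __name__ == "__main__":
    # Toy responder in a tiny network: it only knows nodes in bucket 1.
    known = {1: [0b11000000]}
    sent: list[int] = []

    def fake_send(bucket: int) -> list[int]:
        sent.append(bucket)
        return known.get(bucket, [])

    print(query_with_fallback(fake_send, [3, 2, 1]))  # [192]
    print(sent)                                       # [3, 2, 1]
```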

zsfelfoldi commented 5 years ago

I'm fine with single-bucket requests too. We had a discussion with Felix today about whether bucket requests are a good idea at all and the main point of my comments is that my answer is a definite yes (with the addition that we should request multiple buckets corresponding to A xor target bits if necessary).

FrankSzendzielarz commented 5 years ago

Yes, agreed. Bucket requests are a radical improvement. Kademlia was originally designed in a context where network participants were assumed to be altruistic and trying to share content, e.g. file sharing. Requesting close nodes by hash places trust in the remote peer to abide by the distance metric. We already had an accidental eclipse scenario as a result, caused by a mismatched Parity implementation. I don't easily see a better solution.
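A concrete reason bucket requests reduce the trust placed in the responder is that the requester can check every returned node on its own, because each one must lie at a requested distance from the responder's ID. The following is a minimal sketch, assuming log-distance bucket numbering and a hypothetical helper name. It checks membership only, not whether the responder returned everything it knows:

```python
def log_distance(a: int, b: int) -> int:
    """Bit length of a XOR b: the bucket a node falls into relative to another."""
    return (a ^ b).bit_length()


def invalid_nodes(remote_id: int, requested: list[int], returned: list[int]) -> list[int]:
    """Nodes in a FINDNODE response that do not sit at any requested distance."""
    wanted = set(requested)
    return [n for n in returned if log_distance(remote_id, n) not in wanted]


if __name__ == "__main__":
    remote = 0b01001101
    reply = [0b11110000, 0b10000001, 0b01001100]  # last one is at distance 1
    print(invalid_nodes(remote, [8], reply))  # [76]
```

A requester could then discard the offending entries or the whole response.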

fjl commented 5 years ago

I really want to avoid re-requesting with a different bucket because it will complicate the implementation a lot. I'll try working multiple buckets into FINDNODE.

FrankSzendzielarz commented 5 years ago

I think the approach should be that this is a non-issue. If there is a really small network with an unfortunate set of IDs, such that a node cannot find other nodes, then that node should try rebooting with a different random target, or some such thing. I don't see this as a protocol issue.

ethereum/devp2p#79