fjl opened 5 years ago
I'm worried that although you can now model the cost of eclipsing a node as non-zero, this cost will likely still be insubstantial to an attacker if we keep it low enough for normal nodes (and even resource-constrained devices).
It also seems that we would need to keep the PoW fixed rather than dynamically adjusting (as in PoW consensus) to ensure that an attacker can't make it costly for honest nodes to participate. [It is also unclear at first glance how you would actually do a dynamic PoW in this context.]
How do you plan on choosing a suitable value for the difficulty of the PoW?
This problem is fundamental to DHTs, right? Are there any other novel solutions that don't use PoW that you have come upon?
> It also seems that we would need to keep the PoW fixed rather than dynamically adjusting (like in a PoW consensus) to ensure that an attacker can't make it costly for honest nodes to participate. [It is also unclear at first glance how you would actually do a dynamic PoW in this context.]

> How do you plan on choosing a suitable value for the difficulty of the PoW?
Yes, the idea here is to keep a static PoW target. There is no way to attack that in the same way the blockchain 'difficulty' could be attacked. We'd basically decide on the amount of resources creating a node ID should take and put these parameters into the software. Changing it would be a software update.
> This problem is fundamental to DHTs, right? Are there any other novel solutions that don't use PoW that you have come upon?
Not really. There was a cool idea in "Low-Resource Eclipse Attacks on Ethereum's Peer-to-Peer Network" to make the distance measure private to each node (this is Countermeasure 3 in the paper). It breaks routing in a fundamental way, but it might be worth going off on that tangent a little longer to see if it leads anywhere. The advantage we have over traditional DHTs is that we're not concerned with content-addressed data storage. It might actually be OK to break routing or make it less efficient, because we really only care about random walks of the DHT.
Actually, let's go on that tangent right now. Most ideas in discv5 (basic wire protocol, ENR, liveness check logic, topic queues) are universal and not specifically tied to the Kademlia structure. Topic ad distribution is, but only in that it builds on the idea of XOR distance, and we're actually still unsure about how to do the distribution right.
If we were to break routing, essentially removing Kademlia from the protocol, we'd need something to replace it. That something needs to have:
cc @dryajov
Has there been a decision on this topic? I think the PoW solution is not very future-proof: as soon as at least one ASIC exists for the chosen algorithm, keys for the network can be sold in bulk cheaply enough to undermine the purpose of this mechanism. So in the end all that is left is a false sense of security, yet with all the downsides of the approach, i.e. low-end devices being unable to join, etc. Or am I missing anything?
No decision has been made so far. I don't like PoW either.
Simple idea to try from @zsfelfoldi:
If we change the node distance calculation to be based on the hash of the node's IP address, it would be much harder to generate chosen IDs. We can actually pull this off with discv5 because the protocol verifies IP addresses (i.e. they can't be spoofed) and we can detect our own global IP address using the PING/PONG source address field.
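A minimal sketch of what this could look like, assuming (hypothetically) that a node's routing position is taken from `sha256(ip)` while the node ID itself stays untouched; all names here are illustrative, not part of the protocol:

```python
import hashlib

def routing_pos(ip: bytes) -> int:
    # Hypothetical: derive the routing position from the verified IP
    # address instead of the freely chosen node ID.
    return int.from_bytes(hashlib.sha256(ip).digest(), "big")

def log_distance(a: int, b: int) -> int:
    # Kademlia-style log2 distance between two 256-bit positions.
    return (a ^ b).bit_length()

# Two nodes behind different addresses end up at unrelated positions.
pos_a = routing_pos(bytes([10, 0, 0, 1]))
pos_b = routing_pos(bytes([10, 0, 0, 2]))
```

The point of the proposal in this framing: to mine a chosen routing position, an attacker would need control over many IP addresses rather than just CPU time.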
There are some downsides to this idea: if routing is completely dependent on the IP address distribution, we have to deal with odd cases like many nodes being behind the same NAT address. We would also need IPv6-specific logic to avoid mining on an address suffix.
That's an interesting idea. One concern I have: if a client changes IPs frequently (which is common, since most home ISPs don't provide static IPs by default), would that mean severe downtime for home users every time their IP changes, since the client would need to restart the whole peer discovery and reputation building from scratch?
One solution would be: after a first connection with a (new) peer, make it so that all of its peers accept subsequent incoming connections from new IPs, as long as the hash of that IP is signed with the same previously provided public key (say, during the handshake). Does that make sense?
So: first connection -> register the new peer if hash(IP) matches -> store the peer's PK locally and do normal accounting of peer quality. Subsequent connection from a new IP but the same peer PK -> verify that hash(IP) matches and the PK signature is valid -> update the IP/node ID mapping locally to the new IP (or better, only remove an IP association after X days of dormancy, and record the new one as also valid for that peer PK). This would make it so that a connection lost due to IP changes can be quickly recovered with existing peers.
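A rough sketch of the bookkeeping this would require (class and method names are mine; `sig_ok` stands in for actually verifying the peer's signature over hash(IP)):

```python
import time

class PeerTable:
    # Hypothetical: peers are keyed by public key; any known IP stays
    # valid until it has been dormant for `dormancy` seconds.
    def __init__(self, dormancy=3 * 86400):
        self.peers = {}  # pubkey -> {ip: last_seen}
        self.dormancy = dormancy

    def register(self, pubkey, ip, sig_ok):
        # sig_ok stands in for checking the signature over hash(ip).
        if not sig_ok:
            return False
        now = time.time()
        ips = self.peers.setdefault(pubkey, {})
        ips[ip] = now
        # Drop IP associations not seen within the dormancy window.
        for old in [i for i, t in ips.items() if now - t > self.dormancy]:
            del ips[old]
        return True
```

With this shape, an IP change is just one more `register` call under the same public key, so existing peers keep their quality accounting for the node.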
Another thing to think about is whether there are any privacy implications of such a scheme.
This issue attempts to collect our options regarding proof-of-work on node identities. This isn't a complete solution just yet, more a place where we can dump thoughts and collect research.
First, let's describe what the problem is:
discv5 is a Kademlia-inspired system. As such, it relies on the XOR distance measure to establish routing tables, forming an overlay network. All structured overlay networks are known to be susceptible to attacks (such as the eclipse attack) based on attacker-chosen node IDs. Proof-of-work is one solution to this problem: if generating a new node identity required a certain amount of computational work, attackers would have a harder time creating chosen node IDs to influence routing tables.
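To make the attack concrete, here is a small sketch (names are mine, not from the protocol) of how cheaply an attacker can grind node IDs toward a victim's position in the XOR metric when identity creation is free:

```python
import hashlib, os

def log_distance(a: bytes, b: bytes) -> int:
    # Kademlia XOR distance, expressed as the bit length of a XOR b.
    return (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).bit_length()

victim = hashlib.sha256(b"victim").digest()

# With free identities, 10,000 tries already yield an ID that lands
# much deeper in the victim's buckets than a random node would.
best = min(
    (hashlib.sha256(os.urandom(8)).digest() for _ in range(10_000)),
    key=lambda nid: log_distance(victim, nid),
)
```

Proof-of-work attaches a cost to every iteration of this loop, which is what the rest of this issue is about.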
Another solution is to design an overlay network that doesn't use node IDs for routing, or that uses an unstructured network topology. If such a design exists and fulfills the requirements we have for node discovery, I'd strongly prefer to use it instead of adding proof-of-work. Unfortunately, I don't know of any overlay design with those properties.
How much work?
Creating a new node identity is a one-time process for most users, but a repeated task for the attacker. The kind of attacker we want to rule out is one using many cheap cloud servers. These are typically memory-constrained: if creating a new node required 4 GB of memory, it would be harder to parallelize this operation, at the cost of raising the barrier to entry into the network. We are less concerned with the CPU time consumption of the proof-of-work function. Computation times of up to ten seconds are no issue for most people, especially for a one-time task.
Suitable proof-of-work functions
Since all network participants should verify the proof-of-work value when it is provided, the chosen function must be asymmetric, i.e. verification should take very little effort. We also need to consider that the chosen function must be simple to implement in multiple programming languages, so it would be best to choose something that's already available.
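As a toy illustration of the asymmetry requirement, here is a hashcash-style sketch using plain SHA-256 as a stand-in (deliberately simplified; SHA-256 is not memory-hard, so it would not satisfy the 4 GB goal above):

```python
import hashlib

DIFFICULTY = 16  # required leading zero bits; illustrative only

def verify(node_id: bytes, nonce: int) -> bool:
    # Verification is one hash plus a comparison: cheap for everyone.
    h = hashlib.sha256(node_id + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(h, "big") >> (256 - DIFFICULTY) == 0

def mine_pow(node_id: bytes) -> int:
    # Creation grinds nonces and costs ~2^DIFFICULTY hashes on average.
    nonce = 0
    while not verify(node_id, nonce):
        nonce += 1
    return nonce
```

The `verify`/`mine_pow` cost gap is the asymmetry property; a real candidate function (Equihash, Cuckoo Cycle, etc.) adds memory-hardness on top of it.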
There aren't a lot of functions to choose from here, so we might as well look at all the current options:
Implementation Ideas
In discv5, node IDs are derived from node records using an 'identity scheme'. This is good because any piece of metadata can be attached to the node and relayed in discovery, or even across multiple discovery systems. This additional metadata can also be used to derive the overlay node ID.
There are multiple ways to integrate proof-of-work into ENR, depending on how much flexibility we want. If the proof-of-work value would influence the node ID directly, we'd need to define a new identity scheme for it. We could also leave the identity scheme as-is and add an attribute containing a value related to the ID created by any scheme. I am strongly leaning toward the latter approach, since it allows us to keep improving the PoW while keeping the ID stable.
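To illustrate the attribute-based option, a sketch of a record as a plain dictionary (the key name "pow" and the 8-byte encoding are assumptions for illustration, not part of any spec):

```python
# Hypothetical: the PoW is carried as an ordinary ENR key/value pair,
# so the "v4" identity scheme and the node ID stay unchanged.
record = {
    "id": "v4",                                # existing identity scheme
    "secp256k1": bytes([0x02] + [0x11] * 32),  # dummy compressed pubkey
    "pow": (12345).to_bytes(8, "big"),         # 8-byte PoW nonce attribute
}
```

Because the node ID is still derived by the unchanged identity scheme, the PoW parameters can be tightened later without invalidating existing IDs, which is the main argument for this approach.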
If we were to choose Cuckoo Cycle, an additional complication would be transmitting the proof value in the discovery protocol. I did research this a bit: to make this work, we'd add an ENR attribute containing an 8-byte PoW nonce. This attribute would simply signal support for proof-of-work. To actually create the proof, we'd basically try random nonces until a cuckoo cycle is found for the input value `sha256(node-id || nonce)`. The proof would have to be transmitted in a new protocol message or during the handshake.
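The nonce search could be sketched like this (`has_cycle` is a trivial stand-in predicate so the loop is runnable, not a real Cuckoo Cycle solver; function names are mine):

```python
import hashlib

def pow_input(node_id: bytes, nonce: int) -> bytes:
    # The input value from above: sha256(node-id || nonce).
    return hashlib.sha256(node_id + nonce.to_bytes(8, "big")).digest()

def has_cycle(seed: bytes) -> bool:
    # Stand-in for the Cuckoo Cycle solver; succeeds on ~1/256 of
    # seeds so the search below terminates quickly.
    return seed[0] == 0

def find_nonce(node_id: bytes) -> int:
    # Grind nonces until the derived seed admits a "cycle".
    nonce = 0
    while not has_cycle(pow_input(node_id, nonce)):
        nonce += 1
    return nonce
```

In the real scheme only the 8-byte nonce fits in the ENR; the cycle proof itself is larger, which is why it would need a new protocol message or a handshake extension.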