ethereum / devp2p

Ethereum peer-to-peer networking specifications
984 stars 275 forks source link

discv5: topic radius consensus problem #112

Closed fjl closed 4 years ago

fjl commented 5 years ago

Both searching and advertising rely on the 'radius estimation' to figure out which nodes are supposed to keep registrations for a certain topic. To estimate the radius from received tickets, the node checks how closely the time in received ticket matches the target-ad-lifetime value, a protocol constant. A big issue with this approach is that all nodes must agree on a certain constant, otherwise the algorithm doesn't converge properly. Getting the radius right is important, and small implementation differences can mess up the system.

I think it'd be better to estimate the radius based on the rate of successful registrations within a region of address space. That is, instead of trying to converge on the optimum waiting time, maintain a table of address buckets with a running average of the time it took to actually get registered in that region.

zsfelfoldi commented 5 years ago

Here is my latest insight about discv5 radius control which is based on intuitions mostly but it might suggest some design decisions which could make our lives easier in the future. The issue is that there is no safe and unattackable way of detecting the radius of advertisements for a certain topic. Nodes inside the "real" radius will know that they receive a significant amount of registration requests so they are probably inside but there is no way to prove this to others. My intuition tells me that there might be a natural incentive for anyone interested in advertising a topic to be inside the radius of that topic. It certainly offers a slight advantage because at least they can always advertise themselves. So at some point advertisers might start mining addresses close to the topic hash. If advertising multiple topics, they might also use multiple addresses and Kademlia tables. I think this is fine, maybe even good. DHT is an anarchistic thing where there is no way to enforce proper behavior on the global scale but local "communities" with shared interests (like nodes advertising a topic) can still function well. This is a bit like territorial behavior. If you want recent and relevant information, if you want to protect your interests then just be there. It is not the best possible group dynamic but maybe the best we can achieve at this level, without global consensus. And it is probably good enough for our purpose. Still, if address mining is cheap then this can lead to a very distorted address space with extremely high address density around popular topic hashes. So maybe this is where we should optionally use PoW to make address mining harder. (Note that this does not mean that we do not need registrations at all. Although we can imagine a DHT topology where advertisers of a topic are conveniently close to each other and the topic hash and therefore easy to find, the prerequisite of this dynamic is an active effort to form a "topic community", to let each other and everyone in the region know that a certain group claims the given territory as their "topic radius", their meeting area.)

Kademlia table can also function like a simple reputation system: nodes that have been in our table and functioned well for a long time can have priority to stay there. We could incorporate PoW into this priority system: nodes with a long and good history or a high enough PoW can get into the limited-size buckets of our table. In addition to the address, PoW can also reference an absolute time value before which it is invalid and after which its value starts to decrease. This means that you don't mine an address for forever, you get a chance to get into the DHT where you can then stay by keeping stable connections. This will increase the value of connections and incentivize good behavior. Good behavior is defined by the priority score formula. This formula should include response statistics, but it can also include other conditions, like reporting a correct topic density. Defining the boundaries of the "topic community" is a shared interest of the community. With the territorial analogy, this is similar to publicly displaying a flag. In order to be part of the community and have your own domain inside, you have to publicly state in your domain that it is also part of the larger group. Topic density is something that is subjective but everyone can expect others in the neighborhood to receive a similar amount of registrations over a longer time period. The Kademlia priority formula for long term connections could also require the peers to report topic density values similar to ours (with the required similarity being inversely proportional to the peer's distance from us). Having a recent PoW should temporarily relieve this requirement because new peers do not know the real topic density. So the formula should look something like this: priority = connectionTime * similarityScore + PoW.value * exp(-PoW.age / PoW_value_TC) Note that PoW_value_TC is longer than the time needed to build some connections in the DHT, start getting registrations and measure topic density (which measurement should also have its own time constant that evens out short-term fluctuations in registration rate). These time constants could be in the range of hours or maybe even days.

This mechanism is not something we should necessarily implement in the beginning. However we should design the topic density reporting mechanism to allow implementing it later if necessary. So instead of just adding a request-reply to the protocol (asking for a topic and replying with a single bit), I suggest adding this info to the ENR and update it when it has significantly changed. Maybe just reference it as a hash because this topic info can have a significant size. It should be a limited length list of (topic, density) pairs containing the topics with the highest density. Also, unlike the original suggestion, topic density (which can be measured as the average number of registration attempts per minute) should be reported as a scalar value instead of a single bit switching at a pre-defined threshold. This is important in order to make similarity comparison possible. Reporting all important topics in the ENR is useful because it makes it harder to report different values to different peers (just like publicly displayed flags). ENRs have serial numbers, we expect to never see two different signed ENRs with the same number from the same node. We also expect SNs to not increase at an unusually quick rate, especially with many versions being inaccessible to us. Here I am not trying to specify a fraud detection heuristic but I believe this is something ENRs are absolutely good for. First we can just release discv5 implementations that follow the rules and build a crawler that displays the reported info and makes it easy for us to see how nodes normally behave and whether there is an attack and how that can be detected from the reported numbers. Then we can design better reputation and fraud detection heuristics.