discv5: add FINDNODEATT message

zilm13 commented 4 years ago

HI! I've made another research, this one dedicated to efficiency, comparing ENR attribute and topic advertisement, and want to make a proposal which is going to dramatically improve search speed of ENR with certain attributes. Here's a proposal:

In order to improve search time and reduce traffic for ENR attribute, make possible to find even smaller attributed fractions in network in small time and decrease traffic in all ENR search cases, following Discovery V5 API is proposed:

FINDNODEATT Request (0x09)

message-data = [request-id, attribute-key, (attribute-value)]
message-type = 0x09
attribute-key = searched ENR attribute key
attribute-value = (optional) searched ENR attribute value for provided key

FINDNODEATT queries for nodes with non-null ENR attribute identified by attribute-key. If optional parameter attribute-value is provided, only ENRs with attribute-key == attribute-value should be returned. Maximum number of returned records is K-BUCKET. If more than K-BUCKET records are found by search, it is recommended to return a random subset of K-BUCKET size from it.

Optionality of attribute-value could be used for other features, when most nodes in a network are in one of subnetworks, omitting this field doesn’t improve search speed and traffic, so we are going to use it with attribute-value provided. This should limit output only to nodes that belong to a subnetwork of search and provide the answer from each peer in one query. Let’s see the results with 64 nodes advertised in 10,000 peers network. We are going to find 8 of them. We also applied 16 nodes limit to topic advertisement too.

We got 1.4 seconds on average search and just 3.3 Kb of traffic per searcher!

What do you think about it? I want it too be added to Discovery V5. Full research if you are interested in is here

fjl commented 4 years ago

This looks really nice! I have read the research document very quickly (skipped some sections) and wonder how the new request would be used exactly. Would you just perform a random walk of the network with FINDNODE requests and query every found node using FINDNODEATT?

zilm13 commented 4 years ago

@fjl in simulation I've tried it just with peer's own Kademlia table and it works like a charm. I don't think that it's a big sense to make it complicated, as in simulation it was able to discover 8 peers from 16 advertised (means very small fraction) in 10,000 peers network just sending FINDNODEATT to everyone in his own Kademlia table.And it's just feature discovery limit without complication. Another good thing from moving it in this way is that you don't have localization, it's a peers from all parts of network, difficult to sybil.

I'd bet to leave it in this form until otherwise is requested by somebody. The only thing I'd reccommend is to get non-empty result from at least 3 different node, but I didn't dig into security properties much yet.

fjl commented 4 years ago

Hmm, I'm not sure if this is a sound idea then. If you only look in the local table, your view of the network is limited. It will also stop working well beyond a certain network size.

I will think about this a bit more and re-read your research document. It still seems like a good idea, we just need to make some adjustments to really make it work for all cases.

zilm13 commented 4 years ago

Giving 150-200 peers you have in your table, you are filtering it from 40000 peers minus intersections. From my simulation peer scanned 5000 of total 10,000 in such search, as I were able to found half by almost every searcher. And even with your 200 peers it's already 20 seconds if you query 10 peers in a second. So it's a big part of network, half of network in case of 10,000 peers

zilm13 commented 4 years ago

@fjl Ok, we could try to expand search area but first round should be like I've explained, as it's most efficient. My first idea is to do this round, and then do findnode(256) to all that peers, ask the results for findnodeatt, next findnode(255), ask the results for findnodeatt etc. I guess that any subsequent round will cover less and less not queried nodes but anyway I will measure the numbers to see, how bad is it. If you have an idea how to expand initial search in other way, please, share it, I will simulate and see, whether it's better or not

zilm13 commented 4 years ago

I did several simulations, first was as described in previous comment. We are getting 45-50% coverage from the simple FINDNODEATT to our table in ~120 rounds, about 25% after the same number of rounds expanding, 10% after another 120 rounds and less and less further. Though I thought it is even less efficient when expanding.

For comparison I made "spiral expand"(code), the idea is that closest nodes should have top buckets much different with our knowledge and furthest nodes, from our 256th bucket should have less intersections with our environment in their lowest buckets. I've simplified it for simulation, it should be very complicated if done right, anyway it didn't give any signifcant advantages compared to initial idea, moreover, it definitely increases variance.

Both tests were done on 10,000 peers from searcher peers with simulated long live in network, meaning their Kademlia tables were filled enough, though there are enough peers with small tables in observed network. For comparison I ran initial idea with 2,000 peers in network and 100,000. I didn't measure traffic and time separately, but both should be relative to number of rounds.

After all I stay with the simplest solution: step 1: ask everyone you know with FINDNODEATT step 2: ask everyone you know for FIDNNODE(256), query all received nodes for FINDNODEATT, decrease distance by 1, continue

Stop on any step when you think you found enough nodes, because the further you go, the less efficient search is. And you'd never need all the nodes with certain attribute.

zilm13 commented 4 years ago

I completely missed the thing that in Ethereum 2.0 we could participate in several subnetworks at time. So it's recorded in following form:

The ENR MAY contain an entry (attnets) signifying the attestation subnet bitfield with the following form to more easily discover peers participating in particular attestation gossip subnets.

Key Value

attnets SSZ Bitvector[ATTESTATION_SUBNET_COUNT]

Key	Value
attnets	SSZ Bitvector[ATTESTATION_SUBNET_COUNT]

As it's a vector, it doesn't have any length data or whatever, just a plain bitset, so instead of equality we'd like to see inclusion more, so that value AND input_value == input_value or value.include(input_value)

As it has nothing special and relative to Ethereum 2.0, it could be using for short recording of any other capabilities.

So, I'd add another search message FINDNODEATTINC (0x10) for it

Nashatyrev commented 2 years ago

IMO looks really interesting from Eth-2 networking perspective 👍

ethereum / devp2p

discv5: add FINDNODEATT message #151

FINDNODEATT Request (0x09)