Other peer discovery mechanism

mhchia commented 6 years ago

What is wrong?

We use a global topic for nodes to broadcast their ShardPreference, which tells other nodes who is listening on which shards. Even though a ShardPreference occupies only SHARD_COUNT bits, plus the bytes taken by packet headers, it might still become an issue when the number of nodes in the sharding P2P network grows very large. Another problem is that a ShardPreference is not easily verified, so scams are hard to avoid. A node can connect to the node that just broadcast the ShardPreference and ask for Proof-of-Custody data to verify that the node actually listens to that shard. However, this is still quite tricky.
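For illustration, a ShardPreference that occupies SHARD_COUNT bits could be packed as a plain bitfield. This is only a sketch: the function names, the bit ordering, and the value `SHARD_COUNT = 1024` are assumptions, not the wire format used by sharding-p2p-poc.

```python
SHARD_COUNT = 1024  # assumed value, for illustration only


def encode_shard_preference(shards, shard_count=SHARD_COUNT):
    """Pack a set of shard IDs into a bitfield of shard_count bits."""
    bitfield = bytearray((shard_count + 7) // 8)
    for shard_id in shards:
        if not 0 <= shard_id < shard_count:
            raise ValueError(f"shard id out of range: {shard_id}")
        # Bit i of the field is set when the node listens on shard i.
        bitfield[shard_id // 8] |= 1 << (shard_id % 8)
    return bytes(bitfield)


def decode_shard_preference(bitfield, shard_count=SHARD_COUNT):
    """Recover the set of shard IDs from a bitfield."""
    return {
        i for i in range(shard_count)
        if bitfield[i // 8] & (1 << (i % 8))
    }
```

Under these assumptions each broadcast carries `SHARD_COUNT / 8` bytes of payload per node before headers, and nothing in the payload lets a receiver check that the claimed shards are actually served.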

How can it be fixed?

Find other peer discovery approaches. Possible options we already had in mind are:

- 2 DHTs: one is dedicated to translating peerID to IP and port, while the other is used for peer discovery. The key of the latter should correlate with shardID and possibly peerID(?), and the value is peerID.
  - It's just an idea. I'm not sure if it works or not.
- Rendezvous protocol
  - Is it quite experimental?
- DHT providers with topics (example is here )
- Others

Edit: added "scamming through ShardPreference channel" in "What is wrong?"

mhchia commented 6 years ago

This should be a good time to start investigating this.

jrhea commented 6 years ago

I'm curious, how does a node decide what shard to join? Is it random, or do they join the shard that has the least number of participants?

mhchia commented 6 years ago

A node can choose what shard to join by their own will. For validators, they will be assigned specific shards to join by the beacon chain

mhchia commented 6 years ago

Jannik is working on this design. Reference: https://github.com/jannikluhn/sharding-netsim/issues/3, https://github.com/jannikluhn/sharding-netsim/issues/4

raulk commented 6 years ago

Quick note just to bring provider records into consideration. With go-libp2p-kad-dht, you can declare yourself as a provider of a CID (content ID).

Other nodes can look up providers for a given CID on the DHT. We could experiment with setting a value like "eth:shard:" as the payload of the CID, hashed with some function and encoded in base58 or something else.

Nodes can then look up members "providing" membership in a shard using FindProviders: https://github.com/libp2p/go-libp2p-kad-dht/blob/master/routing.go#L456
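A rough sketch of how such a per-shard key could be derived is below. Several things here are assumptions: the shard ID appended after the "eth:shard:" prefix, SHA-256 as the hash, and the Bitcoin base58 alphabet. The result is not a real CID (it has no multihash or multicodec prefix), and the code does not call go-libp2p-kad-dht; it only illustrates the idea of a deterministic key that nodes would announce themselves as providers of.

```python
import hashlib

# Bitcoin-style base58 alphabet (assumed; the thread only says "base58").
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58, mapping leading zero bytes to '1'."""
    n = int.from_bytes(data, "big")
    out = ""
    while n > 0:
        n, rem = divmod(n, 58)
        out = B58_ALPHABET[rem] + out
    pad = len(data) - len(data.lstrip(b"\x00"))
    return "1" * pad + out


def shard_key(shard_id: int) -> str:
    """Hypothetical DHT key for a shard: base58(SHA-256("eth:shard:<id>"))."""
    payload = f"eth:shard:{shard_id}".encode()
    return base58_encode(hashlib.sha256(payload).digest())
```

Every node computes the same key for the same shard, so a lookup for `shard_key(7)` would return whoever registered as a provider for shard 7.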

I'll also enquire what the status of rendezvous is.

jannikluhn commented 6 years ago

@raulk Curious about this, could you please elaborate a little on how this works? I'm guessing the DHT maps CIDs to a list of node ids? I briefly thought about something like this, but it seemed a bit weird (and potentially dangerous) to me that there would be nodes that know about all nodes in a single shard.

jrhea commented 6 years ago

> A node can choose what shard to join by their own will. For validators, they will be assigned specific shards to join by the beacon chain

Ok so if nodes can join a shard of their choosing, how do you ensure that there are enough nodes in a shard? Will each type of client (i.e. Nimbus, PegaSys, etc) implement different logic for choosing a shard to join, or will they all just initially select a shard to join at random?

mhchia commented 6 years ago

@jrhea

> Ok so if nodes can join a shard of their choosing, how do you ensure that there are enough nodes in a shard?

I think that is what "shard load balancing" wants to solve, but IMO we currently don't have a specific approach. One thing that might mitigate this is to let clients connect to a random shard by default.

> Will each type of client (i.e. Nimbus, PegaSys, etc) implement different logic for choosing a shard to join, or will they all just initially select a shard to join at random?

I think it might be possible; maybe we can reach a consensus on how to do this later.

jrhea commented 6 years ago

@mhchia and @jannikluhn, I was thinking about a scheme for deciding which shard a client should join...

peer is as defined in libp2p where peer.id = SHA256(peer.pubkey)
c is the number of shards

a client performs the following calculation to determine what shard to join:

peer.id mod c

Even if c isn't a factor of 2^256 the bias would be so low (on the order of 2^-256) that it would be undetectable.
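The calculation above can be sketched as follows, taking peer.id = SHA256(peer.pubkey) as stated in this comment. The function names and the byte order used to turn the hash into an integer are assumptions made for illustration.

```python
import hashlib


def peer_id(pubkey: bytes) -> bytes:
    """peer.id as defined above: SHA-256 of the peer's public key."""
    return hashlib.sha256(pubkey).digest()


def shard_for_peer(pubkey: bytes, shard_count: int) -> int:
    """Shard index = peer.id mod c, reading peer.id as a big-endian integer."""
    return int.from_bytes(peer_id(pubkey), "big") % shard_count


# Example: count how many of 1000 sample keys land in each of 8 shards.
counts = [0] * 8
for i in range(1000):
    counts[shard_for_peer(f"key-{i}".encode(), 8)] += 1
```

Since anyone who knows a peer's public key can repeat the computation, the shard a peer belongs to can be derived rather than announced.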

Benefits:

- shard topics would be evenly populated by clients
- determining the shard a peer belongs to could be calculated instead of relying on other methods

I haven't thought about how to manage the scenario when a client needs to switch shards, but peer.id would have to account for it somehow.

jannikluhn commented 6 years ago

I think in most cases users should decide manually what shard they want to join (mostly because they are interested in a particular contract on that shard). "Forcing" them to a different shard would not be a good solution. What one could do is "soft load balancing", i.e. if nodes don't have a preference because they are joining for the first time, suggest a different shard with the lowest gas price.

To ensure that shards aren't empty (especially in the beginning when usage is low), I see two viable options:

- have one (or more) bootstrapping nodes for each shard that can serve at least the number of validators assigned to that shard at all times
- make validators by default connect to a random shard (in addition to their validation assignment). That should ensure roughly a 1:1 ratio of validators to static nodes.

jrhea commented 6 years ago

@jannikluhn thanks for clearing that up - great explanation. If you don't mind, I have a couple of follow-up questions.

> I think in most cases users should decide manually what shard they want to join (mostly because they are interested in a particular contract on that shard).

How does the user know what shard a contract is on? Is that info stored on the main chain, will they find out by asking members of a global topic, or something else?

> make validators by default connect to a random shard (in addition to their validation assignment). That should ensure roughly a 1:1 ratio of validators to static nodes.

What is the definition of a 'static node'? Again, thanks for the response and sorry for all the questions.

jannikluhn commented 6 years ago

> How does the user know what shard a contract is on?

I'd imagine it to be the same way users know contract addresses today, the shard id would just be an additional prefix to the address. So mostly "off-chain", but name resolvers on some shard or the main chain are also possible.

> What is the definition of a 'static node'?

I meant nodes that don't change their shards frequently (to distinguish from validators).

raulk commented 6 years ago

@jannikluhn

> Curious about this, could you please elaborate a little on how this works? I'm guessing the DHT maps CIDs to a list of node ids? I briefly thought about something like this, but it seemed a bit weird (and potentially dangerous) to me that there would be nodes that know about all nodes in a single shard.

I like to think of the "provider" entry like a "symlink" in the DHT. The mapping of [key=>nodes who store it] is done by distance metric, but instead of storing the actual value, it stores who is known to possess that value.

But I think you are right. Given the discrete domain of shards (1024 shards?), the CIDs would be predictable and the nodes responsible for those prefixes could become attack targets.

go-libp2p-rendezvous (for reference purposes: spec, impl) seems like a direction to explore, as well as pubsub (which would require bootstrap nodes as well, i.e. rendezvous).

I guess one of the complexities is how to guard against spurious peers. Perhaps the rendezvous nodes could send challenges to nodes registered on a shard periodically – or the users of discovery could ask for deregistration of spurious nodes by presenting a proof of data unavailability (or something similar) to the rendezvous nodes? i.e. if I connect to a client that's registered on shard X, but I discover that it's a lie, I can present a proof to have that node de-registered.
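To make the deregistration idea concrete, here is a toy in-memory model of a rendezvous node. Everything in it is hypothetical: the class, its method names, and the `verify_proof` callback are invented for illustration and are not the go-libp2p-rendezvous API. What a valid proof would look like is left open, as it is in the comment above.

```python
class RendezvousRegistry:
    """Toy rendezvous node that tracks which peers registered on which shard."""

    def __init__(self, verify_proof):
        # verify_proof(shard_id, peer_id, proof) -> bool is supplied by the
        # caller; the actual proof scheme is unspecified.
        self._verify_proof = verify_proof
        self._members = {}

    def register(self, shard_id, peer_id):
        self._members.setdefault(shard_id, set()).add(peer_id)

    def discover(self, shard_id):
        """Return the peers currently registered on a shard."""
        return sorted(self._members.get(shard_id, set()))

    def report(self, shard_id, peer_id, proof):
        """Deregister peer_id from shard_id if the presented proof verifies."""
        if peer_id not in self._members.get(shard_id, set()):
            return False
        if not self._verify_proof(shard_id, peer_id, proof):
            return False
        self._members[shard_id].discard(peer_id)
        return True
```

A report with a proof that fails verification leaves the registration untouched, so a peer cannot be evicted by an unsupported accusation.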

ethresearch / sharding-p2p-poc

Other peer discovery mechanism #47
