ethereum / devp2p

Ethereum peer-to-peer networking specifications

discv5: NAT traversal via Rendezvous protocol [WIP] #207

Open pipermerriam opened 2 years ago

pipermerriam commented 2 years ago

This issue proposes a mechanism for NAT traversal via UDP hole punching.

This issue borrows from https://github.com/ethereum/portal-network-specs/issues/144, which in turn borrows from https://blog.ipfs.io/2022-01-20-libp2p-hole-punching/

Participants

This mechanism involves communication between three nodes, referred to below as the "initiator", the "receiver", and the "rendezvous" node.

Detecting whether you are behind a NAT

Borrowed from: https://twurst.com/articles/stun-without-trust.html#org92b7214

A node in the network should maintain a set E which contains all of the (ip_address, port) values for outbound packets that have been sent by this node.

When receiving a packet, a node should check whether the packet's (ip_address, port) are contained in the set E.

We suggest 2 minutes as a reasonable amount of time before determining that the node is behind a NAT.

For practical purposes, an LRU cache should be used to constrain the overall size of the set E.
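A minimal sketch of the bookkeeping described above, in Python. The class name, the default capacity of 1024, and the injected clock are hypothetical. The decision rule is also an assumption, since the text does not spell it out: an inbound packet from an endpoint outside E is taken as evidence of being reachable, and seeing none for the suggested 2 minutes is taken as evidence of being behind a NAT.

```python
from collections import OrderedDict
import time


class OutboundEndpointTracker:
    """Bounded LRU set E of (ip_address, port) endpoints we have sent packets to."""

    def __init__(self, capacity=1024, observation_window=120.0, clock=time.monotonic):
        self._endpoints = OrderedDict()      # keys are (ip_address, port); values unused
        self._capacity = capacity            # hypothetical bound on the size of E
        self._window = observation_window    # the suggested 2 minutes, in seconds
        self._clock = clock
        self._started_at = clock()
        self._saw_unsolicited = False

    def record_outbound(self, ip_address, port):
        """Add the destination of an outbound packet to E, evicting the oldest entries."""
        key = (ip_address, port)
        self._endpoints[key] = None
        self._endpoints.move_to_end(key)
        while len(self._endpoints) > self._capacity:
            self._endpoints.popitem(last=False)

    def record_inbound(self, ip_address, port):
        """Return True if the packet's source endpoint is contained in E."""
        key = (ip_address, port)
        if key in self._endpoints:
            self._endpoints.move_to_end(key)
            return True
        # Assumption: a packet from an endpoint we never contacted means we are reachable.
        self._saw_unsolicited = True
        return False

    def nat_status(self):
        """Return "reachable", "behind_nat", or "unknown" (window not yet elapsed)."""
        if self._saw_unsolicited:
            return "reachable"
        if self._clock() - self._started_at >= self._window:
            return "behind_nat"
        return "unknown"
```

One caveat of bounding E: an endpoint that was evicted from the cache looks the same as one that was never contacted, so a small capacity can make a reply look unsolicited.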

Signalling whether you are behind a NAT

We define a new field in the ENR with the key "nat".

Traversing the NAT

We define two new message types:

# RELAYREQUEST
relay_request := SSZContainer(from_node_id: uint256, to_node_id: uint256)

# RELAYRESPONSE
relay_response := SSZContainer(response: uint8)
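As an illustration of what these bodies look like on the wire, here is a hand-rolled sketch of their SSZ serialization in Python. It relies only on two SSZ rules: `uintN` values are little-endian, and a container of fixed-size fields is the concatenation of its fields. The function names are hypothetical, and the sketch covers the message body only, not any discv5 message-type byte or packet framing.

```python
def encode_relay_request(from_node_id: int, to_node_id: int) -> bytes:
    """Serialize the RELAYREQUEST body: two uint256 fields, 32 bytes each, little-endian."""
    return from_node_id.to_bytes(32, "little") + to_node_id.to_bytes(32, "little")


def decode_relay_request(data: bytes) -> tuple:
    """Parse a RELAYREQUEST body back into (from_node_id, to_node_id)."""
    if len(data) != 64:
        raise ValueError("RELAYREQUEST body must be exactly 64 bytes")
    return int.from_bytes(data[:32], "little"), int.from_bytes(data[32:], "little")


def encode_relay_response(response: int) -> bytes:
    """Serialize the RELAYRESPONSE body: a single uint8."""
    return response.to_bytes(1, "little")


def decode_relay_response(data: bytes) -> int:
    """Parse a RELAYRESPONSE body back into the response value."""
    if len(data) != 1:
        raise ValueError("RELAYRESPONSE body must be exactly 1 byte")
    return data[0]
```

With this layout a RELAYREQUEST body is always 64 bytes and a RELAYRESPONSE body is always 1 byte.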

The rendezvous protocol works as follows:

  1. The "initiator" node learns about the "receiver" node through a FINDNODES/FOUNDNODES interaction with the "rendezvous" node.
  2. The "initiator" node sends a RELAYREQUEST to the "rendezvous" node with payload: {from_node_enr: initiator_enr, to_node_id: receiver_node_id}
  3. The "rendezvous" node, upon receiving the RELAYREQUEST from the "initiator" node, sends the same RELAYREQUEST message to the "receiver" node.
  4. The "receiver" node, upon receiving the RELAYREQUEST from the "rendezvous" node, responds with a RELAYRESPONSE with the payload {response: 1} to signal that they have accepted this request. They may alternately respond with {response: 0} if they wish to reject the request. The "receiver" node will also send a PING message to the "initiator" node (this triggers the receiver's NAT to allow and route incoming packets from the initiator's ip/port).
  5. The "rendezvous" node, upon receiving the RELAYRESPONSE from the "receiver" node, accepting the request, will then send the same RELAYRESPONSE message to the "initiator".
  6. The "initiator" node, upon receiving the RELAYRESPONSE accepting the connection, should then send a PING message to the "receiver" node (this triggers the initiator's NAT to allow and route incoming packets from the receiver's ip/port).
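The six steps above can be walked through with a toy simulation. This is a sketch under several assumptions that are not part of the proposal: both the "initiator" and the "receiver" sit behind a NAT, the "receiver" already has a session with the "rendezvous" node, and a NAT is modelled as a plain set of peers the node has sent to (keyed by node name rather than ip/port, with no timeouts). All class and function names are hypothetical.

```python
class ToyNat:
    """Toy NAT: accepts inbound packets only from peers this node has already sent to."""

    def __init__(self):
        self.allowed = set()

    def note_outbound(self, remote):
        self.allowed.add(remote)

    def accepts_inbound(self, remote):
        return remote in self.allowed


def send(nats, log, src, dst, message):
    """Send one packet; nodes without an entry in `nats` are publicly reachable."""
    if src in nats:
        nats[src].note_outbound(dst)
    delivered = dst not in nats or nats[dst].accepts_inbound(src)
    log.append((src, dst, message, delivered))
    return delivered


def run_rendezvous():
    """Replay steps 1 to 6 and return a log of (src, dst, message, delivered)."""
    nats = {"initiator": ToyNat(), "receiver": ToyNat()}
    log = []
    # Assumption: the receiver already has a session with the rendezvous node.
    send(nats, log, "receiver", "rendezvous", "PING")
    # Step 1: the initiator learns about the receiver.
    send(nats, log, "initiator", "rendezvous", "FINDNODES")
    send(nats, log, "rendezvous", "initiator", "FOUNDNODES")
    # Step 2: the initiator asks the rendezvous node to relay.
    send(nats, log, "initiator", "rendezvous", "RELAYREQUEST")
    # Step 3: the rendezvous node forwards the request.
    send(nats, log, "rendezvous", "receiver", "RELAYREQUEST")
    # Step 4: the receiver accepts and punches a hole in its own NAT.
    send(nats, log, "receiver", "rendezvous", "RELAYRESPONSE(1)")
    send(nats, log, "receiver", "initiator", "PING")
    # Step 5: the rendezvous node forwards the response.
    send(nats, log, "rendezvous", "initiator", "RELAYRESPONSE(1)")
    # Step 6: the initiator pings the receiver through the punched hole.
    send(nats, log, "initiator", "receiver", "PING")
    return log


if __name__ == "__main__":
    for src, dst, message, delivered in run_rendezvous():
        status = "delivered" if delivered else "dropped"
        print(f"{src} -> {dst}: {message} [{status}]")
```

In this model the only dropped packet is the receiver's PING in step 4, which still opens the receiver's NAT for the initiator's PING in step 6. Real NATs that assign a different external port per destination would not be captured by a model this simple.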

TODO: diagram message flow... define edge cases like timeouts and how nodes should behave.

TODO: finish definition of the protocol and convert this to a PR towards the spec so that people can comment on individual lines.

emhane commented 2 years ago

So this means the PING from the "receiver" to the "initiator" is dropped, but it places an entry in the "receiver's" state table so that the "initiator's" PING to the "receiver" succeeds, as long as it arrives within 30 seconds (the timeout of a UDP state table entry in many routers)? In other words, the RELAYRESPONSE should take less than 30 seconds to reach the "initiator"? The WHOAREYOU challenge that the "receiver" sends in response then uses the entry that the "initiator's" PING placed in the "initiator's" own state table to finalise the hole punching?

AgeManning commented 2 years ago

Nice!

A few thoughts:

  1. I'm a big fan of SSZ, we use it everywhere (in eth2 and lighthouse land) except discv5. Discv5 still uses RLP. I would suggest we stick to one or the other to avoid extra dependencies. Either use RLP here, or shift the other encodings in discv5 to SSZ.
  2. In the relay_response, is there a reason it's a uint8 vs a bool? Do we have more than two responses?

@emhane - I agree. Typically the round-trip time for requests in discv5 is small, usually not longer than a few seconds, so hopping via an intermediary should take < 30s. I think the initial PING sent by the receiver sets up its IP/port mapping, allowing future packets from the initiator. This PING will probably get dropped if the initiator is itself behind a NAT, but will be received if it is not. In either case the initiator can then establish a handshake with the receiver. In our case, we will have to handle the case where one of our messages gets dropped (but this is implementation specific).
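To make the timing constraint concrete, here is a small Python sketch of a NAT state table whose entries expire. The 30 second default is the figure quoted above for many routers. The class name is hypothetical, and so is the refresh rule: this model refreshes an entry only on outbound packets, whereas real devices vary.

```python
class NatStateTable:
    """Toy UDP state table: an outbound packet opens a hole that expires after a timeout."""

    def __init__(self, entry_timeout=30.0):
        self._timeout = entry_timeout
        self._expires_at = {}  # remote (ip, port) -> time at which the entry lapses

    def note_outbound(self, remote, now):
        """Create or refresh the entry for `remote` when we send a packet to it."""
        self._expires_at[remote] = now + self._timeout

    def accepts_inbound(self, remote, now):
        """Return True if a packet from `remote` arriving at `now` is let through."""
        expiry = self._expires_at.get(remote)
        return expiry is not None and now < expiry
```

For example, if the receiver's PING leaves at `now=0.0`, an initiator PING arriving at `now=2.0` is accepted, while one arriving at `now=31.0` is dropped and the exchange would have to be retried.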

emhane commented 1 year ago

What if the body of the RELAYREQUEST is changed to from_node_enr: Enr, to_node_id: NodeId? Sending the enr of the initiator in the body will supply the receiver with the information it needs to send the PING request to the initiator in step 4.

sambacha commented 1 year ago

Why not consider wireguard for this?

pipermerriam commented 1 year ago

> What if the body of the RELAYREQUEST is changed to from_node_enr: Enr, to_node_id: NodeId? Sending the enr of the initiator in the body will supply the receiver with the information it needs to send the PING request to the initiator in step 4.

Yes, this seems appropriate. I now see that without this, the "receiver" node will not necessarily have enough information to send the PING to the initiator, which would mean they would end up needing to do a lookup for them in the network to find their ENR. :+1:

emhane commented 1 year ago

I have implemented your protocol outline, @pipermerriam, with the changes in @AgeManning's comment above. Furthermore I changed

emhane commented 1 year ago

I'm changing the to_node_id into to_node_enr in my implementation, because otherwise a node has to store the ENR of a peer that is potentially behind a NAT so that it knows where to send the hole-punch-ping upon a RELAYRESPONSE with body {response: 1}. The struct that stores these ENRs of peers potentially behind a NAT has no obvious capacity limit. It is better if the hole-punch-ping is stateless. The RELAYREQUEST is still small in comparison to a NODES response.