ethereum / portal-network-specs

Official repository for specifications for the Portal Network

Multiple NODES response messages #136

Open acolytec3 opened 2 years ago

acolytec3 commented 2 years ago

The NODES message in the Portal Network wire spec calls for a responding node to send multiple NODES messages when the set of ENRs to be returned exceeds the maximum byte size allowed by discv5 (1280 bytes), following the discv5 wire spec's handling of its own NODES messages. The challenge here is that Portal Network messages are carried as the payload of the discv5 TALKREQ/TALKRESP message types, and the discv5 wire spec only allows one TALKRESP per TALKREQ sent. As such, at least in the Javascript implementation of discv5, additional Portal Network NODES messages are dropped instead of being decoded by discv5.
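
For a rough sense of where the practical limit comes from, here is a back-of-the-envelope calculation; the overhead figures below are assumptions for illustration, not numbers taken from either spec:

```typescript
// Illustrative only: approximate how many ENRs fit in one TALKRESP payload.
const MAX_DISCV5_PACKET = 1280 // discv5 max packet size in bytes
const DISCV5_OVERHEAD = 100 // assumed header/encryption overhead for a TALKRESP
const PORTAL_NODES_OVERHEAD = 10 // assumed encoding overhead for the NODES message
const AVG_ENR_SIZE = 130 // a typical signed ENR is on the order of ~130 bytes

const maxEnrsPerResponse = Math.floor(
  (MAX_DISCV5_PACKET - DISCV5_OVERHEAD - PORTAL_NODES_OVERHEAD) / AVG_ENR_SIZE
)
console.log(maxEnrsPerResponse) // ~9, in line with the ~8-9 ENRs observed in practice
```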

Several alternatives are available:

1. Add new logic at the discv5 layer to look for Portal Network NODES messages and handle them similarly to discv5 NODES messages (i.e. determine how many will be sent and keep track of each response as it comes in).
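
A minimal sketch of what option 1 could look like, assuming a hypothetical discv5-layer handler that tracks multi-part NODES responses by request id (the names and message shape are illustrative, not from any client):

```typescript
// Hypothetical tracking of multi-part Portal NODES responses at the discv5
// layer, mirroring how discv5 aggregates its own NODES messages.
interface PendingNodesRequest {
  received: Uint8Array[][] // ENR batches collected so far
  expected?: number // `total` field from the first NODES message seen
}

const pending = new Map<string, PendingNodesRequest>()

// Returns the full ENR set once all `total` messages have arrived,
// or undefined while parts are still outstanding.
function handleNodesResponse(
  requestId: string,
  total: number,
  enrs: Uint8Array[]
): Uint8Array[] | undefined {
  const entry = pending.get(requestId) ?? { received: [] }
  entry.expected ??= total
  entry.received.push(enrs)
  if (entry.received.length < entry.expected) {
    pending.set(requestId, entry) // wait for the remaining messages
    return undefined
  }
  pending.delete(requestId)
  return entry.received.flat() // hand the complete set up to the Portal layer
}
```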

acolytec3 commented 2 years ago

As a starting point, I prefer option 4, as it is far and away the least complex; it seems like something we could revisit at some future time once we begin to observe routing table/network health and see how hard it is for nodes to find a sufficient number of neighbors.

pipermerriam commented 2 years ago

I'm curious what the implementation teams think about option 3 (using uTP) when the response size exceeds the packet size. It seems like it should/could be cleaner than the messy logic of multiple disparate packets.

I'm also game to entertain option 4, but I'm a little concerned about imposing this limitation, though I also don't have a compelling reason to justify needing larger response sizes.
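
To make option 3 concrete, here is a minimal sketch of how a responder might branch between an inline NODES response and a uTP transfer; the size threshold, connection-id encoding, and helper signatures are all assumptions for illustration:

```typescript
// Illustrative only: respond inline when the payload fits, otherwise hand
// back a uTP connection id and stream the full ENR list over uTP.
const MAX_TALKRESP_PAYLOAD = 1100 // assumed usable payload after discv5 overhead

interface UtpStream {
  connectionId: number
  write(data: Uint8Array): Promise<void>
}

async function respondFindNodes(
  enrsPayload: Uint8Array,
  sendTalkResp: (payload: Uint8Array) => Promise<void>,
  openUtpStream: () => Promise<UtpStream>
): Promise<void> {
  if (enrsPayload.length <= MAX_TALKRESP_PAYLOAD) {
    await sendTalkResp(enrsPayload) // small response: inline NODES message
    return
  }
  // Large response: mirror how large content is transferred on FINDCONTENT.
  const { connectionId, write } = await openUtpStream()
  await sendTalkResp(encodeConnectionId(connectionId))
  await write(enrsPayload)
}

function encodeConnectionId(id: number): Uint8Array {
  const buf = new Uint8Array(2)
  new DataView(buf.buffer).setUint16(0, id, false) // big-endian, as an assumption
  return buf
}
```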

KonradStaniec commented 2 years ago

I like option 3 as:

kdeme commented 2 years ago

tl;dr: I'd be fine with (optionally) using uTP for a larger amount of ENRs.

I think the problems with the current version in the spec (i.e. solution 1 from above) are:

An adapted version of solution 1 would be to do the framing at the discv5 level: split the packet at the discv5 layer while keeping it as one big Portal message. This could then pack even 32 ENRs. However, I don't like this solution because:

I also agree with the downsides mentioned for solution 2.

Solution 4 is what is done now in Fluffy, and it typically allows us to pack ~8 ENRs in the message, which is not great but is sufficient (for now).

Conclusion: I'm also in favor of sending ENRs over uTP when the amount of ENRs cannot be packed in a single discv5 TALKRESP message. Perhaps it should be left as optional behaviour for a client.

Note, however, that the same applies to sending ENRs back on a FindContent request. If we apply the same solution there (using uTP), there will probably be a need to discern the uTP data (content vs ENRs).
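
One hypothetical way to discern what a uTP stream carries would be to prefix the payload with a one-byte tag; this is purely illustrative and not something the spec defines:

```typescript
// Illustrative framing so the receiver can tell ENR lists from content.
enum UtpPayloadKind {
  Content = 0,
  Enrs = 1,
}

function frameUtpPayload(kind: UtpPayloadKind, data: Uint8Array): Uint8Array {
  const framed = new Uint8Array(1 + data.length)
  framed[0] = kind // one-byte tag ahead of the actual payload
  framed.set(data, 1)
  return framed
}

function parseUtpPayload(framed: Uint8Array): { kind: UtpPayloadKind; data: Uint8Array } {
  return { kind: framed[0] as UtpPayloadKind, data: framed.subarray(1) }
}
```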

acolytec3 commented 2 years ago

I haven't researched this in great detail yet, but @ScottyPoi opened this research issue for Ultralight, and I think we're starting to see some of the knock-on effects of effectively limiting FINDNODES/NODES to the current practical maximum of 8-9 ENRs. The short of it is that joining nodes are effectively unable to find a subset of other nodes despite actively looking: even when a joining node has a peer such as a bootnode that knows a broad swath of nodes in the network, that bootnode is limited to sharing 8-9 ENRs with the requesting node. As such, once the joining node makes its initial request to the bootnode to populate its routing table, it won't know to re-request nodes from the bootnode, and it would likely never get all of them anyway, since the bootnode only sends the first 8 it can pull from its table at whatever distance (or set of requested distances).

pipermerriam commented 2 years ago

Implementations should be able to quickly populate their routing tables even with only 8-9 ENRs per response.

  1. Implementations need to be semi-aware of the limit, and not assume that a request for many distances is exhaustive.
  2. Implementations should not rely on bootnodes to do in-depth population of their routing table.
  3. Implementations should rely primarily on "random" nodes from the network to do in-depth routing table population.
  4. Implementations should randomly explore parts of the network (a rough sketch follows this list).
    • Prioritize querying areas of the network that correspond to larger buckets over smaller buckets.
    • Only query regions of the network that correspond to non-full buckets.
    • During initialization, nodes should make lots of queries to fill up their routing table quickly.
    • Once the routing table is generally full, or requests are not yielding any new routing table entries, queries should only happen periodically (every 30 seconds - 1 minute).
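
A rough sketch of what such an exploration loop might look like; the Bucket shape and lookup function here are assumptions, not any client's actual API:

```typescript
// Illustrative routing table exploration loop.
interface Bucket {
  distance: number // log2 distance; higher distances cover a larger keyspace
  size: number // ENRs currently in the bucket
  capacity: number // e.g. 16
}

// Pick a distance to query: only non-full buckets, larger buckets first.
function pickQueryDistance(buckets: Bucket[]): number | undefined {
  const candidates = buckets
    .filter((b) => b.size < b.capacity)
    .sort((a, b) => b.distance - a.distance)
  return candidates[0]?.distance
}

// lookup(distance) performs a FINDNODES-based lookup and returns how many
// new ENRs were added to the routing table.
async function populateLoop(
  buckets: Bucket[],
  lookup: (distance: number) => Promise<number>
): Promise<void> {
  let fruitlessRounds = 0
  for (;;) {
    const distance = pickQueryDistance(buckets)
    const added = distance === undefined ? 0 : await lookup(distance)
    fruitlessRounds = added > 0 ? 0 : fruitlessRounds + 1
    // Query aggressively during initialization; once queries stop yielding
    // new entries, back off to a periodic refresh every ~30-60 seconds.
    const delayMs = fruitlessRounds < 3 ? 1_000 : 45_000
    await new Promise((resolve) => setTimeout(resolve, delayMs))
  }
}
```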