ethresearch / p2p


Rationale for RLP alternatives in Discovery v5? #15

Open FrankSzendzielarz opened 5 years ago

FrankSzendzielarz commented 5 years ago

A couple of people have commented that it would be somehow more convenient to use SSZ at the Discovery layer. Right now I don't see any reason to switch away from RLP, but I was curious anyway: does anyone agree with this notion, and what is the motivation/rationale?

hwwhww commented 5 years ago
  1. SSZ has many features that are better suited to the consensus layer, see: https://notes.ethereum.org/s/rkhCgQteN#SSZ and https://github.com/ethereum/eth2.0-specs/issues/582#issuecomment-461605143
  2. For the discovery layer and network layer, the question becomes what we want to use for messaging when the underlying layer is libp2p. Some implementers advocate using protobuf (https://github.com/ethereum/eth2.0-specs/issues/129, https://github.com/ethereum/eth2.0-specs/issues/503). Personally, I don't see a convincing argument for adding another serialization scheme that makes the protocol stack more complicated.

raulk commented 5 years ago

2. For the discovery layer and network layer, the question becomes what we want to use for messaging when the underlying layer is libp2p. Some implementers advocate using protobuf (ethereum/eth2.0-specs#129, ethereum/eth2.0-specs#503). Personally, I don't see a convincing argument for adding another serialization scheme that makes the protocol stack more complicated.

libp2p doesn't impose a particular serialisation format. It exposes plain byte-level readers and writers, so you are free to choose whichever wire format you prefer ;-)


EDIT P.S.: libp2p protocols like gossipsub, kademlia, etc. use length-delimited protobuf, but to the eyes of the rest of the libp2p stack, that's just an implementation detail.
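
To make "length-delimited" concrete, here is a minimal Python sketch of varint-prefixed framing over a plain byte stream. It only illustrates the framing idea; it does not use any libp2p API, and the helper names (`write_uvarint`, `read_uvarint`, `write_frame`, `read_frame`) are made up for this example. The payload can be bytes produced by any serializer.

```python
import io

def write_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    if n < 0:
        raise ValueError("varint must be non-negative")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def read_uvarint(stream) -> int:
    """Read one unsigned varint from a binary stream."""
    shift = 0
    result = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a varint")
        result |= (raw[0] & 0x7F) << shift
        if not raw[0] & 0x80:
            return result
        shift += 7

def write_frame(stream, payload: bytes) -> None:
    """Write the payload length as a varint, then the payload itself."""
    stream.write(write_uvarint(len(payload)) + payload)

def read_frame(stream) -> bytes:
    """Read one length-prefixed payload back from the stream."""
    length = read_uvarint(stream)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a frame")
    return payload

# Two frames written back to back can be separated again by the reader.
buf = io.BytesIO()
write_frame(buf, b"ping")
write_frame(buf, b"x" * 300)
buf.seek(0)
frames = [read_frame(buf), read_frame(buf)]
```

The framing layer never inspects the payload, which is why the choice of serialization format can stay independent of the transport.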

FrankSzendzielarz commented 5 years ago

Most eth implementations already have RLP, so is there any reason to add a further serialization format? I am told SSZ is rather bloated, with somewhat differing aims.

Mikerah commented 5 years ago

Most eth implementations already have RLP, so is there any reason to add a further serialization format? I am told SSZ is rather bloated, with somewhat differing aims.

In Phase 0, I see keeping RLP as a short-term solution so that the v5 nodes are backwards compatible with the v4 nodes. However, in the long term, this is not necessary. After all, ETH2.0 is a separate chain.

FrankSzendzielarz commented 5 years ago

OK, in that case what would you propose as a wire serialization format? SSZ? If so, why?

Mikerah commented 5 years ago

There's already a proposal for the Wire API that uses SSZ. There has been a little bit of discussion lately about the design rationale of SSZ and whether it should be changed. @karalabe suggested an alternative serialization scheme, SOS (Simple Offset Serialization).

Perhaps a game plan that might be reasonable is the following:

Thoughts?

karalabe commented 5 years ago

Seems to me that both RLP and SSZ are kind of arguing that "hey, we have this existing code, let's use it instead of figuring out what the best solution is for this particular task".

The only meaningful way forward I see is to write up a list of requirements for the discovery wire format, and then we can pick a solution from there. My initial thoughts would be:

FrankSzendzielarz commented 5 years ago

Discv5 allows a response to be sent as a "stream" of multiple messages (each one marked as message M of N) if the response is expected to span MTUs. E.g. FindNode -> Neighbors may result in multiple messages, each carrying a list of ENRs. This means some compression can be used in some places, but on the whole it is not of great importance.
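
The "M of N" idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the byte budget `MAX_PAYLOAD`, the tuple layout `(index, total, records)`, and the function names are hypothetical and are not taken from the Discovery spec.

```python
from typing import List, Tuple

MAX_PAYLOAD = 1000  # hypothetical per-packet budget in bytes, not a spec value

Packet = Tuple[int, int, List[bytes]]  # (index M, total N, records)

def split_into_packets(records: List[bytes],
                       max_payload: int = MAX_PAYLOAD) -> List[Packet]:
    """Group records into packets whose combined record size fits the budget."""
    groups: List[List[bytes]] = [[]]
    used = 0
    for record in records:
        if len(record) > max_payload:
            raise ValueError("record does not fit in a single packet")
        # Start a new packet when the current one would overflow.
        if groups[-1] and used + len(record) > max_payload:
            groups.append([])
            used = 0
        groups[-1].append(record)
        used += len(record)
    total = len(groups)
    return [(i + 1, total, group) for i, group in enumerate(groups)]

def reassemble(packets: List[Packet]) -> List[bytes]:
    """Return all records once every packet 1..N has arrived, in any order."""
    total = packets[0][1]
    if {index for index, _, _ in packets} != set(range(1, total + 1)):
        raise ValueError("missing packets")
    ordered = sorted(packets, key=lambda p: p[0])
    return [record for _, _, group in ordered for record in group]

# Seven 300-byte records need three packets under a 1000-byte budget.
records = [bytes([i]) * 300 for i in range(7)]
packets = split_into_packets(records)
```

Because every packet carries N, the receiver can tell when the response is complete without relying on ordered delivery.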

Discv5 (currently) aims to be agnostic as to whether the transport is streamed or not. RLP helps with that because it offers read-lookahead hints: every item starts with a length prefix.
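
As a rough illustration of those lookahead hints, the Python sketch below computes the full length of an RLP item from its header bytes alone, so a reader on a streamed transport knows how many bytes to wait for. The function names `rlp_item_length` and `read_item` are made up for this example; the prefix ranges follow the RLP encoding rules, and no canonical-form checks are done.

```python
import io

def rlp_item_length(prefix: bytes) -> int:
    """Total encoded length (header + payload) of the RLP item starting at
    prefix[0]. Raises ValueError if more header bytes are needed."""
    if not prefix:
        raise ValueError("need at least one byte")
    b0 = prefix[0]
    if b0 < 0x80:        # single byte, it is its own encoding
        return 1
    if b0 <= 0xB7:       # short string, payload of 0-55 bytes
        return 1 + (b0 - 0x80)
    if b0 <= 0xBF:       # long string, length-of-length follows
        len_of_len = b0 - 0xB7
    elif b0 <= 0xF7:     # short list, payload of 0-55 bytes
        return 1 + (b0 - 0xC0)
    else:                # long list, length-of-length follows
        len_of_len = b0 - 0xF7
    if len(prefix) < 1 + len_of_len:
        raise ValueError("need more header bytes")
    payload_len = int.from_bytes(prefix[1:1 + len_of_len], "big")
    return 1 + len_of_len + payload_len

def read_item(stream) -> bytes:
    """Read exactly one encoded RLP item from a binary stream."""
    header = stream.read(1)
    if not header:
        raise EOFError("stream is empty")
    b0 = header[0]
    if 0xB8 <= b0 <= 0xBF:
        header += stream.read(b0 - 0xB7)
    elif b0 >= 0xF8:
        header += stream.read(b0 - 0xF7)
    total = rlp_item_length(header)
    rest = stream.read(total - len(header))
    if len(header) + len(rest) != total:
        raise EOFError("stream ended inside an item")
    return header + rest

# Two items back to back: the string "dog" and the list ["cat", "dog"].
stream = io.BytesIO(b"\x83dog" + b"\xc8\x83cat\x83dog")
first = read_item(stream)
second = read_item(stream)
```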

On the whole I think it is now up to people to propose alternatives to RLP and say why. Right now it's RLP by default.

fjl commented 5 years ago

My perspective: we don't gain much from changing serialization formats for discovery. AFAIK SSZ was proposed because decoding RLP is annoying in the EVM. But those concerns with RLP in the consensus layer don't apply to p2p because there is no need to process network packets inside the EVM.

RLP has advantages for networking because it is a free form format that can be decoded without a schema. It also allows forward-compatible encodings where we can say "just skip over this part, we'll define what goes here later".
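
A small Python sketch of both points, assuming a hypothetical two-field message (a request id and a name) that a later version extends with a third field. The decoder is a minimal RLP decoder written for this example (no canonical-form checks), and `parse_v1` is a made-up name, not part of any client.

```python
def decode_item(data: bytes, pos: int = 0):
    """Decode one RLP item at data[pos] and return (value, next_pos).
    Strings become bytes, lists become Python lists. No schema is needed."""
    if pos >= len(data):
        raise ValueError("truncated input")
    b0 = data[pos]
    if b0 < 0x80:                       # single byte
        return data[pos:pos + 1], pos + 1
    if b0 <= 0xB7:                      # short string
        start, length, is_list = pos + 1, b0 - 0x80, False
    elif b0 <= 0xBF:                    # long string
        start = pos + 1 + (b0 - 0xB7)
        length, is_list = int.from_bytes(data[pos + 1:start], "big"), False
    elif b0 <= 0xF7:                    # short list
        start, length, is_list = pos + 1, b0 - 0xC0, True
    else:                               # long list
        start = pos + 1 + (b0 - 0xF7)
        length, is_list = int.from_bytes(data[pos + 1:start], "big"), True
    end = start + length
    if end > len(data):
        raise ValueError("truncated input")
    if not is_list:
        return data[start:end], end
    items, cursor = [], start
    while cursor < end:
        item, cursor = decode_item(data, cursor)
        items.append(item)
    if cursor != end:
        raise ValueError("list payload overruns its declared length")
    return items, end

def decode(data: bytes):
    value, end = decode_item(data)
    if end != len(data):
        raise ValueError("trailing bytes")
    return value

def parse_v1(packet: bytes):
    """Hypothetical v1 parser: reads two fields and skips anything after them."""
    fields = decode(packet)
    return fields[0], fields[1]

v1_packet = bytes.fromhex("c50183616263")        # [0x01, "abc"]
v2_packet = bytes.fromhex("c80183616263c27879")  # [0x01, "abc", ["x", "y"]]
```

Here `parse_v1` returns the same result for both packets: the extra list appended by the newer sender is decoded structurally and then ignored.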

The disadvantage of both RLP and SSZ is that they aren't "standard" encodings (i.e. they're not included in programming language standard libraries). RLP is widely supported though and has implementations in 15+ programming languages.

pipermerriam commented 5 years ago

I don't have the expertise to weigh in on the networking level components/reasons for choosing one over another.

I agree with Peter's assertion that we should have a list of things that we care about and make a decision using that as a framework. Here's a starting point.

arnetheduck commented 5 years ago

here's a few things I'm missing in a wire encoding for eth2:

one possibility is to have two levels of support - one being a subset of the other. the more strict version would be used for consensus whereas the other would be used for wire. of the standard "formats" I've seen in discussions, flatbuffers comes close. The advantage of doing the subset/superset approach is that it allows accessing the data without reencoding, and with a single decoder at that. it should be fairly easy to turn ssz into a subset of flatbuffers, it's very close already. this would solve the "standard tooling" question for anyone wanting to just consume the data (implementations can easily code up custom encoders, while promoting easy consumption)

protobuf was discussed and discarded several times (in the eth2 repo / issues) for several reasons, including its poor support for the data types we often use, most notably hashes / fixed-length arrays, and poor encoding determinism.

pipermerriam commented 5 years ago

Just did a quick read of @karalabe 's SOS proposal

I'm only just starting to think about this so maybe someone else already knows. It seems like the following might be loosely mutually exclusive.

  1. streaming encodability/decodability
  2. O(log(N)) access times for arbitrary nested data

Alternatively, my understanding is that we're talking about wire serialization protocol. Can we not use one serialization scheme for wire transport and a completely different serialization scheme for hashing?

For wire we probably want: streaming, compact, first class support for our desired data types

For hashing we probably want: fast access times, compact, first class support for our desired data types

SOS seems to fit the bill for our hashing serialization needs. I'm not yet aware of a candidate for our wire needs.

FrankSzendzielarz commented 5 years ago

@arnetheduck I just realized I should edit the title. The question is aimed at working out if we need to change off RLP for Discovery "v5" and if so why. The premise is that Eth 2.0 will need to talk to Eth 1.X for quite some time anyway.

IMHO, the Discovery protocol is not just likely to change across versions in terms of message format, but also in terms of message exchange pattern. Clients implementing Discovery should consider the use of Strategy-like design-patterns, I think.

For Discovery v5 I am leaning towards the following scheme (though this is still a topic for discussion, and I will update this comment with a link to an issue on this) :

ENRs may assist with higher level protocols in a similar way.

pipermerriam commented 5 years ago

I did some research and I can't find anything that has the following three properties:

So I made one:

https://github.com/ethereum/bimini/blob/master/spec.md

I'd be curious to get some feedback on it. I'm working to get it to a point where I can provide some comparison numbers between it, RLP, and SSZ.

FrankSzendzielarz commented 5 years ago

@karalabe ^^^^ @pipermerriam FYI https://github.com/ethereum/eth2.0-specs/issues/692

pipermerriam commented 5 years ago

I've opened up this EIP with a more formal proposal for the SSS serialization scheme. It includes the rationale for why Protobuf, MessagePack, and CBOR are not suitable for our needs, as well as a breakdown of RLP vs SSZ vs SSS on the various axes along which I think a networking serialization scheme should be evaluated.

https://github.com/ethereum/EIPs/blob/71098b1c2760f2ae557a7bab91770eb8cf72fed5/EIPS/eip-sss_serialization.md

I also did a very detailed analysis of SSS vs RLP vs SSZ, which can be found here:

https://github.com/ethereum/bimini/blob/7c26efec585742ef870bf58ea5d96e2deb242775/report.md#sss-vs-rlp-summary

pipermerriam commented 5 years ago

Further evolution of this topic: https://github.com/ethereum/eth2.0-specs/issues/754

jannikluhn commented 5 years ago

The disadvantages of requiring a schema are becoming very apparent in the discussions on the wire protocol. With SSZ, whenever a node tries to decode a message it has received, it needs to know the schema already. As different message types contain different data, we need a single envelope schema that embeds the body as a data blob, and we deserialize it in two steps. We might even need multiple levels, e.g.:

Message {
    type: uint8
    serialized_body: bytes
}

GetHeadersResponse {
    id: uint64
    success: bool
    response_body: bytes
}

HeadersSuccessfulResponse {
    headers: []BlockHeader
}

HeadersFailedResponse {
    error_code: uint8
}

I don't really like this. With RLP, we would avoid this to some extent because we can deserialize everything in a single step, then walk through the different elements we got, and only update our interpretation of the data at every step. And, if the message structure contains information about the message type, we can even get rid of nesting (e.g. distinguish between HeadersSuccessfulResponse and HeadersFailedResponse depending on whether it contains a list or not).
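
A minimal Python sketch of that last idea. It assumes the packet has already been decoded in one step into nested lists and byte strings (the shape a schema-less RLP decoder produces) and that the layout is `[id, body]`, where `body` is a list of headers on success and a single byte string holding the error code on failure. That layout and the function name `interpret_headers_response` are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class HeadersSuccessfulResponse:
    id: int
    headers: List[bytes]

@dataclass
class HeadersFailedResponse:
    id: int
    error_code: int

def interpret_headers_response(
    decoded,
) -> Union[HeadersSuccessfulResponse, HeadersFailedResponse]:
    """Pick the response type from the shape of the already-decoded data."""
    if not isinstance(decoded, list) or len(decoded) < 2:
        raise ValueError("expected a list of [id, body]")
    request_id = int.from_bytes(decoded[0], "big")
    body = decoded[1]
    if isinstance(body, list):
        # A list in the body position means headers were returned.
        return HeadersSuccessfulResponse(request_id, body)
    # A plain byte string in the body position is read as an error code.
    return HeadersFailedResponse(request_id, int.from_bytes(body, "big"))

ok = interpret_headers_response([b"\x07", [b"header-1", b"header-2"]])
failed = interpret_headers_response([b"\x07", b"\x02"])
```

No envelope type and no second decoding pass are needed here; the trade-off is that the message type is implied by structure rather than stated explicitly.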

FrankSzendzielarz commented 5 years ago

Yes, concerns have been raised by different people about SSZ on the wire, but regardless it is still included in the draft protocols there. For Discovery we're just going with RLP for now, and upgrade mechanisms are simple once the ENRs are in place. The wire protocol for Eth 2.0 conflates message format with encoding/serialization. Which wire formatter (media formatter in the web world) is used could easily be established by rules in the ENR and/or handshake. If client implementers want to make a private network using BSON, why should they not be able to?
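
One way to picture a negotiable wire formatter is a registry of named codecs plus a selection step. The Python sketch below is purely illustrative: the codec names, the toy `lines` format, and the idea that a peer advertises a list of format names (for instance in an ENR entry or during the handshake) are all assumptions for this example, not part of any spec.

```python
import json
from typing import Callable, Dict, List, NamedTuple

class Codec(NamedTuple):
    encode: Callable[[Dict[str, str]], bytes]
    decode: Callable[[bytes], Dict[str, str]]

def _lines_encode(msg: Dict[str, str]) -> bytes:
    # Toy format: one "key=value" pair per line (keys must not contain "=").
    return "\n".join(f"{k}={v}" for k, v in sorted(msg.items())).encode()

def _lines_decode(data: bytes) -> Dict[str, str]:
    return dict(line.split("=", 1) for line in data.decode().splitlines())

# Registry of wire formatters this node supports, keyed by a hypothetical name.
CODECS: Dict[str, Codec] = {
    "json": Codec(lambda m: json.dumps(m, sort_keys=True).encode(),
                  lambda b: json.loads(b)),
    "lines": Codec(_lines_encode, _lines_decode),
}

def negotiate(local_preference: List[str], peer_advertised: List[str]) -> str:
    """Return the first locally preferred format the peer also advertises."""
    for name in local_preference:
        if name in peer_advertised and name in CODECS:
            return name
    raise LookupError("no common wire format")

# The peer only advertises "json", so that is what gets used.
chosen = negotiate(["lines", "json"], ["json"])
message = {"type": "ping", "seq": "1"}
round_trip = CODECS[chosen].decode(CODECS[chosen].encode(message))
```

The message content stays the same regardless of which codec is chosen, which is the separation between message format and encoding argued for above.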

pipermerriam commented 5 years ago

@jannikluhn after the last call I'm leaning towards defaulting to any/one-of-the minimal wire protocol proposals that were talked about which treat the Message part as raw bytes and delegate to a second layer of decoding to decode the actual message.

So no SSZ at the wire level, but I still like the idea of using an SSZ variant of some sort for the message component.
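
A rough Python sketch of that two-layer arrangement: the outer layer only splits off a type byte and hands the rest over as raw bytes, and a per-type decoder handles the body. The type numbers, the fixed-width little-endian body layouts, and all function names are assumptions for this example; the body decoders stand in for whatever SSZ variant would actually be used.

```python
import struct
from typing import Callable, Dict, Tuple

def encode_envelope(msg_type: int, body: bytes) -> bytes:
    """Outer layer: one type byte followed by the opaque body."""
    return struct.pack("B", msg_type) + body

def decode_envelope(packet: bytes) -> Tuple[int, bytes]:
    """Outer layer: split off the type byte, leave the body undecoded."""
    if not packet:
        raise ValueError("empty packet")
    return packet[0], packet[1:]

def decode_status(body: bytes) -> dict:
    # Hypothetical body layout: id as uint64, success as bool.
    request_id, success = struct.unpack("<Q?", body)
    return {"id": request_id, "success": success}

def decode_error(body: bytes) -> dict:
    # Hypothetical body layout: a single uint8 error code.
    (code,) = struct.unpack("<B", body)
    return {"error_code": code}

# Second layer: the decoder is chosen from the envelope's type byte.
BODY_DECODERS: Dict[int, Callable[[bytes], dict]] = {
    0x01: decode_status,
    0x02: decode_error,
}

def decode_message(packet: bytes) -> dict:
    msg_type, body = decode_envelope(packet)
    try:
        decoder = BODY_DECODERS[msg_type]
    except KeyError:
        raise ValueError(f"unknown message type {msg_type}") from None
    return decoder(body)

packet = encode_envelope(0x01, struct.pack("<Q?", 42, True))
decoded = decode_message(packet)
```

The wire layer stays trivial and format-agnostic, while the body encoding can change per message type without touching the envelope.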