Open FrankSzendzielarz opened 5 years ago
2. For the discovery layer and network layer, the question becomes what we want to use for messaging when the underlying layer is libp2p. Some implementers advocate using protobuf (ethereum/eth2.0-specs#129, ethereum/eth2.0-specs#503). Personally, I don't see a convincing argument or requirement for adding another serialization scheme that makes the protocol stack more complicated.
libp2p doesn't impose a particular serialisation format. It exposes plain byte-level readers and writers, so you are free to choose whichever wire format you prefer ;-)
EDIT P.S.: libp2p protocols like gossipsub, kademlia, etc. use length-delimited protobuf, but to the eyes of the rest of the libp2p stack, that's just an implementation detail.
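To make "length-delimited" concrete, here is a minimal sketch of length-prefixed framing over a plain byte stream, assuming an unsigned varint (LEB128) length prefix. The function names are made up for illustration and are not libp2p APIs; the payload is treated as opaque bytes, so any serialization format could sit inside a frame.

```python
import io

def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned LEB128 varint."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def read_uvarint(stream) -> int:
    """Read one unsigned varint from a binary stream."""
    result = 0
    shift = 0
    while True:
        raw = stream.read(1)
        if not raw:
            raise EOFError("stream ended inside a varint")
        byte = raw[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7

def write_frame(stream, payload: bytes) -> None:
    """Write one frame: varint length, then the opaque payload."""
    stream.write(encode_uvarint(len(payload)))
    stream.write(payload)

def read_frame(stream) -> bytes:
    """Read one frame written by write_frame."""
    length = read_uvarint(stream)
    payload = stream.read(length)
    if len(payload) != length:
        raise EOFError("stream ended inside a frame")
    return payload

# Usage: two frames back to back on the same stream.
buf = io.BytesIO()
write_frame(buf, b"hello")
write_frame(buf, b"x" * 300)
buf.seek(0)
first = read_frame(buf)
second = read_frame(buf)
```

The framing layer never looks inside the payload, which is why the choice of protobuf, RLP or SSZ for the message body stays an implementation detail of each protocol.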
Most eth implementations already have RLP so is there any reason to add a further serialization format? I am told SSZ is rather bloated....with somewhat differing aims....
In Phase 0, I see keeping RLP as a short term solution so that the v5 nodes are backwards compatible with the v4 nodes. However, in the long term, this is not necessary. After all, ETH2.0 is a separate chain.
OK...in that case what would you propose as a wire serialization format? SSZ? If so, why.
There's already a proposal for the Wire API that uses SSZ. There has been a little bit of discussion lately about the design rationale of SSZ and whether it should be changed. @karalabe suggested an alternative serialization scheme, SOS (Simple Offset Serialization).
Perhaps a game plan that might be reasonable is the following:
Thoughts?
Seems to me that both RLP and SSZ are kind of arguing that "hey, we have this existing code, let's use it instead of figuring out what the best solution is for this particular task".
The only meaningful way forward I see is to write up a list of requirements that the discovery wire format requires, and then we can pick a solution from there. My initial thoughts would be:
Discv5 allows multiple messages to be sent as a message "stream" (each message is marked as message M of N) if the response is expected to exceed a single MTU. E.g. FindNode -> Neighbors may result in multiple messages with lists of ENRs. This means some compression can be used in some places, but on the whole it is not of great importance.
Discv5 (currently) aims to be agnostic as to whether the transport is streamed or not. RLP helps with that because it offers read look-ahead hints.
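As a rough sketch of the "M of N" idea, the snippet below splits a list of already-encoded ENRs into batches that each fit a payload budget and tags every batch with the total batch count. The names `split_records` and `reassemble`, the tuple layout, and the 1000-byte budget are assumptions for illustration only, not taken from the Discv5 spec.

```python
from typing import List, Tuple

Batch = Tuple[int, List[bytes]]  # (total number of messages, records in this one)

def split_records(records: List[bytes], max_payload: int) -> List[Batch]:
    """Group records into batches whose summed sizes stay within max_payload."""
    batches: List[List[bytes]] = []
    current: List[bytes] = []
    size = 0
    for rec in records:
        if len(rec) > max_payload:
            raise ValueError("record does not fit in a single message")
        if current and size + len(rec) > max_payload:
            batches.append(current)  # current message is full, start a new one
            current, size = [], 0
        current.append(rec)
        size += len(rec)
    if current or not batches:
        batches.append(current)  # also covers the empty response case
    total = len(batches)
    return [(total, batch) for batch in batches]

def reassemble(messages: List[Batch]) -> List[bytes]:
    """Return all records once every announced message has arrived."""
    total = messages[0][0]
    if len(messages) != total:
        raise ValueError("response is incomplete")
    return [rec for _, batch in messages for rec in batch]

# Usage: seven 300-byte records with an assumed 1000-byte budget.
enrs = [bytes([i]) * 300 for i in range(7)]
messages = split_records(enrs, max_payload=1000)
restored = reassemble(messages)
```

Because every message carries the total, the receiver knows when the response is complete without relying on a stream-oriented transport.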
On the whole I think it is now up to people to propose alternatives to RLP and say why. Right now it's RLP by default.
My perspective: we don't gain much from changing serialization formats for discovery. AFAIK SSZ was proposed because decoding RLP is annoying in the EVM. But those concerns with RLP in the consensus layer don't apply to p2p because there is no need to process network packets inside the EVM.
RLP has advantages for networking because it is a free form format that can be decoded without a schema. It also allows forward-compatible encodings where we can say "just skip over this part, we'll define what goes here later".
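To illustrate both points, here is a minimal RLP decoder sketch that needs no schema: it only looks at prefix bytes and returns nested lists of byte strings. It is a simplified illustration (it does not reject non-canonical encodings), and the function names are my own, not those of any particular RLP library.

```python
def decode_item(data: bytes, pos: int = 0):
    """Decode one RLP item starting at pos; return (value, next_pos)."""
    if pos >= len(data):
        raise ValueError("unexpected end of input")
    prefix = data[pos]
    if prefix < 0x80:  # a single byte is its own encoding
        return data[pos:pos + 1], pos + 1
    if prefix < 0xB8:  # short string, length in the prefix
        start, length, is_list = pos + 1, prefix - 0x80, False
    elif prefix < 0xC0:  # long string, prefix gives the size of the length field
        n = prefix - 0xB7
        length = int.from_bytes(data[pos + 1:pos + 1 + n], "big")
        start, is_list = pos + 1 + n, False
    elif prefix < 0xF8:  # short list, payload length in the prefix
        start, length, is_list = pos + 1, prefix - 0xC0, True
    else:  # long list, prefix gives the size of the length field
        n = prefix - 0xF7
        length = int.from_bytes(data[pos + 1:pos + 1 + n], "big")
        start, is_list = pos + 1 + n, True
    end = start + length
    if end > len(data):
        raise ValueError("item is truncated")
    if not is_list:
        return data[start:end], end
    items = []
    cursor = start
    while cursor < end:
        item, cursor = decode_item(data, cursor)
        items.append(item)
    if cursor != end:
        raise ValueError("list payload overruns its declared length")
    return items, end

def decode(data: bytes):
    """Decode exactly one RLP item that spans the whole input."""
    value, end = decode_item(data)
    if end != len(data):
        raise ValueError("trailing bytes after item")
    return value

# ["cat", "dog"] as defined today.
old_msg = bytes.fromhex("c88363617483646f67")
# The same message with an extra trailing field added by a newer version.
new_msg = bytes.fromhex("ca8363617483646f67c178")

# An older reader takes the fields it knows and skips the rest.
known_fields = decode(new_msg)[:2]
```

The length prefixes are what make skipping possible: a reader can step over an unknown field without understanding what is inside it.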
The disadvantage of both RLP and SSZ is that they aren't "standard" encodings (i.e. they're not included in programming language standard libraries). RLP is widely supported though and has implementations in 15+ programming languages.
I don't have the expertise to weigh in on the networking level components/reasons for choosing one over another.
I agree with Peter's assertion that we should have a list of things that we care about and make a decision using that as a framework. Here's a starting point.
here's a few things I'm missing in a wire encoding for eth2:
one possibility is to have two levels of support - one being a subset of the other. the more strict version would be used for consensus whereas the other would be used for wire. of the standard "formats" I've seen in discussions, flatbuffers comes close. The advantage of doing the subset/superset approach is that it allows accessing the data without reencoding, and with a single decoder at that. it should be fairly easy to turn ssz into a subset of flatbuffers, it's very close already. this would solve the "standard tooling" question for anyone wanting to just consume the data (implementations can easily code up custom encoders, while promoting easy consumption)
protobuf was discussed and discarded several times (in the eth2 repo / issues) for several reasons, including its poor support for the data types we often use, most notably hashes / fixed-length arrays, and poor encoding determinism.
Just did a quick read of @karalabe 's SOS proposal
I'm only just starting to think about this so maybe someone else already knows. It seems like the following might be loosely mutually exclusive.
O(log(N)) access times for arbitrary nested data
Alternatively, my understanding is that we're talking about wire serialization protocol. Can we not use one serialization scheme for wire transport and a completely different serialization scheme for hashing?
For wire we probably want: streaming, compact, first class support for our desired data types
For hashing we probably want: fast access times, compact, first class support for our desired data types
SOS seems to fit the bill for our hashing serialization needs. I'm not yet aware of a candidate for our wire needs.
@arnetheduck I just realized I should edit the title. The question is aimed at working out if we need to change off RLP for Discovery "v5" and if so why. The premise is that Eth 2.0 will need to talk to Eth 1.X for quite some time anyway.
IMHO, the Discovery protocol is not just likely to change across versions in terms of message format, but also in terms of message exchange pattern. Clients implementing Discovery should consider the use of Strategy-like design-patterns, I think.
For Discovery v5 I am leaning towards the following scheme (though this is still a topic for discussion, and I will update this comment with a link to an issue on this) :
ENRs may assist with higher level protocols in a similar way.
I did some research and I can't find anything that has the following three properties:
So I made one:
https://github.com/ethereum/bimini/blob/master/spec.md
I'd be curious to get some feedback on it. I'm working to get it to a point where I can provide some comparison numbers between it, RLP, and SSZ.
@karalabe ^^^^ @pipermerriam FYI https://github.com/ethereum/eth2.0-specs/issues/692
I've opened up this EIP with a more formal proposal for the SSS serialization scheme. It includes rationale for why Protobuf, MessagePack, and CBOR are not suitable to our needs as well as a breakdown of RLP vs SSZ vs SSS on the various axes along which I think a networking serialization scheme should be evaluated.
And did some very detailed analysis of SSS vs RLP vs SSZ which can be found here:
Further evolution of this topic: https://github.com/ethereum/eth2.0-specs/issues/754
The disadvantages of requiring a schema are becoming very apparent in the discussions on the wire protocol. With SSZ, whenever a node tries to decode a message it received, it needs to know the schema already. As different message types contain different data, we need a single envelope schema, embed the body as a data blob, and deserialize it in two steps. We might even need multiple levels, e.g.:
Message {
    type: uint8
    serialized_body: bytes
}
GetHeadersResponse {
    id: uint64
    success: bool
    response_body: bytes
}
HeadersSuccessfulResponse {
    headers: []BlockHeader
}
HeadersFailedResponse {
    error_code: uint8
}
I don't really like this. With RLP, we would avoid this to some extent because we can deserialize everything in a single step, then walk through the different elements we got, and only update our interpretation of the data at every step. And, if the message structure contains information about the message type, we can even get rid of nesting (e.g. distinguish between HeadersSuccessfulResponse and HeadersFailedResponse depending on if it contains a list or not).
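A rough sketch of what the multi-level envelope decoding looks like in practice. This uses Python's `struct` module as a stand-in for a schema-based decoder (it is not SSZ), and the type tag value, the 8-byte header placeholder and all function names are assumptions for illustration.

```python
import struct

GET_HEADERS_RESPONSE = 1  # hypothetical type tag
HEADER_SIZE = 8           # placeholder, not the real BlockHeader size

def decode_message(data: bytes):
    """Level 1: the envelope. Only now do we learn which schema applies."""
    return data[0], data[1:]

def decode_get_headers_response(body: bytes):
    """Level 2: id (uint64), success (bool), then another opaque blob."""
    req_id, success = struct.unpack_from("<Q?", body)
    return req_id, success, body[struct.calcsize("<Q?"):]

def decode_response_body(success: bool, blob: bytes):
    """Level 3: the schema depends on a field decoded at level 2."""
    if success:
        if len(blob) % HEADER_SIZE:
            raise ValueError("header list has a partial entry")
        return [blob[i:i + HEADER_SIZE] for i in range(0, len(blob), HEADER_SIZE)]
    return blob[0]  # error_code: uint8

def decode_wire(data: bytes):
    """Run all three decoding steps in order."""
    msg_type, body = decode_message(data)
    if msg_type != GET_HEADERS_RESPONSE:
        raise ValueError("unknown message type")
    req_id, success, blob = decode_get_headers_response(body)
    return req_id, success, decode_response_body(success, blob)

# Usage: one successful and one failed response with the same request id.
ok_wire = bytes([GET_HEADERS_RESPONSE]) + struct.pack("<Q?", 7, True) + b"A" * 8 + b"B" * 8
err_wire = bytes([GET_HEADERS_RESPONSE]) + struct.pack("<Q?", 7, False) + bytes([3])
ok_result = decode_wire(ok_wire)
err_result = decode_wire(err_wire)
```

Each level has to finish before the next schema can even be selected, which is the extra ceremony a self-describing format like RLP avoids.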
Yes, concerns have been raised by different people about SSZ on the wire, but regardless it is still included in the draft protocols there. For Discovery we're just going with RLP for now, and upgrade mechanisms are simple once the ENRs are in place. The wire protocol for Eth 2.0 conflates message format with encoding/serialization. Which wire formatter (media formatter in the web world) is used could easily be something established by rules in the ENR and/or handshake. If client implementers want to make a private network using BSON, why should they not be able to?
@jannikluhn after the last call I'm leaning towards defaulting to any/one-of-the minimal wire protocol proposals that were talked about which treat the Message part as raw bytes and delegate to a second layer of decoding to decode the actual message.
So no SSZ at the wire level but I still like the idea of using an SSZ variant of some sort for the message component.
A couple of people have commented that it would be somehow more convenient to use SSZ at the Discovery layer. Right now I don't see any reason to switch off RLP, but I was curious anyway....does anyone agree with this notion and what is the motivation/rationale?