ethereum / consensus-specs

Ethereum Proof-of-Stake Consensus Specifications
Creative Commons Zero v1.0 Universal
3.58k stars 977 forks source link

Snappy frames delimiter wanted #1819

Closed Nashatyrev closed 4 years ago

Nashatyrev commented 4 years ago

Currently it is a bit tricky from the implementation point of view to uncompress snappy frames in the response stream. For example we get a response with several chunks:

| result-1 | sszLength-1 | snappyFrame-1-1 | snappyFrame-1-2 | result-2 | sszLength-2 | snappyFrame-2-1 | ...

Without hacking into the Snappy frame format (Snappy frames has length prefix) we can't figure out how many bytes we should take from the stream to feed to the Snappy decompressor.

May be it makes sense to length-prefix every Snappy frame explicitly to avoid error-prone implementation tricks?

protolambda commented 4 years ago

Your approach to consume the snappy frames can be changed to make it work. Other clients do this, even in javascript, and it can work fine, without API changes to a proper snappy library.

You read the SSZ length, then read that many bytes from the Snappy reader (assuming you've snappy framed format exposed as with a streaming API). And snappy should only read more frames when more SSZ bytes are requested than available from the currently buffered frame. More frames could be buffered before returning the decompressed bytes in a stream-read call, if the requested ssz byte length is big enough to warrant that. Then just loop the read call until you get the full N ssz bytes, and at the same time you can decode them into the SSZ datastructures. SSZ isn't technically a streaming format, but since the SSZ length is already known, a decoder can build the datastructure without buffering the full bytes first!

Similarly, for encoding you can compute the size efficiently first, then write to the (snappy compressed) stream, without buffering the full ssz bytes. And snappy writes the completed frames to the network output, while maintaining a single frame. And after encoding the ssz object to the stream, you should then flush it to complete the last frame (snappy compressor could still be waiting for more bytes to put in the last frame, hence the need to flush before closing it), and you have a full streaming (as good as possible here) compressed SSZ protocol!

Nashatyrev commented 4 years ago

Your approach to consume the snappy frames can be changed to make it work. Other clients do this, even in javascript, and it can work fine, without API changes to a proper snappy library.

I didn't mean it's impossible but it requires some extra efforts from the client implementation and requires some specific capabilities from Snappy library. This seems to me like unnecessary over-complexity and can be implementation error-prone

Let me dive a bit deeper with the assumption that the whole response can be fragmented in any arbitrary fashion.

Let's for example suppose the following response (2 single frame chunks)

| result-1 | sszLength-1 | snappyFrame-1 | result-2 | sszLength-2 | snappyFrame-2 |

and it is fragmented and delivered with 2 packets (ByteFuffers or whatever) :

( result-1 | sszLength-1 | snappyFr)  // packet #1
(ame-1 | result-2 | sszLength-2 | snappyFrame-2 )  // packet #2

(note: you can't reconstruct Eth2 response chunks from these fragments without hacking into Snappy frames format)

You read the SSZ length, then read that many bytes from the Snappy reader (assuming you've snappy framed format exposed as with a streaming API)

Assuming you are feeding the packet #1 to an abstract Snappy decompressor and call read(ssLength_1) There are different possible options which decompressor may or may not support:

Also when feeding the packet #2 to the decompressor we need to make sure it doesn't pre-fetch any bytes internally and would read the compressed stream (ByteBuffer or whatever ) exactly until | result-2

Another strong argument is that we would like to process Eth2 chunks (either with Snappy compression or not) at a higher level without dealing with packet fragmentation issues.

Just checked the Lighthouse code https://github.com/sigp/lighthouse/blob/master/beacon_node/eth2-libp2p/src/rpc/codec/ssz_snappy.rs#L260-L262 and I'm not sure if it would work in case of fragmentation described above. I'm suspecting other libraries may potentially have issues in case of fragmentation and just work fine cause it's not happening with the current Libp2p muxer implementation

Again I'm not saying it's impossible to implement the current protocol but why do we need to impose those complexities when we are able to simplify the things?

protolambda commented 4 years ago

(note: you can't reconstruct Eth2 response chunks from these fragments without hacking into Snappy frames format

If it's split like that, and you can't read the second part, then you can't reconstruct a response chunk regardless. Not a problem specific to snappy.

Also when feeding the packet #2 to the decompressor we need to make sure it doesn't pre-fetch any bytes internally and would read the compressed stream (ByteBuffer or whatever ) exactly until | result-2

I think that is just a design problem of the implementation. It should not be "feeding", the decompression should ask for the data from the feeder. Otherwise you have to understand the contents (expected ssz byte length) on the producer side, instead of the consumer side. The "feeding" is prone to pre-fetching indeed if you have no length information, and just a bad idea, since reading bytes from a stream that is not supposed to return them is unexpected bad behavior. I would keep it simple and memory efficient: have the snappy library do the work, and ask for the bytes it needs, not the bytes you guess it could need.

Also, if we start encoding the full compressed length we can't do streaming-encoding anymore, as we have to compute it ahead of time. That would mean compressing everything first, and if an RPC chunk is very large that can be a problem.

Just checked the Lighthouse code

It should work fine, was tested with rumor first, then plenty of use in production (schlesi). And soon network tests will help too, but I think this part of lighthouse RPC is good. However, what lighthouse does not yet do is to decode the bytes directly into a data structure. Instead, it buffers the full ssz-length worth of bytes, and only then encodes it. No problem, that can be improved later on their side. The idea really is that if we had bigger RPC chunks (E.g. beacon states, or later phase 1 things), we can support them with the current RPC without problems.

Again I'm not saying it's impossible to implement the current protocol but why do we need to impose those complexities when we are able to simplify the things?

Yes, I understand it's not easy, but I do think it has its benefits. We reduce memory (so far a bigger problem in clients than CPU or network io), support large chunks, enable better compression within a chunk (can have multiple frame types), and streaming in both encoding and decoding (except for length computation, but that can be very cheap).

Nashatyrev commented 4 years ago

I think that is just a design problem of the implementation. It should not be "feeding", the decompression should ask for the data from the feeder. Otherwise you have to understand the contents (expected ssz byte length) on the producer side, instead of the consumer side. The "feeding" is prone to pre-fetching indeed if you have no length information, and just a bad idea, since reading bytes from a stream that is not supposed to return them is unexpected bad behavior. I would keep it simple and memory efficient: have the snappy library do the work, and ask for the bytes it needs, not the bytes you guess it could need.

Ok there was probably terms misunderstanding...

Let's see to the lighthouse code in more details to demonstrate my concerns:

Suppose we have two packets (as described above) where Snappy frame is fragmented

When the packet #1 arrives I suppose it is immediately passed to this method: fn decode(&mut self, src: &mut BytesMut) I'm not sure this code would process it correctly cause the Snappy frame is incomplete: reader.read_exact(&mut decoded_buffer)

I think it might be good idea to invite someone from Lighthouse team for comments.

Also, if we start encoding the full compressed length we can't do streaming-encoding anymore, as we have to compute it ahead of time. That would mean compressing everything first, and if an RPC chunk is very large that can be a problem.

That's why I would suggest to length-prefix every Snappy frame with the length of compressed frame. This should not affect Snappy streaming ability but should allow handling wire packets more explicitly

pawanjay176 commented 4 years ago

So the way it works in Lighthouse for your example is that the read_exact function will read exactly the number of bytes from the underlying snappy reader to fill the decoded_buffer with sszLength-1 bytes. If there aren't enough bytes like in your example, we return None which buffers the received bytes in expectation of more bytes to come.

After we receive packet #2, we now have enough bytes to fill the decoded_buffer, so read_exact returns the decompressed bytes which we ssz decode to return the decoded chunk. The second frame would be straightforward now since we have all the bytes needed to decode it.

Nashatyrev commented 4 years ago

Well I completed non-blocking implementation in Teku, and I can say that was somewhat challenging. I needed to carefully handle possible packets fragmentations and sticking through the whole application level protocol decoders pipeline. That was kind of building an application protocol right on top of IP protocol. Probably blocking streams and coroutines would make the life much easier but we are lacking this in Java world at the moment.

Just thought that protocol compression ideally should be somewhat transparent to the upper levels. I.e. why not compress the whole stream of RPC chunks, so the lower network lower level handler may decompress it without knowledge of higher level message structure.

Anyways everyone I believe has already implemented this with more or less pain and revising this protocol has no much sense now. But for future changes and future protocols I would suggest to consider this implementation side issue

protolambda commented 4 years ago

@Nashatyrev Sorry to hear it was a pain, that was not the intention. With the right abstractions to read/write to streams it was doable from a Golang perspective at least, and I tried to get feedback from Lighthouse, Nimbus and Prysm. Other clients seemed too busy with other things at the time. The change to support snappy was discussed and generally accepted on the network call and in PR #1606 (which was open for at least a month), and so it went through. If you like to suggest things for easier implementation in Java you're welcome to, but I think we'll keep this as-is for now, and then check future changes more closely against the Java way of doing things.