Provide first-class support for SSZ

ethereum / beacon-APIs

Collection of RESTful APIs provided by Ethereum Beacon nodes

https://ethereum.github.io/beacon-APIs/

Creative Commons Zero v1.0 Universal

334 stars 169 forks source link

Provide first-class support for SSZ #250

Open mcdee opened 1 year ago

mcdee commented 1 year ago

At current the beacon API enshrines JSON as the only first-class encoding for sending and receiving data. I propose that we provide SSZ as an alternative encoding for all endpoints.

What is SSZ?

Simple Serialize (SSZ) is an encoding used for transferring data across the Ethereum P2P network. It is also used in a number of places for data storage. I won't go in to all of its benefits here, but the ones that we care about most for the purposes of the API are:

significantly smaller footprint for the same data
much faster encoding and decoding, generally using references to the received data without transforms

How would this work?

Given that the beacon APIs are already present and used there needs to be a way for clients and servers to work together as SSZ support is rolled out over time. The proposal is:

existing beacon API servers must at a minimum support the Accept header with multiple entries and quality values (e.g. Accept: application/octet-stream;q=1.0,application/json;q=0.9). Note that this appears to be the case for current beacon API server implementations, in that they at least will currently return JSON data when presented with this header
servers send the Content-type header with all data responses
servers add support for the '415' error code when none of the options in the Accept header can be provided
servers add support for accepting all inputs and delivering all responses using SSZ

Regarding well-behaved clients:

when a client first connects to a beacon API server it queries a simple endpoint (e.g. /eth/v1/node/version) with the header Accept: application/octet-stream;q=1.0,application/json;q=0.9
- if the beacon API server supports SSZ it responds with an SSZ object and the header Content-type: application/octet-stream. Subsequent requests can be sent with Accept: application/octet-stream (or continue to use the multi-option Accept header if desired)
- if the beacon API server does not support SSZ it responds with a JSON object (it may or may not respond with a Content-type header, but it is not required to allow support of existing implementations). Subsequent requests can be sent with Accept: application/json (or continue to use the multi-option Accept header if desired)

It is possible for a client to only want a single content type by specifying Accept: application/octet-stream. A server that supports the '415' error code but does not support SSZ should return the 415 error. A server that does not support the '415' error code will return JSON (again, all current servers appear to do this) so the client should be able to handle this situation.

Note that the first three requirements for the servers are very low cost and provide a basis on which to build SSZ support in a clean fashion as far as clients are concerned. They are also all expected behavior for a well-configured REST API server. As such, it would be recommended that these items are implemented separately and ASAP before any work on supporting SSZ even starts.

arnetheduck commented 1 year ago

Before moving in this direction, I'd like to see https://github.com/ethereum/consensus-specs/pull/2983 adopted - the key point here is to maintain a 1:1 mapping between the basic SSZ types (ie no new JSON types for message/fork-specific aliases but rather a finite set of JSON core types), as they exist in simple_serialize.md alone - there's a discussion in that PR whether it''s meaningful to add a byte type, with the short of the story being that it's meaningful for some humans when they read JSON/specs but not for SSZ itself because it has the same encoding - it's defined as an alias right now where some consider aliases to be type-introducing and some don't.

The reason why the canonical 1:1 mapping is important is the implementation effort needed to deal with a jungle of REST-specific encodings and inventions which make the overhead of supporting both JSON and SSZ higher than it has to be - with a canonical mapping, it becomes a trivial matter because all things that can be expressed in one format can also, without additional specification, be supported in the other. Mostly, this means constraining JSON to the basic data types in simple_serialize, which shouldn't be a problem (protobuf for example defines such a mapping allowing 100% interop between protobuf and json - we should also).

mcdee commented 1 year ago

Please could we not pollute this issue with the on-going and contentious debate there? The adoption or not of that PR will make no difference to the changes put forward here, as this proposal makes no claim to define either JSON or SSZ encodings.

arnetheduck commented 1 year ago

not pollute this issue

This PR represents a significant implementation effort as well as testing surface area - it is highly relevant for its acceptance that we have a future-proof plan for supporting it over time without incurring a high burden on client (well, server) developers - JSON is not going to go away meaning that the advantage of SSZ is mainly to save a few cycles here and there (are there benchmarks?) on JSON vs binary - given how the execution API is still JSON-RPC and hasn't hindered adoption, the performance difference is not a major problem - not now, and not in the foreseeable future - it's a nice-to-have - and thus, it stands to reason that maintenance concerns are addressed first - the good news being that some progress has been made at devcon that has yet to make it to that PR.

mcdee commented 1 year ago

The current call to /eth/v1/beacon/states/head/validators returns over 200MB of data, and state is similar. As far as I am aware there is nothing even close to this that would be returned by the JSON-RPC spec, and certainly nothing that would be considered as sitting in the critical path for building a block.

...the advantage of SSZ is mainly to save a few cycles here and there (are there benchmarks?)

This has been known as an issue for a while, with piecemeal adoption of SSZ in some calls as an attempt to alleviate it, but here is an example of unmarshalling the current mainnet state from JSON (199,713,061 bytes) and SSZ (65,731,170 bytes):

BenchmarkJSON-8           2 7106709348 ns/op    644294488 B/op  7378019 allocs/op
BenchmarkSSZ-8           79  79120604 ns/op 82651195 B/op    989866 allocs/op

The SSZ unmarshalling takes 1% of the time, and around 13% of the memory.

Validators, state and blocks are large and getting larger. The additional time taken for validator clients to do their work increases the time taken for blocks to be proposed, which has a knock-on effect across the network. This is exacerbated with the additional time taken with MEV relays.

Could this be implemented on a per-call basis rather than across the entire set of responses? Yes it could, but the framework for doing so (bullets 1-3 on the server side) would be necessary to do this anyway. If it's a concern then we can state that SSZ may be available, and that clients should use Accept with multiple options and q values for all requests and checking of the Content-type return header for all responses to decode the response accordingly.

rolfyone commented 1 year ago

A lot of our response objects are basically from SSZ spec objects, and these I see being relatively simple with a caveat...

Are we talking about basically if SSZ is returned, that the data field encoded as SSZ is returned?

Certain API's have issues... I'm tempted to constrain scope of this to something like 'validator required' apis, so that it's more manageable...

Even more preferred would be specific use cases, rather than an open intent - eg. lets do xxx because it is 90% of the calls we make...

Also, depending on how this maps out, potentially we need to fix the blocks api which just returns the data...

A specific example of an endpoint I'm concerned of is the validators list which is

huge
not actually an SSZ object
seems to be a constant issue I may actually raise an issue if i haven't already to discuss the existence of that api further, rather than discuss here too much, but its a good example of extra overheads being needed to implement across the board.

elee1766 commented 1 year ago

my 2 cents:

i dont know any clients that read data structures directly from ssz bytes, so since the response payload will probably be decoded in whole, i don't really see the advantage of switching the format.

on the topics of validators endpoint:

it seems the validators list is about 35mb compressed, which honestly isnt' bad at all. i thought http and grpc both have transparent compression specifications. perhaps those could just be enabled for the case of remote nodes talking to eachother?
i am pretty certain that json blob can be parsed much much faster. I will do some experiments later. 7 seconds on a modern computer to parse 200mb of json seems wrong...

mcdee commented 1 year ago

Are we talking about basically if SSZ is returned, that the data field encoded as SSZ is returned?

Yes.

Even more preferred would be specific use cases, rather than an open intent

If the additional work to provide this for each additional API is linear or near-linear then fair enough, although it would mean that there would be no ability to have an SSZ-only client. But that's not a particular requirement right now.

If we were to narrow it down to "endpoints that are sent or return large amounts of data, or large enough amounts of data and are in the validating path" then that would reduce the scope significantly.

mcdee commented 1 year ago

[...] it seems the validators list is about 35mb compressed

None of the beacon API servers support compression today. But even if they did this would likely increase both memory and CPU to decompress the data.

arnetheduck commented 1 year ago

not actually an SSZ object

The aim of basing this proposal on the canonical JSON mapping is exactly to constrain all our usage of JSON to that which has an direct and unambiguous SSZ equivalent (and vice versa) - at that point, it's a trivial matter to support any API in both JSON and SSZ - both from a specification and from an implementation point of view.

The SSZ unmarshalling takes 1% of the time

This smells of an unusually bad JSON decoder or a bug in the benchmark - I'd expect something like 3-5x difference for a decent one (even 10x for a poor one .. but 100x?) - ditto memory usage - what is going on in there?

ajsutton commented 1 year ago

I'm not a real fan of supporting SSZ for everything. There are cases like returning blocks and states where it's useful to be able to have the response in SSZ format (eg for checkpoint sync or replaying blocks) but I struggle to see why SSZ would be useful for endpoints like /eth/v1/beacon/blocks/{block_id}/root for example.

Where we do return SSZ it often includes less data because it doesn't include the fields outside the data element which is fine for the cases where SSZ is useful but means it isn't a complete replacement for the API.

In terms of performance, I could see it potentially being meaningful for large structures like the state but that's already available as SSZ and isn't a performance critical API. Smaller pieces of data are unlikely to show any noticeable benefit of using JSON over SSZ (notably JSON is often used for price feeds in traditional finance which have far tighter performance requirements - there are binary standards for the real extreme end but you can push JSON performance an awfully long way).

when a client first connects to a beacon API server it queries a simple endpoint (e.g. /eth/v1/node/version) with the header Accept: application/octet-stream;q=1.0,application/json;q=0.9

I'm unclear why a client would do this explicit querying and it makes two invalid assumptions:

That if SSZ is supported for one api that it's supported for all APIs
That the SSZ support won't change over time - the server may be changed while the client is still running which changes the assumption or a load balancer may be spreading requests over multiple endpoints with different capabilities.

Clients should just send an Accept header that allows both SSZ and JSON with each request then deal with the response based on the Content-Type.

It is possible for a client to only want a single content type by specifying Accept: application/octet-stream. A server that supports the '415' error code but does not support SSZ should return the 415 error. A server that does not support the '415' error code will return JSON (again, all current servers appear to do this) so the client should be able to handle this situation.

Teku explicitly does NOT do this and will return the default representation (JSON) when no Accept matches. This is the most common behaviour for servers on the internet because returning 415 tends to cause all kinds of surprising compatibility issues with different tools setting Accept in quite unexpected ways. If the client doesn't want to accept application/json it can easily ignore the response based on the Content-Type header in the response.

For data size, it should be simple to add support for gzip encoding in Teku and in my experience it doesn't add any additional delay - there's a small CPU cost but it wouldn't be noticeable unless you're serving a huge number of concurrent requests (e.g you're running something bigger than Infura).

arnetheduck commented 1 year ago

but I struggle to see why SSZ would be useful

I think the strongest case for this is that one can create SSZ-only clients that do not have to implement JSON at all, decreasing their surface and simplifying their code - for non-human use cases, JSON is quite terrible and I can certainly see the allure of avoiding it.

compression

Compression is orthogonal I think - ie even SSZ compresses to about half (based on ERA file compression which compresses SSZ blocks and states using snappy).

That said, adding compression to JSON makes the CPU side of performance worse in the sense that it'll take longer to both decompress and parse, most likely - if someone is using a JSON decoder with as poor performance as is suggested in the given benchmark, they will likely not be made happier by having to decompress as well - ditto on the server side: producing compressed JSON obviously takes more juice than non-compressed - the question becomes whether this is offset by the need to stream less data or not, which probably is hard to answer generally (it'll depend on network speed and other factors).

I've thought about Content-Type: application/x-snappy-framed which would return compressed SSZ - this is trivial to add for all clients because they already have to support this format for libp2p requests (we already store blocks in compressed like this to avoid recompressing) - gzip is arguably more web-standard, but adds surface.

I agree that performance alone poorly motivates the significant surface that API providers have to implement .