Ethereum Proof-of-Stake Consensus Specifications

BeaconNode <--> ValidatorClient API - Protocol #1012

Closed. spble closed this issue 5 years ago.

spble commented 5 years ago

ETH2.0 Beacon Node & Validator Client Protocol Discussion

Further background, and the actual protocol, are described in issue #1011

It would be useful to choose a standard protocol for the BeaconNode and ValidatorClient API.

It was discussed during the Client Architecture session at the Sydney Implementers meeting that the main decision is between gRPC and JSON-RPC. This discussion was a follow-on from the Client Architecture Roundtable in Prague.

gRPC

Advantages

Disadvantages

JSON-RPC

Advantages

Disadvantages

In conclusion, most people had a preference for JSON-RPC, mainly due to its human readability and ease of implementation.

prestonvanloon commented 5 years ago

For Prysm, we will be continuing to use protocol buffers for our beacon chain and validator implementation. The discussion within the team is that the API enforcement from generated code, plus the performance gains, outweighs the marginal benefit of using curl or other pre-installed tools rather than tools built for the ecosystem.

Client interop may be achieved through a gRPC proxy gateway, but the bidirectional streaming would not work so we may not support JSON-RPC unless there is a very compelling reason to do so.
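The "API enforcement from generated code" point might be sketched like this (in Python rather than Prysm's Go, and with hypothetical message and method names standing in for generated protobuf stubs): an implementation that omits a declared method fails immediately at instantiation, rather than at call time.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

# Hypothetical message types standing in for generated protobuf classes.
@dataclass
class AttestationRequest:
    slot: int
    shard: int

@dataclass
class AttestationResponse:
    attestation_data: bytes

# Stands in for a code-generated service stub: every implementation must
# provide exactly this method, or instantiation fails immediately.
class ValidatorServiceBase(ABC):
    @abstractmethod
    def request_attestation(self, req: AttestationRequest) -> AttestationResponse:
        ...

class BeaconNodeService(ValidatorServiceBase):
    def request_attestation(self, req: AttestationRequest) -> AttestationResponse:
        # Stub implementation; a real node would build attestation data here.
        return AttestationResponse(attestation_data=f"{req.slot}:{req.shard}".encode())
```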

spble commented 5 years ago

Thanks for the input @prestonvanloon - I definitely see the performance improvements with using gRPC, but I imagine the interface between the BeaconNode and ValidatorClient will be a fairly low-bandwidth interface. As such, doing a call in 10ms instead of 100ms would not bring any substantial benefit in my opinion.

We have also implemented protocol buffers in Lighthouse, but we are considering refactoring this if most other clients are in favour of JSON-RPC. Interoperability is our most compelling reason for this refactor.

While using curl is helpful, I think the human-readable and widely understood nature of the protocol is the biggest benefit. JSON-RPC is very widely used and understood by web developers, whereas gRPC is generally a lot more niche.

spble commented 5 years ago

Also, a quick google around reveals: https://github.com/plutov/benchmark-grpc-protobuf-vs-http-json Turns out that speeds and resource usage are fairly comparable... within one order of magnitude anyway.

prestonvanloon commented 5 years ago

@spble Interesting link!

They are almost the same; HTTP+JSON is a bit faster and has fewer allocs/op.

This is quite surprising actually. 😄

Going forward, we would still advocate for protobuf usage even if solely for its generative schema approach. If the general consensus is to support only JSON-RPC, then we would likely provide some wrapper or use jsonpb while we continue to have success with protobuf. We're even using protobuf in the browser with a TypeScript application! And with tools like prototool, we maintain productivity for the rare need for ad-hoc queries.

In short, we support interop even if we are the minority.

pipermerriam commented 5 years ago

Lacking a compelling reason for gRPC's performance gains, which based on the comments in this thread don't seem to be present, JSON-RPC is my preference.

Potentially compelling reason for JSON-RPC: It is already well supported across the existing web3 tooling which makes integration with existing web3 client libraries much simpler.

karalabe commented 5 years ago

Hey all,

Just wanted to do a small braindump. Full disclosure, I'm not familiar with the ETH 2.0 spec at all, nor with the communication requirements between beacon chain nodes and validators. That said, I can speak from ETH 1.0 experience + general API experience.

Generally, the dumber and more boring a protocol is, the simpler it is to interface. At the end of the day, the goal of Ethereum is to bring developers together, so we should always prefer simplicity over other advantages.

There have been two proposals made here: gRPC and JSON-RPC. I honestly don't see any advantage in gRPC if we're building an open infrastructure. "Nobody" will want to (or be able to) roll their own gRPC implementation, so you are immediately limited in what you can build on top of Ethereum, purely because you can't talk to it. This alone should be enough to rule out gRPC (this is why you don't see protobuf, Cap'n Proto and others on public APIs). These frameworks are very nice for internal calls in proprietary systems, but not in public APIs where you want to maximize interoperability.

That said, JSON-RPC is also a horrible choice. It's better than gRPC in that you can at least interface with it easily, but the issue is that it is a stateful protocol, which makes it a non-composable protocol. Ethereum 1.0 made the huge mistake of permitting RPC calls that span requests (e.g. req1: create a filter; req2: query logs until block N; req3: query until block M, etc.). This is a huge problem in internet infrastructure, as it completely breaks horizontal scaling. All the requests must go to the same backend, because they are stateful. The backend cannot be restarted, cannot be scaled, cannot be load balanced, cannot be updated, etc. JSON-RPC works OK for communicating 1:1 with a dedicated node, but you cannot build public infrastructure out of it.
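The statefulness problem can be illustrated with a small sketch (Python, with simplified stand-ins for the Eth 1.0 filter calls rather than the real API): a stateful filter keeps the cursor on the server, so every follow-up must reach the same instance, while a stateless query carries the cursor in the request and can be served by any replica.

```python
# Stateful style (Eth 1.0 filters): the server remembers each filter's
# cursor, so all follow-up calls must hit this exact backend instance.
class FilterBackend:
    def __init__(self):
        self.filters = {}
        self.next_id = 0

    def new_filter(self) -> int:
        fid, self.next_id = self.next_id, self.next_id + 1
        self.filters[fid] = 0          # last block this filter has seen
        return fid

    def get_changes(self, fid: int, head: int) -> range:
        start = self.filters[fid]      # fails on any other replica
        self.filters[fid] = head
        return range(start, head)

# Stateless style: the client carries the cursor, so any replica can answer
# and the backend can be restarted, scaled, or load balanced freely.
def get_logs(start: int, head: int) -> range:
    return range(start, head)
```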

My proposal is to seriously consider RESTful HTTP for all public APIs that Ethereum nodes need to serve. If you are unfamiliar with it, REST is simply a "schema" that defines how you should query data ("GET /path/to/resource/"), how you should upload data ("POST /path/to/resource") and how different errors should be conveyed ("404 not found"). It is a tiny specialization of the HTTP protocol, but the enormous power is that:

You see, RESTful HTTP APIs are the building blocks of the modern internet. Everything on the internet is built to produce and consume it. If we go down the JSON RPC path, we remain yet another niche. Sure, some will support it, but the big guys will always be deterred. If we embrace proper architectures, Ethereum will be trivial to integrate into existing systems, giving it a huge boost in developer appeal.
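As a rough illustration of the REST "schema" described above, here is a minimal dispatcher sketch in Python; the paths and resources are hypothetical, not drawn from any actual beacon API.

```python
from http import HTTPStatus

# Toy in-memory resource store keyed by path; paths are hypothetical.
resources = {"/beacon/head": {"slot": 4096}}

def handle(method: str, path: str, body=None):
    """Dispatch a request the way a RESTful server would:
    GET queries a resource, POST uploads one, errors use status codes."""
    if method == "GET":
        if path in resources:
            return HTTPStatus.OK, resources[path]
        return HTTPStatus.NOT_FOUND, None        # "404 not found"
    if method == "POST":
        resources[path] = body                   # upload/replace the resource
        return HTTPStatus.CREATED, body
    return HTTPStatus.METHOD_NOT_ALLOWED, None
```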

My 2c.

karalabe commented 5 years ago

Oh, just as a memo, the fact that the default reply format is JSON is just a detail. Since the reply is just an HTTP response, you are free to send JSON or any other format. Way back, XML was also popular (e.g. "GET /path/to/res.json" vs. "GET /path/to/res.xml"), but there's nothing stopping us from also supporting a binary format (e.g. "GET /path/to/res.rlp" or "GET /path/to/res.ssz"). REST still works, it doesn't care, HTTP doesn't care, nothing cares. But we can immediately have both performance and simplicity: validators would use a binary format, and a web interface would use a JSON format.
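A toy sketch of that content negotiation (Python; the `.ssz` branch here is a placeholder byte packing, not real SSZ encoding):

```python
import json

def render(resource: dict, path: str) -> bytes:
    """Serve one logical resource in the format implied by the path suffix.
    The '.ssz' branch is a placeholder binary encoding, not real SSZ."""
    if path.endswith(".json"):
        return json.dumps(resource, sort_keys=True).encode()
    if path.endswith(".ssz"):
        # Placeholder: pack the slot as 8 little-endian bytes.
        return resource["slot"].to_bytes(8, "little")
    raise ValueError("unsupported representation: " + path)

block = {"slot": 4096}
json_body = render(block, "/beacon/head.json")   # human-readable form
ssz_body = render(block, "/beacon/head.ssz")     # compact binary form
```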

karalabe commented 5 years ago

Btw, I'd gladly help spec out a REST version if you have any pointers to the requirements. I'm aware there might be limitations that make REST unsuitable, but I'd rather redesign around the limitations than go with an unpopular protocol.

pipermerriam commented 5 years ago

@karalabe I'm not sure I follow the argument for REST. I acknowledge and recognize the problems with the stateful ETH1.x APIs and am fully onboard with avoiding those mistakes in Eth2.0 APIs but I fail to see how REST solves that.

Note that I'm not arguing against REST, just trying to understand.

I do agree that REST is more expressive than JSON-RPC and that we could benefit from that. I will say that JSON-RPC's simplicity has been nice, exposing the API over a unix socket and bypassing the need for an HTTP server's complexity.

ligi commented 5 years ago

I think the most compelling argument for gRPC/protobuf is that it leads to a well-defined API - currently with json-rpc this is a mess. As far as I can see, repeating this mess could be prevented by using gRPC/protobuf. So I would lean in this direction. I'm also having trouble understanding @karalabe's argument against it:

I honestly don't see any advantage in gRPC if we're building an open infrastructure. "Nobody" will want to (or be able to) roll their own gRPC implementation, so you are immediately limited by what you can implement on top of Ethereum purely, because you can't talk to it.

why will nobody be able to roll their own gRPC implementation?

karalabe commented 5 years ago

REST mostly allows Ethereum to be a component in a modern web stack. For example, I can't run my own "Infura", because it's a PITA to interpret, load balance, and cache all those requests. It takes a team just to maintain an Ethereum gateway. But if the API was simple REST, anyone could compete with Infura. You could have Cloudflare compete with them. You could launch N k8s instances and have k8s auto load balance. The advantage is that you can combine your node with existing infrastructure in a natural and native way, without relying on custom bridges (e.g. How do I write a firewall to block personal_xyz JSON RPC calls, I dunno? How do I write a firewall to block /api/personal/xyz, well, that's easy, any web server/router/proxy can do it, or authenticate it, or throttle it).
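The firewall example can be made concrete: path-based filtering needs only the HTTP request line, while method-based filtering must parse every JSON-RPC body. A hedged sketch (the `personal_` prefix and the paths are illustrative, echoing the examples above):

```python
import json

def allow_rest(request_line: str) -> bool:
    """Path-based filtering: a proxy only needs the HTTP request line,
    so any off-the-shelf web server/router can do this."""
    path = request_line.split()[1]
    return not path.startswith("/api/personal/")

def allow_jsonrpc(body: bytes) -> bool:
    """Method-based filtering: every request body must be parsed and
    inspected, which generic HTTP infrastructure cannot do for you."""
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return not str(payload.get("method", "")).startswith("personal_")
```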

I do agree that REST is more expressive than JSON-RPC and that we could benefit from that.

I'd actually say REST is less expressive, hence why it's more composable.

exposing the API over a unix socket and bypass the need for an HTTP server's complexity.

We can still expose REST through a unix socket. The socket is just a transport, TCP vs. IPC. Whether that transport speaks REST or JSON-RPC is not relevant from the transport's perspective.

karalabe commented 5 years ago

@ligi gRPC is a framework. You need libraries to speak it. E.g. there's no Erlang lib. You immediately shut people like Blockscout, who develop in Elixir, out of the network.

ligi commented 5 years ago

hm: https://github.com/elixir-grpc/grpc

karalabe commented 5 years ago

0.4-alpha, build failed on CI :) Yes, you can hack it, but that doesn't mean the code is reliable.

ligi commented 5 years ago

OK - good point ;-) I'm still really compelled by the advantage of having a strong protocol spec though - but you are right - it comes with some collateral damage...

karalabe commented 5 years ago

Completely agree :) https://swagger.io/specification/

pipermerriam commented 5 years ago

@karalabe :+1: makes sense now. I would be fine with REST or JSON-RPC.

Restating my :-1: on gRPC due to it having real tooling downsides and all of its stated upsides being things we can address with tools like Swagger for well-defined REST specifications, or just good due diligence if it's JSON-RPC.

My comment about expressiveness was intended towards the expressiveness of HTTP method semantics in REST (GET/POST/PUT/DELETE) and response status codes.

holiman commented 5 years ago

My two cents (which mainly is the same as @karalabe brought up).

Cent one

That may be somewhat generalizing, but I think it's a fairly accurate description. So, also without having deep insight into 2.0, I think you should consider whether what we're building up to is going to be a dialogue or a client/server scenario.

Cent two

FrankSzendzielarz commented 5 years ago

My Various Cents

FrankSzendzielarz commented 5 years ago

Here is a rough, part-implemented (missing other objects deeper in the object graph under BeaconBlock) example of a Beacon node HTTP REST-like architecture and API

https://beaconapi20190506111547.azurewebsites.net/

Because the Swagger metadata at the URL is downloadable, this could help serve as a spec.

I can keep extending and modifying this so that it actually does validation etc., if people want. Maybe it could evolve into a test harness or an implementation. Let me know please.

You can add protobuf media formatters and RLP formatters as well as the default JSON and XML ones you see there. You can also try to auto-generate clients in the language of your choice with the gen/clients POST method. E.g. this was auto-generated for golang; note the docs folder: go-client-generated.zip. Rust, just to be fair: rust-client-generated.zip

spble commented 5 years ago

Thanks very much for the input @karalabe - I definitely agree with your points regarding REST. I think HTTP-REST is what I had in my mind, I was just following Eth1.0 with JSON-RPC.

My vote is definitely for an HTTP REST interface, which returns JSON by default.

Thank you for the part implementation @FrankSzendzielarz - I will integrate your suggestions into my next API proposal and post it on #1011

paulhauner commented 5 years ago

My proposal is to seriously consider RESTful HTTP for all public APIs that Ethereum nodes need to serve.

I support this.

arnetheduck commented 5 years ago

My proposal is to seriously consider RESTful HTTP for all public APIs that Ethereum nodes need to serve

likewise, support this, for the advantage of working better with "standard" infrastructure. also good to work on specifying it unambiguously - the current status quo is indeed a bit of a mess to figure out, and swagger seems as good as any.

nothing prevents clients from using another, more performant or specialized protocol in their internal communication (for example when a beacon node and validator from the same client suite talks to each other), when the goal is not interop.

gcolvin commented 5 years ago

@karalabe has been right about this for going on two decades now. See Roy Thomas Fielding, Architectural Styles and the Design of Network-based Software Architectures, Chapter 5: Representational State Transfer (REST).

spble commented 5 years ago

So a REST API seems to be the consensus.

I have proposed an OpenAPI spec in PR #1069, which can also be viewed on SwaggerHub

Closing this issue in favour of the PR.

BelfordZ commented 5 years ago

I propose we use OpenRPC + JSON-RPC

zcstarr commented 5 years ago

I don't understand; the arguments made above seem counterproductive. It seems that this change would lock you into a transport that has high levels of inefficiency. With JSON-RPC you have a choice about how the data gets there. If this is meant to be low-level infrastructure, being as transport-agnostic as possible seems the most beneficial.

When it comes to tooling, JSON-Schema specifications are your friend; they always have been. Additionally, has anyone used Swagger and Swagger tooling in production? Just because Swagger exists doesn't actually solve your problems of testability and documentation discoverability. OpenAPI may support code generation, but the same could be done for JSON-RPC.

The issues are manifold.

  1. A switch like this breaks all the tooling people have made around JSON-RPC
  2. With REST there's no ability to batch requests; there are of course workarounds, but they're not particularly useful
  3. You've just locked your infrastructure into HTTP, with zero support for potentially faster transports
  4. Using Swagger tooling isn't great; the ecosystem for Swagger isn't as well maintained as you'd think
  5. Why is the juice worth the squeeze? Weaker performance, transport lock-in, and breaking any ecosystem/tooling that expects to communicate over JSON-RPC

If someone could lay out the benefits beyond Swagger docs, that would be amazing. Generative resources are a good thing, but there's other tooling to generate JSON-RPC clients/servers as well.
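For reference on the batching point, the JSON-RPC 2.0 spec lets a client submit an array of calls in a single round trip. A minimal dispatcher sketch (Python; the method names and return values are hypothetical, not any real client's API):

```python
import json

# Hypothetical method table standing in for a real RPC surface.
METHODS = {
    "beacon_head_slot": lambda params: 4096,
    "beacon_genesis_time": lambda params: 1578009600,
}

def dispatch(raw: str) -> str:
    """Handle a single JSON-RPC 2.0 call, or a batch (an array of calls)
    submitted in one round trip, per the JSON-RPC 2.0 batch rules."""
    payload = json.loads(raw)
    calls = payload if isinstance(payload, list) else [payload]
    replies = [
        {"jsonrpc": "2.0", "id": c["id"],
         "result": METHODS[c["method"]](c.get("params", []))}
        for c in calls
    ]
    # A batch request gets an array back; a single call gets one object.
    return json.dumps(replies if isinstance(payload, list) else replies[0])
```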

This is a really important issue, can't believe I missed this coming down the pipe.

holiman commented 5 years ago

@zcstarr I can only reiterate, really. json-rpc is awesome for two peers having a dialogue. In the recent development of clef, the externalized signer from geth, we use bi-directional json-rpc between clef (daemon) and the ui. Both parties can send messages to the other and get responses - have asynchronous dialogues.

However, REST assumes that the communication is a client requesting resources from a server -- they are not peers.

The two models are inherently different.

I do believe that swagger is more mature than json-schemas, but regardless, I don't see that being the primary driver personally. Fwiw, there exists no json-schema for the eth 1.0 json-rpc, despite historical attempts to address this. It's been a source of bugs over many years.

As for locking into a transport, that's partially true. However, it also solves many other problems:

Regarding the points raised

  1. Breaks tooling
    • There's quite a lot of tooling around HTTP already - lots of tools that we don't even have to build, because they already exist across a variety of platforms
  2. No ability to batch requests
    • HTTP does have batching, in the form of HTTP pipelining. More parallelism is coming with HTTP/3 (see next point)
  3. Not true, HTTP/3 is in the works, based on QUIC.
  4. Maybe, can't say
  5. See reasons above