apache / accumulo

Apache Accumulo
https://accumulo.apache.org
Apache License 2.0

Consider switching to gRPC #5000

Open cshannon opened 2 weeks ago

cshannon commented 2 weeks ago

Background and Motivation

The Accumulo RPC layer uses Apache Thrift as both the transport and the protocol, but I think it would be worthwhile to consider switching to gRPC in the future. I have done a lot of investigation and prototyping of alternatives, and the findings are summarized below.

There are a few reasons to switch (which are highlighted below in the gRPC advantages section) but the primary motivation that started the investigation into gRPC is to be able to support async RPC calls on the server side. Async RPC calls would enable the server to handle many more connections and requests at a time and give us the ability to do things like long polling without blocking the IO threads. The initial use case that started this is described in #4664, but there are other use cases as well.
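
To make the long-polling idea concrete, here is a minimal sketch that uses only the JDK's `CompletableFuture`, not gRPC or Thrift. The class and method names (`LongPollSketch`, `getJob`, `offerJob`) are hypothetical and are not taken from either prototype. It assumes the RPC framework can send the response whenever the returned future completes, which is what an async server API makes possible.

```java
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical long-poll registry; an illustration only, not Accumulo code. */
class LongPollSketch {
  // Requests parked until a job arrives. No thread is blocked while they wait.
  private final Queue<CompletableFuture<String>> waiters = new ConcurrentLinkedQueue<>();

  /** Request side: returns immediately with a future for the eventual response. */
  CompletableFuture<String> getJob(long timeoutMillis) {
    CompletableFuture<String> pending = new CompletableFuture<>();
    waiters.add(pending);
    // If no job shows up in time, answer with an empty result instead of
    // holding the request open forever (completeOnTimeout needs Java 9+).
    return pending.completeOnTimeout("", timeoutMillis, TimeUnit.MILLISECONDS);
  }

  /** Producer side: hands the job to the oldest request that is still waiting. */
  boolean offerJob(String job) {
    CompletableFuture<String> waiter;
    while ((waiter = waiters.poll()) != null) {
      // complete() returns false if this waiter already timed out; try the next one.
      if (waiter.complete(job)) {
        return true;
      }
    }
    return false; // nobody is waiting; a real server would queue the job instead
  }

  public static void main(String[] args) {
    LongPollSketch coordinator = new LongPollSketch();
    CompletableFuture<String> response = coordinator.getJob(5_000);
    response.thenAccept(job -> System.out.println("got: " + job));
    coordinator.offerJob("job-1");
  }
}
```

With a synchronous server, each parked request in this pattern would instead tie up a handler thread for the full wait, which is the scaling problem described above.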

Thrift is synchronous and its Async API, which in theory could be used to accomplish this, is unfortunately quite limited (as described below). This makes it difficult to handle a lot of connections concurrently with Thrift, especially if those requests are long lived.

Note: There is still one more Jetty prototype to work on and I will update the issue with those results when done.

Prototypes

I have investigated and prototyped both gRPC and Thrift with async processors, to compare each against the current sync Thrift API as well as against each other. Below are the advantages and disadvantages I have found for both. I am also planning to test a third alternative, an async REST API with Jetty, because Jetty should support the authentication mechanisms that gRPC and async Thrift do not. I will report back and link that prototype and its findings when that is done as well.

gRPC Results

PR: https://github.com/apache/accumulo/pull/4715

Advantages:

  1. gRPC is built on Netty and HTTP/2, so it offers high performance and a non-blocking architecture. Netty is well tested and the most well-known NIO framework.
  2. gRPC is async by default out of the box, but it is still easy to write RPC services that are sync if desired.
  3. gRPC supports streaming, which would be quite useful for the client for scans.
  4. The gRPC serialization format is flexible. By default it supports protobuf, but the format is pluggable, so we could potentially use Thrift as the binary format and keep the existing Thrift objects.
  5. There is good SSL support out of the box.
  6. gRPC supports OAuth2 and has an API for plugging in authentication.
  7. gRPC supports async on the client side as well, if a client wants to make a non-blocking RPC call.
  8. The documentation is pretty good and gRPC has wide adoption.
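
As a rough illustration of the pluggable serialization in point 4: gRPC's Java API takes a marshaller per message type, with one method that turns a value into an `InputStream` and one that parses it back. The sketch below uses a local stand-in interface with the same two-method shape as `io.grpc.MethodDescriptor.Marshaller` so that it compiles without the gRPC dependency. `ScanRequest` and its fields are hypothetical placeholders for an existing Thrift object, and the `DataOutputStream` encoding is only a stand-in for where Thrift's own serializer would be called.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

class MarshallerSketch {
  /** Local stand-in mirroring the shape of io.grpc.MethodDescriptor.Marshaller. */
  interface Marshaller<T> {
    InputStream stream(T value);

    T parse(InputStream stream);
  }

  /** Hypothetical placeholder for an existing Thrift-generated object. */
  static final class ScanRequest {
    final String table;
    final int batchSize;

    ScanRequest(String table, int batchSize) {
      this.table = table;
      this.batchSize = batchSize;
    }
  }

  /** Assumed codec: real code would delegate to the Thrift serializer here. */
  static final class ScanRequestMarshaller implements Marshaller<ScanRequest> {
    @Override
    public InputStream stream(ScanRequest value) {
      ByteArrayOutputStream buf = new ByteArrayOutputStream();
      try (DataOutputStream out = new DataOutputStream(buf)) {
        out.writeUTF(value.table);
        out.writeInt(value.batchSize);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
      return new ByteArrayInputStream(buf.toByteArray());
    }

    @Override
    public ScanRequest parse(InputStream stream) {
      try (DataInputStream in = new DataInputStream(stream)) {
        // Fields are read back in the same order they were written.
        String table = in.readUTF();
        int batchSize = in.readInt();
        return new ScanRequest(table, batchSize);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }
}
```

The point of the design is that the wire format lives entirely behind those two methods, so the service definitions would not need to know whether protobuf or Thrift produced the bytes.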

Disadvantages:

  1. The code changes are larger to replace an entire framework. However, we'd likely just start with one service at a time and we do not have to switch everything all at once.
  2. SASL is not supported (potential blocker)

Async Thrift

PR: https://github.com/apache/accumulo/pull/4931

Note:

Advantages:

  1. We already use Thrift so the changes are much smaller than switching out an entire RPC layer.
  2. We can continue to use all the existing RPC services and objects and only implement Async APIs for the services we want.

Disadvantages:

  1. Thrift does not support multiplexing async processors which means we'd have to add support or end up opening up a different server for every service. There is an open issue for this and an old PR that was closed that could be reopened to support this.
  2. SASL is not supported. While Thrift did add a non-blocking service implementation for SASL, it only supports the sync API and not the Async API, so we would have to implement this ourselves and contribute it back. (potential blocker)
  3. SSL is NOT supported at all for non-blocking servers. The Async API requires a non-blocking server implementation, and none of them support SSL. Non-blocking SSL in Java requires using the SSLEngine, which is extremely difficult to implement correctly. This is best left to other frameworks like Netty, and it seems unlikely Thrift will add support for it. This is a big blocker.
  4. Thrift does support using sync and async at the same time (on different ports), so we could potentially get around some of the issues above by only supporting async in some modes (not SSL, for example). However, this has major disadvantages, including having to maintain two implementations and test everything twice. The performance impact could also be large in the modes that require sync, so it doesn't make sense to do that.
  5. The documentation for async Thrift is basically nonexistent. There is almost no information, and it doesn't seem to be well supported or tested. I had to essentially reverse engineer the source code and trace things with breakpoints to figure out how the async processors work.

Async Jetty (wip prototype)

PR: Todo

This section will be updated after the prototype is done. There's some initial information on this approach in this comment.

Potential advantages:

  1. Jetty should support all the different authentication mechanisms including SSL and SASL.
  2. Jetty is already used in the monitor and REST is a well known/established pattern.
  3. Jetty supports the async Servlet API, so we can accomplish long polling.
  4. REST APIs usually support Json, and Thrift already has support for Json serialization, but we could explore just using binary as well.

Potential disadvantages:

  1. Using REST and Json is probably not ideal for all RPC calls.
  2. This would be a one-off server that would only be used for the CompactionCoordinator service.

dlmarion commented 1 week ago

  1. Using REST and Json is probably not ideal for all RPC calls.

Have you looked at CBOR?
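
For context, CBOR (RFC 8949) is a binary encoding of a JSON-like data model. The sketch below hand-encodes a small map to show what that looks like on the wire. It is an illustration only: the field names are hypothetical, the encoder handles only text keys and non-negative integer values, and real code would use an existing CBOR library rather than this.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

class CborSketch {
  /** Writes a CBOR head: major type in the top 3 bits, then the argument n (n >= 0). */
  static void writeHead(ByteArrayOutputStream out, int majorType, long n) {
    int mt = majorType << 5;
    if (n < 24) {
      out.write(mt | (int) n); // small values fit in the initial byte
    } else if (n <= 0xFFL) {
      out.write(mt | 24);
      out.write((int) n);
    } else if (n <= 0xFFFFL) {
      out.write(mt | 25);
      for (int shift = 8; shift >= 0; shift -= 8) out.write((int) (n >> shift));
    } else if (n <= 0xFFFFFFFFL) {
      out.write(mt | 26);
      for (int shift = 24; shift >= 0; shift -= 8) out.write((int) (n >> shift));
    } else {
      out.write(mt | 27);
      for (int shift = 56; shift >= 0; shift -= 8) out.write((int) (n >> shift));
    }
  }

  /** Encodes a map of text keys to non-negative integers as a definite-length CBOR map. */
  static byte[] encode(Map<String, Long> fields) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    writeHead(out, 5, fields.size()); // major type 5: map
    for (Map.Entry<String, Long> e : fields.entrySet()) {
      byte[] key = e.getKey().getBytes(StandardCharsets.UTF_8);
      writeHead(out, 3, key.length); // major type 3: text string
      out.write(key, 0, key.length);
      writeHead(out, 0, e.getValue()); // major type 0: unsigned integer
    }
    return out.toByteArray();
  }

  public static void main(String[] args) {
    Map<String, Long> job = new LinkedHashMap<>();
    job.put("priority", 5L); // hypothetical field names
    job.put("files", 1000L);
    String json = "{\"priority\":5,\"files\":1000}";
    System.out.println("CBOR bytes: " + encode(job).length + ", JSON bytes: " + json.length());
  }
}
```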