apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

Consider using gRPC as an externally exposed API #271

Closed zaa closed 1 year ago

zaa commented 7 years ago

gRPC (http://grpc.io) has ready-made clients for Java, C++, Go, Python, etc., so Yahoo Pulsar would not need to reimplement efficient clients in every language. The currently exposed WebSocket interface does not support all the methods provided by the protobuf-based protocol, has lower performance, and requires a separate WebSocket connection per topic publisher/consumer.

agarman commented 7 years ago

Are you suggesting a gRPC service that uses the C++ lib as an alternative to the WebSocket service? Or are you suggesting a rewrite of the Java & C++ libs to use gRPC instead of the current protobuf-based protocol?

merlimat commented 7 years ago

When we started the project, gRPC was not available yet, so we went with a custom protocol.

I think it would be great to offer a gRPC based interface for better integration, though I would see that as an additional layer, such as the WebSocket proxy (which can run embedded in the broker or as a separate component).

One of the primary goals of the custom binary protocol we came up with was to have the client establish a "session" (either producer or consumer) attached to a topic, perform authentication once, and then publish/receive messages as fast as possible, with flow control as a guard rail.

E.g., we don't want to perform auth on every message, or to specify the topic name each time (which in many cases can be as long as the data itself).

So, mixing the "session" with RPC seems a bit complicated. Also, guaranteeing ordering would be challenging, as there would be no relation between different publish requests on the same topic.
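To put rough numbers on the per-message overhead described above, here is a minimal Python sketch. The topic name, token length, and payload are made-up values for illustration, and the arithmetic ignores framing and protobuf encoding entirely; it only compares "repeat topic and credentials on every request" with "send them once per session".

```python
# All sizes below are hypothetical and chosen only for illustration.
TOPIC = "persistent://my-tenant/my-namespace/sensor-readings"
AUTH_TOKEN = "x" * 200   # stand-in for an auth token
PAYLOAD = b"temp=21.5"   # a small message body


def bytes_per_request_style(n_messages: int) -> int:
    """Every request repeats the topic name and the credentials."""
    per_message = len(TOPIC) + len(AUTH_TOKEN) + len(PAYLOAD)
    return n_messages * per_message


def bytes_session_style(n_messages: int) -> int:
    """Topic and credentials are sent once; later frames carry only the payload."""
    handshake = len(TOPIC) + len(AUTH_TOKEN)
    return handshake + n_messages * len(PAYLOAD)
```

With small payloads, the session style wins as soon as more than one message is sent, which is the case the custom protocol was designed for.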

Having said that, I'd be really happy to have a gRPC based proxy service. Contributions welcome! 😄

It may also be interesting to offer the same interface (or at least a significant portion of it) as GCP pub-sub: https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1

lzaugg commented 6 years ago

Having gRPC as an alternative to the Web Socket service would be awesome. gRPC has built-in support for bidirectional streams - so basically it could be seen as a session (it's using HTTP/2 streams), no? Guaranteeing the message order shouldn't be a problem then. I think it's also important to know if pulsar is going to loosen the message ordering constraints (at least for such an "additional" layer) because it would make things easier (for use cases where ordering is not important - in the same way as GCP pub/sub doesn't guarantee any order).

sijie commented 6 years ago

> Having gRPC as an alternative to the Web Socket service would be awesome. gRPC has built-in support for bidirectional streams - so basically it could be seen as a session (it's using HTTP/2 streams), no?

Agreed. I believe ordering is not a problem with gRPC bidirectional streaming. It is actually very easy and super fun to use.

I think the most interesting piece here is to add a GCP pub/sub proxy with gRPC protocol.

> I think it's also important to know if pulsar is going to loosen the message ordering constraints (at least for such an "additional" layer) because it would make things easier

In the context of a "shared" subscription, the message ordering constraints are already relaxed. In other words, you can use an exclusive/failover subscription for ordered message consumption and a shared subscription for non-ordered message consumption.
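The difference between the subscription types can be illustrated with a small Python simulation. This is only a sketch: the function names are invented, and plain round-robin is a simplification of what a broker does for a shared subscription (real dispatch also depends on consumer availability and flow control).

```python
from collections import defaultdict
from itertools import cycle


def dispatch_exclusive(messages, consumers):
    """All messages go to one active consumer, so it sees them in publish order."""
    active = consumers[0]
    return {active: list(messages)}


def dispatch_shared(messages, consumers):
    """Messages are spread round-robin, so no consumer sees the full sequence."""
    received = defaultdict(list)
    for msg, consumer in zip(messages, cycle(consumers)):
        received[consumer].append(msg)
    return dict(received)
```

For messages 1 to 4 and consumers "a" and "b", the exclusive case hands everything to "a", while the shared case splits the stream so neither consumer can reconstruct the global order on its own.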

RobIsHere commented 6 years ago

Whole clusters run on gRPC, like k8s + Istio, and it's well supported by proxies like Envoy.

You could think about clients talking to their user's topics directly, authenticated by ingress - e.g. envoy filters (see https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/grpc_web_filter for ideas).

IMHO, I'm not convinced about GCP pub-sub. Making one thing like the other is almost always a large effort and a bad fit once you look into the details. When Google changes the API, do you follow? Wrapping your already proven, tested and implemented custom protocol in gRPC is probably a huge time saver. And your client APIs are well-designed as they are. Better early than Google-like ;)

snoodleboot commented 6 years ago

I would love to see this! I can understand about pub/sub. It really doesn't serve the same purpose as Pulsar or any distributed log. Different use cases.

cbornet commented 5 years ago

+1. Supporting gRPC would give access to a lot more clients, integration with reactive frameworks (like RxJava or Reactor), application-level flow control, etc. Another stream protocol to watch IMO is RSocket.io, which has built-in integration of the reactive-streams spec (non-blocking streams with back-pressure). Since it was just released, it lacks client SDKs, but that should evolve over time.

cbornet commented 5 years ago

> One of the primary goals of the custom binary protocol we came up with, was to have the client establish a "session" (either producer/consumer), attached to a topic, perform authentication and then let it publish/receive messages as fast as possible, with flow control to guard the rail. Eg: we don't want to perform auth at every message, or to specify the topic name each time (which in many cases can be as long as the data itself). So, mixing the "session" with RPC seems a bit complicated. Also guaranteeing ordering would be challenging as there would be no relation for different publish requests on the same topic.

gRPC can establish a session for bidirectional streaming! So IMO it could totally be used as the base protocol for Pulsar. The definition would look something like:

service Pulsar {
    rpc exchange(stream BaseCommand) returns (stream BaseCommand);
}

Instead of passing auth info via CommandConnect, you would pass it as gRPC Metadata fields (similar to HTTP headers). Note that this reuses PulsarApi.proto, so a lot of code would be unchanged, I think. You would probably still need drivers, because some functionality requires cooperation between the client and the server, but these drivers would be easier to write. gRPC also has built-in auth mechanisms that could maybe be reused, and it has built-in flow control.
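As a rough illustration of how one bidirectional stream can act as a session, here is a self-contained Python sketch. It does not use the gRPC library: `Command` is a hypothetical stand-in for `BaseCommand`, the command kinds are only loosely modeled on the producer/send commands of the binary protocol, and the token check is a placeholder. The handler has the same general shape as a stream-stream handler (an iterator of requests in, a generator of responses out).

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Command:
    """Hypothetical stand-in for a BaseCommand message."""
    kind: str                    # e.g. "producer", "send"
    topic: Optional[str] = None
    payload: bytes = b""
    sequence_id: int = 0


def exchange(requests: Iterable[Command], metadata: dict) -> Iterator[Command]:
    """Handle one stream: authenticate once, bind a topic once, then ack sends in order."""
    # Auth comes from stream metadata, checked a single time (placeholder check).
    if metadata.get("authorization") != "Bearer valid-token":
        yield Command(kind="error")
        return
    topic = None  # session state that lives as long as the stream
    for cmd in requests:
        if cmd.kind == "producer":
            topic = cmd.topic
            yield Command(kind="producer_success", topic=topic)
        elif cmd.kind == "send" and topic is not None:
            # Sends need no topic or credentials; the stream order is the publish order.
            yield Command(kind="send_receipt", topic=topic, sequence_id=cmd.sequence_id)
        else:
            yield Command(kind="error")
```

Because the requests arrive on a single ordered stream, receipts come back in the same order as the sends, which addresses the ordering concern without repeating the topic or auth data on each message.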

That said I have started the work on a gRPC proxy. Consumption and production are working. I need to clean it up then I'll do a PR.

cbornet commented 5 years ago

And if I'm not mistaken BookKeeper uses gRPC internally, so it would be coherent to make it the base protocol in Pulsar also.

sijie commented 5 years ago

> That said I have started the work on a gRPC proxy. Consumption and production are working. I need to clean it up then I'll do a PR.

Look forward to your PR.

> And if I'm not mistaken BookKeeper uses gRPC internally, so it would be coherent to make it the base protocol in Pulsar also.

gRPC is only used for BookKeeper's table service; the ledger service still uses a custom protocol. But I agree with you: gRPC has a very rich ecosystem, and it is a good direction to go in general.

merlimat commented 5 years ago

@cbornet The reason we haven't used gRPC is that it wasn't available when we started, so we went with a custom protocol over Protocol Buffers. After that, migrating the internal protocol was a big step.

cbornet commented 5 years ago

Yes. That's a very good reason indeed. Maybe in the future 😄. I can understand there are bigger priorities.

mickdelaney commented 5 years ago

Any progress likely on this? We're Python & dotnet, so we have no viable option to use Pulsar.

sijie commented 5 years ago

@mickdelaney Pulsar has a Python client, and a dotnet client is under development. Does that meet your requirement? Or is gRPC your preferred option?

mickdelaney commented 5 years ago

Hi, Sorry for the late reply.

So we use Kafka at the moment. Confluent provides dotnet & Python clients based on librdkafka, which in theory gives a baseline for all clients that extend it.

The reality is that it's very expensive to maintain all these language drivers, so you get differences between them and features that only arrive later in some languages. For example, the schema/Avro support for Kafka varies significantly between languages, Java being very different from, say, C#.

So for teams using these drivers, you have to rely on different semantics, you have to create different approaches to dealing with things like schemas, and it increases costs.

Also, you have to think about the teams providing the drivers and the costs they have in maintaining them. It's not easy.

So if there's a possibility that gRPC will fit the semantics of Pulsar's protocol, it seems to me that it's a win for everyone; the Pulsar team in particular can focus their attention on making the gRPC layer first class.

Thanks...

TC-oKozlov commented 4 years ago

We have a real-time messaging system implemented in Erlang and are looking at Pulsar as a pub/sub / queue message broker. Unfortunately, that means implementing our own client lib, with tons of features, on top of the binary / protobuf protocol. Having gRPC support would have greatly helped.

sijie commented 4 years ago

@mickdelaney @TC-oKozlov thank you for your input.

Just to understand a bit more about the requirements: are you expecting a gRPC-based proxy, or the Pulsar broker protocol exposed in gRPC? These would lead to two different approaches.

A gRPC-based proxy means providing a much simpler protocol than the current broker protocol, so it is easy to have gRPC clients in different languages. But it will have its own limitations and drawbacks, such as another network hop, and some features might be hard to support.

Exposing the Pulsar broker protocol in gRPC can solve the problem of handling wire-level request & response encoding and decoding. However, the challenge of implementing a Pulsar client is not wire-level encoding and decoding. It is more about the logic within a Pulsar client, such as flow control, topic lookup, and error handling. So we would still face the same challenges that the current Pulsar clients face. It is probably even worse than implementing a language client wrapper around the Pulsar C/C++ client, because implementing a wrapper is much simpler and less error-prone than re-implementing flow control, topic lookup and error handling in different languages.
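To give a sense of the kind of client/broker logic meant here, below is a minimal Python sketch of permit-based flow control, where the consumer grants the broker a number of messages it is allowed to push. The class and method names are invented for illustration; a real client also has to decide when to refill permits, handle reconnects, and so on, in every language it is written in.

```python
from collections import deque


class PermitFlowControl:
    """Broker-side view: push at most as many messages as the consumer has granted."""

    def __init__(self):
        self.permits = 0        # messages the consumer is currently willing to receive
        self.backlog = deque()  # messages waiting to be delivered

    def add_permits(self, n: int) -> None:
        """Called when the consumer signals it can take n more messages."""
        self.permits += n

    def publish(self, msg) -> None:
        """Queue a message for delivery."""
        self.backlog.append(msg)

    def drain(self) -> list:
        """Return the messages that may be pushed right now, in order."""
        out = []
        while self.permits > 0 and self.backlog:
            out.append(self.backlog.popleft())
            self.permits -= 1
        return out
```

Nothing is delivered until the consumer grants permits, and unused permits carry over, so a slow consumer throttles the broker rather than being overrun.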

I would like to collect more requirements for gRPC to understand what the right approach is for solving the problem here.

cbornet commented 4 years ago

I think moving to gRPC for the Pulsar clients would have some benefits. For instance, it already handles flow control and bidirectional streaming. For those who want to write native clients, that's one less layer to develop. Another interesting alternative could be RSocket, which has some very nice features such as session resumption and message-level backpressure. In Java, it would be possible to have a fully reactive-streams API using these protocols.

mickdelaney commented 4 years ago

@sijie Thanks for the detailed feedback. I was thinking of the former, my thinking being that it would at least remove some of the concerns in maintaining the various language-level clients.

cbornet commented 3 years ago

Now that v2.7.0 has been released, you can use the gRPC protocol handler, which implements PIP59. So far, all features of 2.7.0 are implemented except transactions (coming soon) and credentials refreshing (probably harder). You can download a pre-release version of the nar here. I'd be happy to get your feedback on this. I'll publish a blog post in the coming weeks.

cbornet commented 3 years ago

New pre-release with full transaction support: https://github.com/cbornet/pulsar-grpc/releases/tag/v1.0.0-20201206-rc

sl1316 commented 2 years ago

@cbornet can you provide some guidance regarding how to use the grpc protocol? I only saw binary protocol in http://pulsar.apache.org/docs/en/develop-binary-protocol/ .

tisonkun commented 1 year ago

Closed as answered by https://github.com/apache/pulsar/issues/271#issuecomment-738815393. New questions or issues can be created separately.