apache / bookkeeper

Apache BookKeeper - a scalable, fault tolerant and low latency storage service optimized for append-only workloads
https://bookkeeper.apache.org/
Apache License 2.0

Support for non-Java clients #2167

Open atombender opened 5 years ago

atombender commented 5 years ago

FEATURE REQUEST

Any multi-language support on the horizon? I see some smatterings of Python code, but it doesn't appear to be a full-featured client.

I have been researching BookKeeper as a potential solution for a system written in Go. However, the current API appears to be Java-only, even though the Bookie protocol (which is apparently undocumented?) is Protobuf.

I was puzzled by the fact that there is no documented wire protocol for the main server, and indeed, looking at the Java client, it looks like it is in fact a "fat" client that talks directly to both ZooKeeper and the Bookie backends.

So it appears that in order to access BookKeeper from a non-JVM language, you have to port the entire fat client.
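
To make the "fat client" shape concrete, here is a toy, in-memory sketch in Python. It is not BookKeeper's wire protocol or API: `MetadataStore`, `StorageNode`, `FatClient` and the `ack_quorum` parameter are hypothetical names chosen for illustration. It only shows the general idea that the client itself looks up placement in a coordination service and then writes to each storage node directly, which is the logic a non-JVM port would have to reimplement.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical stand-in for the coordination service (ZooKeeper in this thread)."""
    ledgers: dict = field(default_factory=dict)

    def create_ledger(self, ledger_id, ensemble):
        # Record which storage nodes hold this ledger.
        self.ledgers[ledger_id] = list(ensemble)

    def ensemble_for(self, ledger_id):
        return self.ledgers[ledger_id]


@dataclass
class StorageNode:
    """Hypothetical stand-in for one storage server (a bookie in this thread)."""
    name: str
    entries: dict = field(default_factory=dict)
    up: bool = True

    def add_entry(self, ledger_id, entry_id, payload):
        if not self.up:
            raise ConnectionError(f"{self.name} is unreachable")
        self.entries[(ledger_id, entry_id)] = payload


class FatClient:
    """All coordination lives in the client: it reads placement from the
    metadata store and then writes to every storage node itself."""

    def __init__(self, metadata, nodes, ack_quorum):
        self.metadata = metadata
        self.nodes = {node.name: node for node in nodes}
        self.ack_quorum = ack_quorum
        self.next_entry_id = {}

    def append(self, ledger_id, payload):
        entry_id = self.next_entry_id.get(ledger_id, 0)
        acks = 0
        for name in self.metadata.ensemble_for(ledger_id):
            try:
                self.nodes[name].add_entry(ledger_id, entry_id, payload)
                acks += 1
            except ConnectionError:
                pass  # tolerate failed nodes as long as the quorum is met
        if acks < self.ack_quorum:
            raise RuntimeError(f"only {acks} acks, need {self.ack_quorum}")
        self.next_entry_id[ledger_id] = entry_id + 1
        return entry_id
```

A thin client would instead send one request to a server that runs the equivalent of `append` on its behalf, which is what makes a language-agnostic API straightforward to offer.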

A lot of projects in the Hadoop/Java world (Flink, Flume and Accumulo come to mind) take this Java-centric approach rather than an API-first approach. This was also the situation Kafka was in when it launched, though things are much better these days. For someone outside Java land, it's a little disappointing to see relatively new projects do exactly the same thing all over again.

Are there plans to extricate all of this tightly coupled logic into a server that can offer a language-agnostic API using gRPC/Protobuf or similar?

merlimat commented 5 years ago

There are multiple good reasons why BK was designed with a fat client (as always, every decision implies trade-offs).

Depending on what you're looking for in BK, you may be able to get access to its properties through Pulsar (https://pulsar.apache.org), which offers client libraries in multiple languages.

atombender commented 5 years ago

Thanks. Pulsar is complete overkill for this use case. We just require distributed, replicated persistent log streams — not messaging, schemas, functions, authentication, SQL, data feeds, etc.

Using Pulsar adds yet another operational dependency, so you end up with Pulsar, BookKeeper (which is a complex thing itself), and ZooKeeper. We already run Etcd, so ZooKeeper is an unnecessary burden which I'd like to avoid having to manage. In my experience, Etcd is simpler to operate and more lightweight.

Today, the system uses PostgreSQL for log streams, so the cost/benefit analysis has to include the operational complexity.

eolivelli commented 5 years ago

@atombender we are receiving several requests for the ability to use BK from other languages. We also have ideas and design documents for an implementation; we just lack a volunteer with enough time to bring this effort to life.

liangyuanpeng commented 5 years ago

If there is a roadmap for this, that would be great.

rodrigoreis commented 4 years ago

> @atombender we are receiving several requests for the ability to use BK from other languages. We also have ideas and design documents for an implementation; we just lack a volunteer with enough time to bring this effort to life.

I'm trying to implement a simple client for .NET, and I'm reading the code to understand how it was done for Pulsar. The calls are made using gRPC, but there is no low-level documentation for those calls. Any help?