apache / pulsar

Apache Pulsar - distributed pub-sub messaging system
https://pulsar.apache.org/
Apache License 2.0

Support for native coordination on Kubernetes #23539

Open malthe opened 13 hours ago

malthe commented 13 hours ago

Motivation

Pulsar now has a pluggable interface for coordination and metadata services (see #572, which was resolved through PIP-45).

Apache NiFi has done something similar, but has so far targeted the services that Kubernetes already offers, namely the Lease API and ConfigMaps:

https://exceptionfactory.com/posts/2024/08/10/bringing-kubernetes-clustering-to-apache-nifi/

Since Kubernetes itself is currently backed by etcd internally, this should perform similarly to using etcd directly.

The motivation presented at the Pulsar Summit in 2022 applies even more so here:

Small clusters → remove overhead

Solution

Include a coordination and metadata backend that uses native Kubernetes services.
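
To make the idea concrete, here is a minimal sketch of the lease-style leader election such a backend would rely on. It assumes the semantics of the Kubernetes Lease resource (a holder identity, a renew time, a lease duration, and optimistic concurrency on updates), but it does not call any Kubernetes or Pulsar API: `LeaseElectionSketch`, `InMemoryLeaseStore` and `tryAcquireOrRenew` are hypothetical names, and the in-memory store merely stands in for the API server.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

/** Illustration only: these types are hypothetical and not part of Pulsar or any Kubernetes client. */
public class LeaseElectionSketch {

    /** Carries the pieces of a lease that matter for election. */
    record Lease(String holder, Instant renewTime, Duration duration, long version) {
        boolean expiredAt(Instant now) {
            // Expired once `now` has reached renewTime + duration.
            return !now.isBefore(renewTime.plus(duration));
        }
    }

    /** Stand-in for the API server: one record, updated with compare-and-set. */
    static final class InMemoryLeaseStore {
        private final AtomicReference<Lease> current = new AtomicReference<>();

        Optional<Lease> read() {
            return Optional.ofNullable(current.get());
        }

        /** Succeeds only if the stored lease is still the instance that was read. */
        boolean compareAndSet(Lease expected, Lease updated) {
            return current.compareAndSet(expected, updated);
        }
    }

    /** Returns true if `candidate` is the leader after this call. */
    static boolean tryAcquireOrRenew(InMemoryLeaseStore store, String candidate,
                                     Instant now, Duration ttl) {
        Lease seen = store.read().orElse(null);
        if (seen != null && !seen.holder().equals(candidate) && !seen.expiredAt(now)) {
            return false; // another candidate holds a live lease
        }
        long nextVersion = (seen == null) ? 1 : seen.version() + 1;
        // A concurrent writer makes this fail, so at most one candidate wins.
        return store.compareAndSet(seen, new Lease(candidate, now, ttl, nextVersion));
    }

    public static void main(String[] args) {
        InMemoryLeaseStore store = new InMemoryLeaseStore();
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        Duration ttl = Duration.ofSeconds(15);

        System.out.println(tryAcquireOrRenew(store, "broker-a", t0, ttl));                 // true: lease was free
        System.out.println(tryAcquireOrRenew(store, "broker-b", t0.plusSeconds(5), ttl));  // false: broker-a still live
        System.out.println(tryAcquireOrRenew(store, "broker-a", t0.plusSeconds(10), ttl)); // true: renewal
        System.out.println(tryAcquireOrRenew(store, "broker-b", t0.plusSeconds(30), ttl)); // true: lease expired
    }
}
```

A real backend would replace the in-memory store with reads and conditional updates of a Lease object, and would keep metadata in ConfigMaps in a similar compare-and-set fashion; how that maps onto Pulsar's metadata store interface is left open here.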

Alternatives

In the past, people have written proxies that expose, for example, the ZooKeeper API on top of etcd (see zetcd). It could be argued that an entirely separate service should be written that standardizes the use of Kubernetes services for leader election and metadata needs.

Anything else?

No response

lhotari commented 11 hours ago

There's already etcd support in Pulsar and BookKeeper. However, there's currently an open issue, #23513, which I'll come back to in the next few weeks unless someone addresses it before me. There might not be a lot of end-user documentation for using anything other than ZooKeeper at the moment, although there are other alternatives (etcd, Oxia) available in Pulsar, which also get used for BookKeeper metadata. /cc @Apurva007

lhotari commented 11 hours ago

However, I don't think there's a recommendation to use Kubernetes' internal etcd for Pulsar and BookKeeper metadata, although it could be technically possible.