apache / pulsar-helm-chart

Official Apache Pulsar Helm Chart
https://pulsar.apache.org/
Apache License 2.0
209 stars 220 forks source link

Create 2 services for brokers: a headless service with publishNotReadyAddresses and an ordinary clusterip service for lookups #437

Open lhotari opened 8 months ago

lhotari commented 8 months ago

Is your feature request related to a problem? Please describe.

Pulsar brokers use Zookeeper for publishing the membership to other brokers. The Pulsar load manager will detect a new broker and assign new namespace bundles to the new broker. One of the gaps with Pulsar and k8s is that the broker registers itself to Zookeeper and the broker pod cannot be accessed before the broker pod's readiness probe check has passed and k8s marks the pod ready and makes it available in DNS. There's also additional delay from DNS caches. This delay will cause a short service disruption for all topics belonging to the namespace bundles that have been assigned to the new broker.

For the brokers, there's also other reasons: the headless service has "publishNotReadyAddresses: true" and other service (which would be used for lookups) doesn't, so that lookups go only to healthy brokers.

Headless broker services with a lot of brokers could cause issues with Pulsar clients that are running on Alpine <3.18. k8s docs have something about this too: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/#known-issues .

"If you are using Alpine version 3.17 or earlier as your base image, DNS may not work properly due to a design issue with Alpine. Until musl version 1.24 didn't include TCP fallback to the DNS stub resolver meaning any DNS call above 512 bytes would fail. Please upgrade your images to Alpine version 3.18 or above."

Since headless services return all hosts in the DNS response, it could exceed 512 bytes.

Describe the solution you'd like

The way to mitigate this is to have a separate ClusterIP service to be used for lookups and have a headless service with "publishNotReadyAddresses: true". This is a common way to address the problem. One example of this is the Bitnami Zookeeper Helm chart, https://github.com/bitnami/charts/tree/main/bitnami/zookeeper/templates there's svc.yaml and svc-headless.yaml

"StatefulSets currently require a Headless Service to be responsible for the network identity of the Pods." https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/#limitations Since brokers use a statefulset, the headless service cannot be removed.

The DNS names of the 2 services will be different. Since clients don't need to know the direct DNS names of the brokers, it's better to change the current headless service to be the ClusterIP service and add a new headless service for handling the stable names of the brokers. Admin scripts that connect directly to individual brokers might break, but that's could be made clear in the release notes when this feature is released.

lhotari commented 4 weeks ago

It's possible that making a change to use podManagementPolicy: OrderedReady for the broker statefulset could be a partial mitigation to this issue until this is resolved. #474 adds a solution where podManagementPolicy: OrderedReady is used when the function worker is enabled in the broker. it's not currently possible to configure to use this explicitly in the Helm chart without modifications.

lhotari commented 4 weeks ago

the podManagementPolicy workaround is implemented in #525