bitnami / charts

Bitnami Helm Charts
https://bitnami.com

[kafka] Unable to set correct advertised listeners for external access behind TCP Ingress #25030

Closed dergyitheron closed 5 months ago

dergyitheron commented 6 months ago

Name and Version

bitnami/kafka 28.0.3

What architecture are you using?

amd64

What steps will reproduce the bug?

I have a K3s cluster with 3 worker nodes, which have built-in LB (klipper) and Ingress controller (traefik). I have successfully deployed bitnami/kafka helm chart with some custom values listed below. I can connect from within the K3s cluster and produce and consume messages.

Now I'm trying to set up external access including IngressRouteTCP and have some issues with advertised listeners. The consumer connects to the bootstrap server but gets advertised listeners with Kafka ports, not the ingress entrypoint ports.

The traefik entrypoint is on port 443, so the consumer needs to receive an advertised listener with the external URL set in the IngressRouteTCP and port 443.

I've also tried ClusterIP type for external access but that allows only one domain for the service to be specified, so advertised listeners values are even more incorrect (same for every kafka node).

I've also noticed that the external access service of type ClusterIP or LoadBalancer creates a separate Service for each pod, so since I have an Ingress on top with a dedicated domain for each pod, it doesn't really matter which type I use: traffic ends up at the correct pod either way.

What is the best way to approach this and set up correctly?

Are you using any custom parameters or values?

Values

global:
  storageClass: "local-path"
commonLabels:
  app.kubernetes.io/name: kafka
  app.kubernetes.io/instance: kafka
  app.kubernetes.io/component: kafka
  app.kubernetes.io/part-of: kafka
diagnosticMode:
  enabled: false
image:
  pullPolicy: IfNotPresent
listeners:
  client:
    containerPort: 9092
    protocol: SASL_PLAINTEXT
    name: CLIENT
  controller:
    name: CONTROLLER
    containerPort: 9093
    protocol: PLAINTEXT
    sslClientAuth: ""
  interbroker:
    containerPort: 9094
    protocol: PLAINTEXT
    name: INTERNAL
    sslClientAuth: ""
  external:
    containerPort: 9095
    protocol: SASL_PLAINTEXT
    name: EXTERNAL
    sslClientAuth: ""
sasl:
  enabledMechanisms: PLAIN,SCRAM-SHA-256,SCRAM-SHA-512
  client:
    users:
      - adminuser
    passwords: "adminpassword"
controller:
  replicaCount: 3
  resourcesPreset: "none"
  resources:
    limits:
      memory: 2Gi
  nodeSelector:
    node-role.kubernetes.io/storage-worker: "true"
  persistence:
    enabled: true
    storageClass: "local-path"
    accessModes:
      - ReadWriteOnce
    size: 8Gi
    mountPath: /bitnami/kafka
externalAccess:
  enabled: true
  controller:
    service:
      type: LoadBalancer
      ports:
        external: 9095
      loadBalancerNames:
        - "kafka-0.my-domain.org"
        - "kafka-1.my-domain.org"
        - "kafka-2.my-domain.org"
  broker:
    service:
      type: LoadBalancer
      ports:
        external: 9095
      loadBalancerNames: 
        - "kafka-0.my-domain.org"
        - "kafka-1.my-domain.org"
        - "kafka-2.my-domain.org"
kraft:
  enabled: true
  clusterId: 1
zookeeper:
  enabled: false

IngressRouteTCP

apiVersion: traefik.containo.us/v1alpha1
kind: IngressRouteTCP
metadata:
  name: kafka-tls-ingress
  namespace: kafka
spec:
  entryPoints:
    - websecure
  routes:
    - match: HostSNI(`kafka-0.my-domain.org`)
      services:
        - name: kafka-controller-0-external
          port: 9095
    - match: HostSNI(`kafka-1.my-domain.org`)
      services:
        - name: kafka-controller-1-external
          port: 9095
    - match: HostSNI(`kafka-2.my-domain.org`)
      services:
        - name: kafka-controller-2-external
          port: 9095
  tls:
    secretName: kafka-tls-secret

What is the expected behavior?

It is unclear to me how I can modify the advertised listeners to return the external listener with the correct domain and correct port for the kafka node.

I expect to see this (for the first Kafka node; note the port for EXTERNAL):

advertised.listeners = CLIENT://kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9092,INTERNAL://kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9094,EXTERNAL://kafka-0.my-domain.org:443

I only expect to connect to kafka from inside of the cluster or outside via ingress route but I don't want to expose any additional ports outside.

What do you see instead?

advertised.listeners = CLIENT://kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9092,INTERNAL://kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9094,EXTERNAL://kafka-0.my-domain.org:9095

The external listener advertises on port 9095. I cannot set up LoadBalancer type for port 443 because it's already taken by the traefik ingress controller on each K3s node.

The extraPorts value doesn't help either, as it tries to allocate port 443, which is already taken.
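To make the mismatch concrete, the EXTERNAL advertised listener appears to be assembled from `externalAccess.*.service.loadBalancerNames[i]` (hostname) and `externalAccess.*.service.ports.external` (port). The following is a simplified sketch of that derivation, not the chart's actual template code:

```python
# Simplified model of how the EXTERNAL advertised listener seems to be
# derived per pod: hostname from loadBalancerNames[pod_index], port from
# ports.external. Illustration only, not the chart's template logic.
def external_advertised_listener(load_balancer_names, external_port, pod_index):
    return f"EXTERNAL://{load_balancer_names[pod_index]}:{external_port}"

names = ["kafka-0.my-domain.org", "kafka-1.my-domain.org", "kafka-2.my-domain.org"]

# With ports.external: 9095 the consumer is told to dial a port that is
# never exposed outside the cluster:
print(external_advertised_listener(names, 9095, 0))
# EXTERNAL://kafka-0.my-domain.org:9095

# With ports.external: 443 the advertised port matches the traefik entrypoint:
print(external_advertised_listener(names, 443, 0))
# EXTERNAL://kafka-0.my-domain.org:443
```

Since the advertised port only tracks `ports.external`, there is no obvious knob to advertise 443 while the Service itself listens on 9095.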

Additional information

I can also see this behavior with consumers written using the .NET package Confluent.Kafka.

BootstrapServers = "kafka-0.my-domain.org:443"

The consumer connects to the bootstrap server and then tries to establish a connection to the advertised listener it finds in the metadata returned by the Kafka node.

Log:

%3|1712562181.395|ERROR|rdkafka#consumer-1| [thrd:app]: rdkafka#consumer-1: GroupCoordinator: kafka-1.my-domain.org:9095: Connection setup timed out in state CONNECT (after 30034ms in state CONNECT)

I know I might have some redundant values in the Helm chart; I'll clean those up once this is solved. TLS termination is handled by the traefik ingress controller and I'm not concerned about plaintext communication within the K3s cluster (the nodes are in an isolated network).

dergyitheron commented 6 months ago

Quick update, I have the following workaround with all pods accessible on dedicated Ingress host names.

The downside is that the built-in LoadBalancer is present but unused, since I only use its associated configuration to get the correct hostname and port into each node's advertised listeners.

I see only a few possible solutions, from most to least preferred:

  1. someone here figures it out and tells me something I've missed
  2. I or someone else submits a PR that solves this (such as allowing {{ include "common.names.fullname" $ }} in the domain, or something similar)
  3. I'll clone the chart and keep my own modified version
  4. I'll keep the workaround solution

The workaround:

externalAccess:
  enabled: true
  controller:
    service:
      type: LoadBalancer
      allocateLoadBalancerNodePorts: false
      ports:
        external: 443
      loadBalancerNames:
        - "kafka-0.my-domain.org"
        - "kafka-1.my-domain.org"
        - "kafka-2.my-domain.org"
  broker:
    service:
      type: LoadBalancer
      allocateLoadBalancerNodePorts: false
      ports:
        external: 443
      loadBalancerNames: 
        - "kafka-0.my-domain.org"
        - "kafka-1.my-domain.org"
        - "kafka-2.my-domain.org"
kraft:
  enabled: true
  clusterId: 1
zookeeper:
  enabled: false
extraDeploy:
  - |
    apiVersion: traefik.containo.us/v1alpha1
    kind: IngressRouteTCP
    metadata:
      name: kafka-tls-ingress
      namespace: kafka
    spec:
      entryPoints:
        - websecure
      routes:
        - match: HostSNI(`kafka.my-domain.org`)
          services:
            - name: kafka
              port: 9095
        - match: HostSNI(`kafka-0.my-domain.org`)
          services:
            - name: kafka-controller-0-external-custom
              port: 9095
        - match: HostSNI(`kafka-1.my-domain.org`)
          services:
            - name: kafka-controller-1-external-custom
              port: 9095
        - match: HostSNI(`kafka-2.my-domain.org`)
          services:
            - name: kafka-controller-2-external-custom
              port: 9095
      tls:
        secretName: kafka-tls-secret
  - |
    apiVersion: v1
    kind: Service
    metadata:
      name: kafka-controller-0-external-custom
    spec:
      type: ClusterIP
      ports:
        - name: tcp-kafka-custom
          port: 9095
          protocol: TCP
          targetPort: external
      selector:
        app.kubernetes.io/component: controller-eligible
        app.kubernetes.io/instance: kafka
        app.kubernetes.io/name: kafka
        app.kubernetes.io/part-of: kafka
        statefulset.kubernetes.io/pod-name: kafka-controller-0
  - |
    apiVersion: v1
    kind: Service
    metadata:
      name: kafka-controller-1-external-custom
    spec:
      type: ClusterIP
      ports:
        - name: tcp-kafka-custom
          port: 9095
          protocol: TCP
          targetPort: external
      selector:
        app.kubernetes.io/component: controller-eligible
        app.kubernetes.io/instance: kafka
        app.kubernetes.io/name: kafka
        app.kubernetes.io/part-of: kafka
        statefulset.kubernetes.io/pod-name: kafka-controller-1
  - |
    apiVersion: v1
    kind: Service
    metadata:
      name: kafka-controller-2-external-custom
    spec:
      type: ClusterIP
      ports:
        - name: tcp-kafka-custom
          port: 9095
          protocol: TCP
          targetPort: external
      selector:
        app.kubernetes.io/component: controller-eligible
        app.kubernetes.io/instance: kafka
        app.kubernetes.io/name: kafka
        app.kubernetes.io/part-of: kafka
        statefulset.kubernetes.io/pod-name: kafka-controller-2
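The routing chain in this workaround is: traefik matches the TLS SNI host to an IngressRouteTCP service, and each custom ClusterIP Service selects exactly one pod through the `statefulset.kubernetes.io/pod-name` label. A minimal simulation of that lookup (hypothetical, for illustration only):

```python
# Hypothetical simulation of the workaround's routing chain:
# SNI host -> custom ClusterIP Service -> single pod via pod-name selector.
SNI_TO_SERVICE = {
    "kafka-0.my-domain.org": "kafka-controller-0-external-custom",
    "kafka-1.my-domain.org": "kafka-controller-1-external-custom",
    "kafka-2.my-domain.org": "kafka-controller-2-external-custom",
}

SERVICE_TO_POD = {
    "kafka-controller-0-external-custom": "kafka-controller-0",
    "kafka-controller-1-external-custom": "kafka-controller-1",
    "kafka-controller-2-external-custom": "kafka-controller-2",
}

def route(sni_host: str) -> str:
    """Return the pod that a TLS connection with this SNI host ends up on."""
    service = SNI_TO_SERVICE[sni_host]
    return SERVICE_TO_POD[service]

print(route("kafka-1.my-domain.org"))
# kafka-controller-1
```

Because each advertised hostname maps to exactly one pod, a client redirected to kafka-1.my-domain.org:443 by the advertised listeners always reaches kafka-controller-1, which is what Kafka's metadata protocol requires.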

I can confirm this works by connecting .NET consumers to Kafka and consuming messages.

carrodher commented 6 months ago

The issue may not be directly related to the Bitnami container image or Helm chart, but rather to how the application is being utilized or configured in your specific environment.

Having said that, if you think that's not the case and are interested in contributing a solution, we welcome you to create a pull request. The Bitnami team is excited to review your submission and offer feedback. You can find the contributing guidelines here.

Your contribution will greatly benefit the community. Feel free to reach out if you have any questions or need assistance.

If you have any questions about the application itself, customizing its content, or questions about technology and infrastructure usage, we highly recommend that you refer to the forums and user guides provided by the project responsible for the application or technology.

With that said, we'll keep this ticket open until the stale bot automatically closes it, in case someone from the community contributes valuable insights.

dergyitheron commented 6 months ago

Thank you for your response.

Are you saying that my need to advertise, for each Kafka replica, the Ingress hostname with a port that differs from the external access Service's port is basically a "me" problem, or that there is absolutely no way to solve this with the current Helm chart?

As I mentioned previously, I just want to know whether I missed something in the configuration options that would make it work, or whether contributing to the project is the only way.

I can understand that my case might not be the way to go but I'm mostly restricted by the environment right now.

github-actions[bot] commented 5 months ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 5 months ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.