
Kraft mode Kafka pods OOM killed periodically #27130

Closed: eroji closed this issue 3 days ago

eroji commented 2 months ago

Name and Version

bitnami/kafka:9.6.1

What architecture are you using?

amd64

What steps will reproduce the bug?

Deploy a 3 replica statefulset cluster via Helm chart

Are you using any custom parameters or values?

service.type=LoadBalancer
sasl.client.users[0]=admin
sasl.client.passwords=somepassword
tls.autoGenerated=true
listeners.client.protocol=SASL_SSL
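
For readability, the same overrides as a values.yaml fragment (a sketch; the key paths follow the chart's values.yaml, and the password is a placeholder):

# Sketch of the overrides above as a values.yaml fragment
service:
  type: LoadBalancer
sasl:
  client:
    users:
      - admin
    passwords: "somepassword"
tls:
  autoGenerated: true
listeners:
  client:
    protocol: SASL_SSL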

What is the expected behavior?

The default heap size appears to be 1 GB, but the memory limit is 768Mi. The cluster is not actually being used yet, so it should not run out of memory on its own. If, however, Kafka in KRaft mode requires more memory by default, then the Helm chart's default should reflect that.
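
For context, one way to keep the two consistent is to size the JVM heap and the container memory limit together via the chart's heapOpts and controller.resources values (a sketch with illustrative numbers, not the chart's defaults):

# Sketch: keep the JVM heap comfortably below the container memory limit
# (numbers are illustrative, not the chart's defaults)
heapOpts: "-Xmx512m -Xms512m"
controller:
  resources:
    requests:
      memory: 1Gi
    limits:
      memory: 1Gi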

What do you see instead?

The pods are being killed due to failed liveness/readiness check.

These are the final entries in the log, which appear to be consistent each time the pod is killed:

[2024-06-12 14:08:44,226] INFO [QuorumController id=0] processBrokerHeartbeat: event failed with StaleBrokerEpochException in 160 microseconds. (org.apache.kafka.controller.QuorumController)
2024-06-12T14:08:46.232637184Z [2024-06-12 14:08:46,232] INFO [QuorumController id=0] The request from broker 1 to unfence has been granted because it has caught up with the offset of its register broker record 268863. (org.apache.kafka.controller.BrokerHeartbeatManager)
[2024-06-12 14:08:46,232] INFO [QuorumController id=0] Replayed BrokerRegistrationChangeRecord modifying the registration for broker 1: BrokerRegistrationChangeRecord(brokerId=1, brokerEpoch=268863, fenced=-1, inControlledShutdown=0, logDirs=[]) (org.apache.kafka.controller.ClusterControlManager)
[2024-06-12 14:08:58,237] INFO [SnapshotGenerator id=0] Creating new KRaft snapshot file snapshot 00000000000000268894-0000000135 because we have waited at least 60 minute(s). (org.apache.kafka.image.publisher.SnapshotGenerator)
[2024-06-12 14:08:58,252] INFO [SnapshotEmitter id=0] Successfully wrote snapshot 00000000000000268894-0000000135 (org.apache.kafka.image.publisher.SnapshotEmitter)
[2024-06-12 14:18:33,167] INFO [RaftManager id=0] Node 2 disconnected. (org.apache.kafka.clients.NetworkClient)
2024-06-12T15:08:58.450255645Z [2024-06-12 15:08:58,449] INFO [SnapshotGenerator id=0] Creating new KRaft snapshot file snapshot 00000000000000276092-0000000135 because we have waited at least 60 minute(s). (org.apache.kafka.image.publisher.SnapshotGenerator)
[2024-06-12 15:08:58,457] INFO [SnapshotEmitter id=0] Successfully wrote snapshot 00000000000000276092-0000000135 (org.apache.kafka.image.publisher.SnapshotEmitter)

The reason Kubernetes reports for killing the pod:

Last state: Terminated with 137: OOMKilled, started: Wed, Jun 12 2024 6:25:48 am, finished: Wed, Jun 12 2024 9:30:17 am

Additional information

No response

jotamartos commented 2 months ago

Hi @eroji,

I just tried to reproduce the issue using the latest Bitnami Kafka chart and everything worked as expected. I applied the same changes you mentioned to the values.yaml file:

diff --git a/bitnami/kafka/values.yaml b/bitnami/kafka/values.yaml
index f2c92ee02f..64872c7d64 100644
--- a/bitnami/kafka/values.yaml
+++ b/bitnami/kafka/values.yaml
@@ -164,7 +164,7 @@ listeners:
   ## @param listeners.client.sslClientAuth Optional. If SASL_SSL is enabled, configure mTLS TLS authentication type. If SSL protocol is enabled, overrides tls.authType for this listener. Allowed values are 'none', 'requested' and 'required'
   client:
     containerPort: 9092
-    protocol: SASL_PLAINTEXT
+    protocol: SASL_SSL
     name: CLIENT
     sslClientAuth: ""
   ## @param listeners.controller.name Name for the Kafka controller listener
@@ -264,8 +264,8 @@ sasl:
   ##
   client:
     users:
-      - user1
-    passwords: ""
+      - admin
+    passwords: "somepassword"
   ## Credentials for Zookeeper communications.
   ## @param sasl.zookeeper.user Username for zookeeper communications when SASL is enabled.
   ## @param sasl.zookeeper.password Password for zookeeper communications when SASL is enabled.
@@ -320,7 +320,7 @@ tls:
   ## @param tls.autoGenerated Generate automatically self-signed TLS certificates for Kafka brokers. Currently only supported if `tls.type` is `PEM`
   ## Note: ignored when using 'jks' format or `tls.existingSecret` is not empty
   ##
-  autoGenerated: false
+  autoGenerated: true
   ## @param tls.customAltNames Optionally specify extra list of additional subject alternative names (SANs) for the automatically generated TLS certificates.
   ##
   customAltNames: []
@@ -1405,7 +1405,7 @@ broker:
 service:
   ## @param service.type Kubernetes Service type
   ##
-  type: ClusterIP
+  type: LoadBalancer
   ## @param service.ports.client Kafka svc port for client connections
   ## @param service.ports.controller Kafka svc port for controller connections. It is used if "kraft.enabled: true"
   ## @param service.ports.interbroker Kafka svc port for inter-broker connections

and deployed the solution. The pods were running for 10+ minutes without problems. Please debug the issue in your cluster and make sure you are not running into a performance issue:

NAME                 READY   STATUS    RESTARTS   AGE
kafka-controller-0   1/1     Running   0          11m
kafka-controller-1   1/1     Running   0          11m
kafka-controller-2   1/1     Running   0          11m

florian-besser commented 1 month ago

@jotamartos I can confirm this is also happening on ARM-based clusters. Our cluster is in AWS using their ARM-based machines.

We use the Helm chart oci://registry-1.docker.io/bitnamicharts/kafka with the following values:

extraConfigYaml:
  "authorizer.class.name": "org.apache.kafka.metadata.authorizer.StandardAuthorizer"
  "super.users": "User:controller_user"
listeners:
  client:
    containerPort: 9092
    protocol: SASL_SSL
    name: CLIENT
  controller:
    name: CONTROLLER
    containerPort: 9093
    protocol: SASL_SSL
    sslClientAuth: "required"
sasl:
  enabledMechanisms: PLAIN #,SCRAM-SHA-256,SCRAM-SHA-512
tls:
  type: PEM
  existingSecret: <censored>
  keystorePassword: <censored>
  truststorePassword: <censored>

The cluster comes up and works well, but each controller over time gets OOMKilled after several minutes.

Note that during this time the cluster isn't under load at all; we don't have any apps sending messages to it yet.

The OOM happens only after several minutes: while the cluster seemingly idles, something soaks up memory.

jotamartos commented 1 month ago

Hi,

I tried to reproduce the issue again on an ARM-based cluster (the one Docker Desktop provides on an M1-based Mac) and didn't get any error:

$ k get pods 
NAME                 READY   STATUS    RESTARTS   AGE
kafka-controller-0   1/1     Running   0          49m
kafka-controller-1   1/1     Running   0          49m
kafka-controller-2   1/1     Running   0          49m

I applied the same values file I used in my previous message; those earlier tests were executed on an x64-based cluster in GKE.

Please note that you can set a different resourcesPreset configuration for your deployment and see if that solves the issue in your cluster.

  ## @param controller.resourcesPreset Set container resources according to one common preset (allowed values: none, nano, micro, small, medium, large, xlarge, 2xlarge). This is ignored if controller.resources is set (controller.resources is recommended for production).
  ## More information: https://github.com/bitnami/charts/blob/main/bitnami/common/templates/_resources.tpl#L15
  ##
  resourcesPreset: "small"
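
Alternatively, as the comment above notes, you can set controller.resources explicitly instead of using a preset, which is what the chart recommends for production (the numbers below are purely illustrative, not a recommendation):

controller:
  resources:
    requests:
      cpu: 250m
      memory: 1Gi
    limits:
      cpu: 750m
      memory: 1536Mi
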
florian-besser commented 1 month ago

We have also made some progress; we set heapOpts in the helm chart to:

"-Xmx512m -Xms512m -XX:MetaspaceSize=96m -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1HeapRegionSize=16M -XX:MinMetaspaceFreeRatio=50 -XX:MaxMetaspaceFreeRatio=80 -XX:+ExplicitGCInvokesConcurrent"

This mostly stopped the issue for us, but I'm unsure how good these options are.

We are still seeing some pods being killed, but as far as I can see no longer due to a Kubernetes OOMKill. Could it be that Java, with the reduced heap configured above, decides to "give up" at some point, which would cause issues?

I will try and increase the opts and see if that fixes the issue in the coming days.

github-actions[bot] commented 1 month ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

florian-besser commented 1 month ago

Turns out the above heapOpts worked for us; we faced some pod terminations due to our k8s cluster scaling, which isn't related to Kafka at all (or to this Helm chart).

As such, I would suggest tweaking the defaults in the Helm chart to help with OOM, but other than that we seem to be OK.

jotamartos commented 4 weeks ago

> As such, I would suggest tweaking the defaults in the Helm chart to help with OOM, but other than that we seem to be OK.

Would you like to contribute and improve the Chart? You can follow our contributing guidelines and the team will be more than happy to review the changes.

florian-besser commented 4 weeks ago

I have recently contributed in the form of https://github.com/bitnami/charts/pull/27877 but have not yet received any feedback; I could consider helping here once that first PR is through.

jotamartos commented 3 weeks ago

Thanks! I can see the PR you mentioned is now merged 😄

Please do not hesitate to suggest any changes to the solution.

github-actions[bot] commented 1 week ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 3 days ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.