bitnami / charts

Bitnami Helm Charts
https://bitnami.com

[bitnami/kafka] Kafka Using an external ZooKeeper cluster with the `bitnami/zookeeper` chart fails #6017

Closed stackedsax closed 3 years ago

stackedsax commented 3 years ago

Which chart: bitnami/kafka 2.7.0

Describe the bug: Setting an external ZooKeeper results in a non-functioning Kafka Helm release.

To Reproduce Steps to reproduce the behavior:

  1. Launch a new Zookeeper helm release: helm install ext-zk bitnami/zookeeper
  2. Launch a new kafka helm release pointing to the new external Zookeeper cluster: helm upgrade --install my-release --set replicaCount=1 --set externalZookeeper.servers="ext-zk-zookeeper" --set zookeeper.enabled=false bitnami/kafka
  3. Observe the my-release-kafka-0 in a perpetual CrashLoopBackOff state.

Expected behavior: Kafka should spin up and point to the external ZooKeeper cluster happily.

Version of Helm and Kubernetes:

version.BuildInfo{Version:"v3.5.2", GitCommit:"167aac70832d3a384f65f9745335e9fb40169dc2", GitTreeState:"dirty", GoVersion:"go1.15.7"}
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.2", GitCommit:"faecb196815e248d3ecfb03c680a4507229c2a56", GitTreeState:"clean", BuildDate:"2021-01-14T05:14:17Z", GoVersion:"go1.15.6", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"18+", GitVersion:"v1.18.15-gke.1501", GitCommit:"de67bb0d58413ba2ba9b64810ab438a9734a2ab9", GitTreeState:"clean", BuildDate:"2021-02-26T23:41:52Z", GoVersion:"go1.13.15b4", Compiler:"gc", Platform:"linux/amd64"}

Additional context: The problem seems to be related to a change in Kafka 2.4.0:

To this end, if we disable Kafka persistence, the kafka cluster spins up just fine:

helm upgrade --install my-release --set persistence.enabled=false --set replicaCount=1 --set externalZookeeper.servers="ext-zk-zookeeper" --set zookeeper.enabled=false bitnami/kafka

As an additional datapoint, if we launch a version of the chart that uses a Kafka 2.3.x version, we can leave persistence on:

helm upgrade --install  my-release --version 7.0.4  --set replicaCount=3 --set externalZookeeper.servers="ext-zk-zookeeper.default.svc.cluster.local" --set zookeeper.enabled=false bitnami/kafka

The suggested fixes generally seem to be to delete either all the files in the logs directories or the meta.properties file itself, but it seems like there must be a more elegant solution. I'll try to come back to this and figure out a more appropriate fix, but I just wanted to report the problem before I got distracted somewhere else.

carrodher commented 3 years ago

Thanks for letting us know. The information stored in the logs from the Kafka pods may also be useful, just in case it helps.

stackedsax commented 3 years ago

@carrodher no problem, thanks for taking a look.

I'm not sure I understand what you mean by:

... the information stored in the logs from the Kafka pods...

Could you elaborate?

carrodher commented 3 years ago

I mean you can obtain the logs from the containers running inside the pods, in case they contain more information. If the pod is in a CrashLoopBackOff state, though, the container may not be available to show the logs; in that case, you can describe the pod to see more information.
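For the CrashLoopBackOff case, `kubectl logs` also accepts a `--previous` flag that prints the output of the last terminated container instance, so the crash output is usually still retrievable. A minimal sketch, where `crash_logs` is a hypothetical helper name and the pod name is the one from the report above:

```shell
# Hypothetical helper: prefer the logs of the previous (crashed) container
# instance; fall back to the current instance if there is no previous one.
crash_logs() {
  pod="$1"
  kubectl logs "$pod" --previous 2>/dev/null || kubectl logs "$pod"
}

# Example (pod name assumed from the reproduction steps):
#   crash_logs my-release-kafka-0
```

The fallback matters on the very first start, when no previous container exists yet and `--previous` returns an error.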

Find below some examples:

$ kubectl get pods
NAME                READY   STATUS    RESTARTS   AGE
kafka-0             1/1     Running   2          4m30s
kafka-zookeeper-0   1/1     Running   0          4m30s

$ kubectl logs kafka-0
09:06:05.65
 09:06:05.66 Welcome to the Bitnami kafka container
 09:06:05.66 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-kafka
 09:06:05.66 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-kafka/issues
 09:06:05.66
 09:06:05.66 INFO  ==> ** Starting Kafka setup **
 09:06:05.73 WARN  ==> You set the environment variable ALLOW_PLAINTEXT_LISTENER=yes. For safety reasons, do not use this flag in a production environment.
 09:06:05.74 INFO  ==> Initializing Kafka...
 09:06:05.75 INFO  ==> No injected configuration files found, creating default config files
 09:06:05.98 INFO  ==> Configuring Kafka for inter-broker communications with PLAINTEXT authentication.
 09:06:05.98 WARN  ==> Inter-broker communications are configured as PLAINTEXT. This is not safe for production environments.
 09:06:05.99 INFO  ==> Configuring Kafka for client communications with PLAINTEXT authentication.
 09:06:05.99 WARN  ==> Client communications are configured using PLAINTEXT listeners. For safety reasons, do not use this in a production environment.
 09:06:06.00 INFO  ==> ** Kafka setup finished! **

 09:06:06.02 INFO  ==> ** Starting Kafka **
[2021-04-07 09:06:07,460] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2021-04-07 09:06:08,172] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2021-04-07 09:06:08,269] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2021-04-07 09:06:08,274] INFO starting (kafka.server.KafkaServer)
[2021-04-07 09:06:08,275] INFO Connecting to zookeeper on kafka-zookeeper (kafka.server.KafkaServer)
[2021-04-07 09:06:08,298] INFO [ZooKeeperClient Kafka server] Initializing a new session to kafka-zookeeper. (kafka.zookeeper.ZooKeeperClient)
[2021-04-07 09:06:08,304] INFO Client environment:zookeeper.version=3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT (org.apache.zookeeper.ZooKeeper)
[2021-04-07 09:06:08,304] INFO Client environment:host.name=kafka-0.kafka-headless.carlosrh.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2021-04-07 09:06:08,305] INFO Client environment:java.version=11.0.10 (org.apache.zookeeper.ZooKeeper)
[2021-04-07 09:06:08,305] INFO Client environment:java.vendor=BellSoft (org.apache.zookeeper.ZooKeeper)
[2021-04-07 09:06:08,305] INFO Client environment:java.home=/opt/bitnami/java (org.apache.zookeeper.ZooKeeper)
...

$ kubectl describe pod kafka-0
Name:         kafka-0
Namespace:    default
Priority:     0
Node:         gke-dev-default-pool-ab651c88-wp6f/10.130.47.221
Start Time:   Wed, 07 Apr 2021 09:05:08 +0000
Labels:       app.kubernetes.io/component=kafka
              app.kubernetes.io/instance=kafka
              app.kubernetes.io/managed-by=Helm
              app.kubernetes.io/name=kafka
              controller-revision-hash=kafka-7bff477496
              helm.sh/chart=kafka-12.16.0
              statefulset.kubernetes.io/pod-name=kafka-0
Annotations:  kubernetes.io/psp: 60-mayroot
Status:       Running
...

stackedsax commented 3 years ago

Ah, sure, I thought you meant the kafka logs themselves. But if you want the kafka container logs, this is what they show:

$ k logs siembol-kafka-0
 15:46:27.61 
 15:46:27.61 Welcome to the Bitnami kafka container
 15:46:27.61 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-kafka
 15:46:27.61 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-kafka/issues
 15:46:27.62 
 15:46:27.62 INFO  ==> ** Starting Kafka setup **
 15:46:27.69 WARN  ==> You set the environment variable ALLOW_PLAINTEXT_LISTENER=yes. For safety reasons, do not use this flag in a production environment.
 15:46:27.70 INFO  ==> Initializing Kafka...
 15:46:27.71 INFO  ==> No injected configuration files found, creating default config files
 15:46:28.06 INFO  ==> Configuring Kafka for inter-broker communications with PLAINTEXT authentication.
 15:46:28.06 WARN  ==> Inter-broker communications are configured as PLAINTEXT. This is not safe for production environments.
 15:46:28.07 INFO  ==> Configuring Kafka for client communications with PLAINTEXT authentication.
 15:46:28.07 WARN  ==> Client communications are configured using PLAINTEXT listeners. For safety reasons, do not use this in a production environment.
 15:46:28.08 INFO  ==> ** Kafka setup finished! **

 15:46:28.10 INFO  ==> ** Starting Kafka **
[2021-04-07 15:46:29,683] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2021-04-07 15:46:30,408] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2021-04-07 15:46:30,522] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2021-04-07 15:46:30,529] INFO starting (kafka.server.KafkaServer)
[2021-04-07 15:46:30,530] INFO Connecting to zookeeper on ext-zk-zookeeper (kafka.server.KafkaServer)
[2021-04-07 15:46:30,553] INFO [ZooKeeperClient Kafka server] Initializing a new session to ext-zk-zookeeper. (kafka.zookeeper.ZooKeeperClient)
[2021-04-07 15:46:30,559] INFO Client environment:zookeeper.version=3.5.8-f439ca583e70862c3068a1f2a7d4d068eec33315, built on 05/04/2020 15:53 GMT (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,559] INFO Client environment:host.name=siembol-kafka-0.siembol-kafka-headless.default.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,560] INFO Client environment:java.version=11.0.10 (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,560] INFO Client environment:java.vendor=BellSoft (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,560] INFO Client environment:java.home=/opt/bitnami/java (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,560] INFO Client environment:java.class.path=/opt/bitnami/kafka/bin/../libs/activation-1.1.1.jar:/opt/bitnami/kafka/bin/../libs/aopalliance-repackaged-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/bitnami/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/bitnami/kafka/bin/../libs/commons-cli-1.4.jar:/opt/bitnami/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/bitnami/kafka/bin/../libs/connect-api-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/connect-basic-auth-extension-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/connect-file-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/connect-json-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/connect-mirror-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/connect-mirror-client-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/connect-runtime-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/connect-transforms-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/hk2-api-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/hk2-locator-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/hk2-utils-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/jackson-annotations-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-core-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-databind-2.10.5.1.jar:/opt/bitnami/kafka/bin/../libs/jackson-dataformat-csv-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-datatype-jdk8-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-jaxrs-base-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-jaxrs-json-provider-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-jaxb-annotations-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-paranamer-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-scala_2.12-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jakarta.activation-api-1.2.1.jar:/opt/bitnami/kafka/bin/../libs/jakarta.annotation-api-1.3.5.jar:/opt/bitnami/kafka/bin/../libs/jakarta.inject-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/jakarta.validation-api-2.0.2.jar:/opt/bitnami/kafka/bin/../libs/jakarta.ws.rs-api-2.1.6.jar:/opt/bitnami/kafka/bin/.
./libs/jakarta.xml.bind-api-2.3.2.jar:/opt/bitnami/kafka/bin/../libs/javassist-3.25.0-GA.jar:/opt/bitnami/kafka/bin/../libs/javassist-3.26.0-GA.jar:/opt/bitnami/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/bitnami/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/bitnami/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/bitnami/kafka/bin/../libs/jersey-client-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-common-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-container-servlet-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-container-servlet-core-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-hk2-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-media-jaxb-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-server-2.31.jar:/opt/bitnami/kafka/bin/../libs/jetty-client-9.4.33.v20201020.jar:/opt/bitnami/kafka/bin/../libs/jetty-continuation-9.4.33.v20201020.jar:/opt/bitnami/kafka/bin/../libs/jetty-http-9.4.33.v20201020.jar:/opt/bitnami/kafka/bin/../libs/jetty-io-9.4.33.v20201020.jar:/opt/bitnami/kafka/bin/../libs/jetty-security-9.4.33.v20201020.jar:/opt/bitnami/kafka/bin/../libs/jetty-server-9.4.33.v20201020.jar:/opt/bitnami/kafka/bin/../libs/jetty-servlet-9.4.33.v20201020.jar:/opt/bitnami/kafka/bin/../libs/jetty-servlets-9.4.33.v20201020.jar:/opt/bitnami/kafka/bin/../libs/jetty-util-9.4.33.v20201020.jar:/opt/bitnami/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/bitnami/kafka/bin/../libs/kafka-clients-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-log4j-appender-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-raft-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-examples-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-scala_2.12-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-test-utils-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-tools-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/kafka_2.12-2.7.0-sources.jar:/opt/bitnami/kafka/bin/../libs/kafka_2.12-2.7.0.jar:/opt/bitnami/kafka/bin/../libs/log4j-1.2.17.jar:/opt/b
itnami/kafka/bin/../libs/lz4-java-1.7.1.jar:/opt/bitnami/kafka/bin/../libs/maven-artifact-3.6.3.jar:/opt/bitnami/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/bitnami/kafka/bin/../libs/netty-buffer-4.1.51.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-codec-4.1.51.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-common-4.1.51.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-handler-4.1.51.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-resolver-4.1.51.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-4.1.51.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-native-epoll-4.1.51.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-native-unix-common-4.1.51.Final.jar:/opt/bitnami/kafka/bin/../libs/osgi-resource-locator-1.0.3.jar:/opt/bitnami/kafka/bin/../libs/paranamer-2.8.jar:/opt/bitnami/kafka/bin/../libs/plexus-utils-3.2.1.jar:/opt/bitnami/kafka/bin/../libs/reflections-0.9.12.jar:/opt/bitnami/kafka/bin/../libs/rocksdbjni-5.18.4.jar:/opt/bitnami/kafka/bin/../libs/scala-collection-compat_2.12-2.2.0.jar:/opt/bitnami/kafka/bin/../libs/scala-java8-compat_2.12-0.9.1.jar:/opt/bitnami/kafka/bin/../libs/scala-library-2.12.12.jar:/opt/bitnami/kafka/bin/../libs/scala-logging_2.12-3.9.2.jar:/opt/bitnami/kafka/bin/../libs/scala-reflect-2.12.12.jar:/opt/bitnami/kafka/bin/../libs/slf4j-api-1.7.30.jar:/opt/bitnami/kafka/bin/../libs/slf4j-log4j12-1.7.30.jar:/opt/bitnami/kafka/bin/../libs/snappy-java-1.1.7.7.jar:/opt/bitnami/kafka/bin/../libs/zookeeper-3.5.8.jar:/opt/bitnami/kafka/bin/../libs/zookeeper-jute-3.5.8.jar:/opt/bitnami/kafka/bin/../libs/zstd-jni-1.4.5-6.jar (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,560] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,560] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,560] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,560] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,560] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,561] INFO Client environment:os.version=5.4.89+ (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,561] INFO Client environment:user.name=? (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,561] INFO Client environment:user.home=? (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,561] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,561] INFO Client environment:os.memory.free=1013MB (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,561] INFO Client environment:os.memory.max=1024MB (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,561] INFO Client environment:os.memory.total=1024MB (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,564] INFO Initiating client connection, connectString=ext-zk-zookeeper sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@495b0487 (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:30,570] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2021-04-07 15:46:30,579] INFO zookeeper.request.timeout value is 0. feature enabled= (org.apache.zookeeper.ClientCnxn)
[2021-04-07 15:46:30,584] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2021-04-07 15:46:30,612] INFO Opening socket connection to server ext-zk-zookeeper/10.39.248.152:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-04-07 15:46:30,623] INFO Socket connection established, initiating session, client: /10.36.4.60:54928, server: ext-zk-zookeeper/10.39.248.152:2181 (org.apache.zookeeper.ClientCnxn)
[2021-04-07 15:46:30,660] INFO Session establishment complete on server ext-zk-zookeeper/10.39.248.152:2181, sessionid = 0x10000017c010377, negotiated timeout = 18000 (org.apache.zookeeper.ClientCnxn)
[2021-04-07 15:46:30,663] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2021-04-07 15:46:30,767] INFO [feature-zk-node-event-process-thread]: Starting (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
[2021-04-07 15:46:31,025] INFO Updated cache from existing <empty> to latest FinalizedFeaturesAndEpoch(features=Features{}, epoch=0). (kafka.server.FinalizedFeatureCache)
[2021-04-07 15:46:31,030] INFO Cluster ID = mtrImbVDTm-jUIEpt8-jZQ (kafka.server.KafkaServer)
[2021-04-07 15:46:31,045] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentClusterIdException: The Cluster ID mtrImbVDTm-jUIEpt8-jZQ doesn't match stored clusterId Some(OivcewOYSn2m-mUMUMMJUw) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
        at kafka.server.KafkaServer.startup(KafkaServer.scala:252)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:44)
        at kafka.Kafka$.main(Kafka.scala:82)
        at kafka.Kafka.main(Kafka.scala)
[2021-04-07 15:46:31,049] INFO shutting down (kafka.server.KafkaServer)
[2021-04-07 15:46:31,053] INFO [feature-zk-node-event-process-thread]: Shutting down (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
[2021-04-07 15:46:31,054] INFO [feature-zk-node-event-process-thread]: Stopped (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
[2021-04-07 15:46:31,055] INFO [feature-zk-node-event-process-thread]: Shutdown completed (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
[2021-04-07 15:46:31,057] INFO [ZooKeeperClient Kafka server] Closing. (kafka.zookeeper.ZooKeeperClient)
[2021-04-07 15:46:31,164] INFO Session: 0x10000017c010377 closed (org.apache.zookeeper.ZooKeeper)
[2021-04-07 15:46:31,165] INFO EventThread shut down for session: 0x10000017c010377 (org.apache.zookeeper.ClientCnxn)
[2021-04-07 15:46:31,168] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-04-07 15:46:31,180] INFO App info kafka.server for 0 unregistered (org.apache.kafka.common.utils.AppInfoParser)
[2021-04-07 15:46:31,182] INFO shut down completed (kafka.server.KafkaServer)
[2021-04-07 15:46:31,183] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2021-04-07 15:46:31,186] INFO shutting down (kafka.server.KafkaServer)

The error in brief:

The Cluster ID mtrImbVDTm-jUIEpt8-jZQ doesn't match stored clusterId Some(OivcewOYSn2m-mUMUMMJUw) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
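To make the mismatch easier to spot in a long log, the two IDs can be pulled out of the exception line. This is only a sketch: `extract_cluster_ids` is a hypothetical helper, and the pattern assumes the message wording shown above. The first ID is the one the broker read from ZooKeeper; the second is the one stored in meta.properties on the data volume.

```shell
# Hypothetical helper: read log text on stdin and print
# "<cluster ID from ZooKeeper> <cluster ID stored in meta.properties>"
# for every InconsistentClusterIdException message found.
extract_cluster_ids() {
  sed -n 's/.*The Cluster ID \([A-Za-z0-9_-]*\) doesn.t match stored clusterId Some(\([A-Za-z0-9_-]*\)).*/\1 \2/p'
}

# Example (pod name taken from the log above):
#   kubectl logs siembol-kafka-0 | extract_cluster_ids
```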

This seems to be related to a check introduced in Kafka 2.4.0:

Again, the various suggestions for fixing this problem don't really sound great in a k8s world, but are possible:

Let me know if you'd like to see anything else.

carrodher commented 3 years ago

The issue is not reproducible (at least for me) when using the bundled zookeeper Helm Chart (included as subchart and enabled by default), right? According to your tests, it only happens when using an external zookeeper.

It's definitely something that should work. If you have any idea how to solve the issue (at the Helm chart level or the container image level), feel free to contribute by creating a PR. Otherwise, I will create an internal task to investigate the issue properly.

stackedsax commented 3 years ago

@carrodher Correct, using the internal Zookeeper works just fine. It's just when using an external zookeeper that the problem occurs.

All my ideas about how to fix the problem seem like bad ideas to me. It may be a couple of weeks before I can come back to this and see if there's a more elegant solution than the suggestions posted in those serverfault and stackoverflow issues, but I'll let you know when I do.

The issue, I think, with the suggestions from those two sites is that the workflow seems to be:

That might work when you have a static, standalone kafka cluster that is operated by hand. But if you are running on k8s and your kafka containers get restarted as soon as there is an error, you don't get a chance to delete those files.

Perhaps it will work to delete meta.properties or logs/ before kafka is first started, but it's curious to me why none of the suggestions on those two sites suggest that you can solve the issue before you spin up kafka the first time. It may simply be that they're solving the OP's original question: "how do I fix an already broken kafka cluster?" At the same time, I have a feeling that the answer won't be so straightforward and that there are some parts of that first initialization of kafka that need to happen before the meta.properties or logs/ can be deleted and the cluster can be started happily.

Still, I'd probably start by trying to delete one of these things before the initial startup just to see if that would be a quick solution to the problem. If it isn't, then I imagine we'll have to dive a little deeper into where kafka gets the Cluster ID from and whether it can accept a value for that parameter on startup.
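The "delete it before the initial startup" idea could look roughly like the sketch below. It is untested against the chart and rests on assumptions: `reset_stale_meta` is a hypothetical helper, the path to meta.properties and the expected cluster ID must be supplied by the caller, and the file is assumed to hold a `cluster.id=<value>` line. Something would still have to run it before the broker starts, for example an init container or a custom entrypoint.

```shell
# Hypothetical pre-start check: remove meta.properties only when its stored
# cluster.id differs from the ID the broker is expected to join.
# $1 = path to meta.properties (assumption), $2 = expected cluster ID (assumption)
reset_stale_meta() {
  meta_file="$1"
  expected_id="$2"
  # No file yet means a genuine first start: nothing to clean up.
  [ -f "$meta_file" ] || return 0
  stored_id="$(sed -n 's/^cluster\.id=//p' "$meta_file")"
  if [ -n "$stored_id" ] && [ "$stored_id" != "$expected_id" ]; then
    echo "stale cluster.id $stored_id (expected $expected_id), removing $meta_file"
    rm -f "$meta_file"
  fi
}
```

Removing only the mismatching file, rather than the whole logs directory, keeps the blast radius small, but whether Kafka then starts cleanly against data written for a different cluster is exactly the open question raised above.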

carrodher commented 3 years ago

Thanks for the detailed explanation and your time. Let's wait and see whether there is more clarity about this topic, or a more mature solution for k8s environments.

github-actions[bot] commented 3 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

stackedsax commented 3 years ago

@carrodher what kind of clarity are you looking for on the subject and from whom? Should we post something in Kafka's repo to get a conversation started?

carrodher commented 3 years ago

I am not sure how to proceed, in the sense that I don't see what can be improved in the chart itself, which is the part we have control over. I was waiting for other users to report the same issue or something similar, but since there is no activity in this thread, maybe it is something to ask in the Kafka repository, to see whether it sheds more light or yields any hint.

github-actions[bot] commented 3 years ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.

FraBle commented 3 years ago

I'm running into the same issue. Maybe I can help to reproduce it.

I have the following dependencies declared in my chart:

dependencies:
  - name: kafka
    version: 13.0.2
    repository: https://charts.bitnami.com/bitnami
    alias: pubsub
  - name: zookeeper
    version: 7.0.4
    repository: https://charts.bitnami.com/bitnami
    alias: pubsub-zk

My values.yaml contains:

pubsub-zk:
  fullnameOverride: bullet-pubsub-zk
pubsub:
  zookeeper:
    enabled: false
  externalZookeeper:
    servers:
      - bullet-pubsub-zk

I'm trying this setup since my chart contains another Kafka-Zookeeper combo and I want to keep them separate.

It fails with:

[2021-06-30 06:36:04,062] INFO EventThread shut down for session: 0x10005753b1d0004 (org.apache.zookeeper.ClientCnxn)
[2021-06-30 06:36:04,065] INFO [ZooKeeperClient Kafka server] Closed. (kafka.zookeeper.ZooKeeperClient)
[2021-06-30 06:36:04,081] INFO App info kafka.server for 0 unregistered (org.apache.kafka.common.utils.AppInfoParser)
[2021-06-30 06:36:04,081] INFO shut down completed (kafka.server.KafkaServer)
[2021-06-30 06:36:04,081] ERROR Exiting Kafka. (kafka.Kafka$)
[2021-06-30 06:36:04,082] INFO shutting down (kafka.server.KafkaServer)
kafka 06:38:53.45 
kafka 06:38:53.45 Welcome to the Bitnami kafka container
kafka 06:38:53.45 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-kafka
kafka 06:38:53.45 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-kafka/issues
kafka 06:38:53.45 
kafka 06:38:53.46 INFO  ==> ** Starting Kafka setup **
kafka 06:38:53.49 WARN  ==> You set the environment variable ALLOW_PLAINTEXT_LISTENER=yes. For safety reasons, do not use this flag in a production environment.
kafka 06:38:53.50 INFO  ==> Initializing Kafka...
kafka 06:38:53.50 INFO  ==> No injected configuration files found, creating default config files
kafka 06:38:53.63 INFO  ==> Configuring Kafka for inter-broker communications with PLAINTEXT authentication.
kafka 06:38:53.63 WARN  ==> Inter-broker communications are configured as PLAINTEXT. This is not safe for production environments.
kafka 06:38:53.63 INFO  ==> Configuring Kafka for client communications with PLAINTEXT authentication.
kafka 06:38:53.63 WARN  ==> Client communications are configured using PLAINTEXT listeners. For safety reasons, do not use this in a production environment.
kafka 06:38:53.65 INFO  ==> ** Kafka setup finished! **

kafka 06:38:53.66 INFO  ==> ** Starting Kafka **
[2021-06-30 06:38:54,328] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2021-06-30 06:38:54,706] INFO Setting -D jdk.tls.rejectClientInitiatedRenegotiation=true to disable client-initiated TLS renegotiation (org.apache.zookeeper.common.X509Util)
[2021-06-30 06:38:54,781] INFO Registered signal handlers for TERM, INT, HUP (org.apache.kafka.common.utils.LoggingSignalHandler)
[2021-06-30 06:38:54,784] INFO starting (kafka.server.KafkaServer)
[2021-06-30 06:38:54,784] INFO Connecting to zookeeper on bullet-pubsub-zk (kafka.server.KafkaServer)
[2021-06-30 06:38:54,796] INFO [ZooKeeperClient Kafka server] Initializing a new session to bullet-pubsub-zk. (kafka.zookeeper.ZooKeeperClient)
[2021-06-30 06:38:54,800] INFO Client environment:zookeeper.version=3.5.9-83df9301aa5c2a5d284a9940177808c01bc35cef, built on 01/06/2021 20:03 GMT (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:host.name=bullet-pubsub-0.bullet-pubsub-headless.default.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:java.version=11.0.11 (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:java.vendor=BellSoft (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:java.home=/opt/bitnami/java (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:java.class.path=/opt/bitnami/kafka/bin/../libs/activation-1.1.1.jar:/opt/bitnami/kafka/bin/../libs/aopalliance-repackaged-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/bitnami/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/bitnami/kafka/bin/../libs/commons-cli-1.4.jar:/opt/bitnami/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/bitnami/kafka/bin/../libs/connect-api-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/connect-basic-auth-extension-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/connect-file-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/connect-json-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/connect-mirror-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/connect-mirror-client-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/connect-runtime-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/connect-transforms-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/hk2-api-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/hk2-locator-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/hk2-utils-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/jackson-annotations-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-core-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-databind-2.10.5.1.jar:/opt/bitnami/kafka/bin/../libs/jackson-dataformat-csv-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-datatype-jdk8-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-jaxrs-base-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-jaxrs-json-provider-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-jaxb-annotations-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-paranamer-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jackson-module-scala_2.12-2.10.5.jar:/opt/bitnami/kafka/bin/../libs/jakarta.activation-api-1.2.1.jar:/opt/bitnami/kafka/bin/../libs/jakarta.annotation-api-1.3.5.jar:/opt/bitnami/kafka/bin/../libs/jakarta.inject-2.6.1.jar:/opt/bitnami/kafka/bin/../libs/jakarta.validation-api-2.0.2.jar:/opt/bitnami/kafka/bin/../libs/jakarta.ws.rs-api-2.1.6.jar:/opt/bitnami/kafka/bin/.
./libs/jakarta.xml.bind-api-2.3.2.jar:/opt/bitnami/kafka/bin/../libs/javassist-3.27.0-GA.jar:/opt/bitnami/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/bitnami/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/bitnami/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/bitnami/kafka/bin/../libs/jersey-client-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-common-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-container-servlet-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-container-servlet-core-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-hk2-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-media-jaxb-2.31.jar:/opt/bitnami/kafka/bin/../libs/jersey-server-2.31.jar:/opt/bitnami/kafka/bin/../libs/jetty-client-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jetty-continuation-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jetty-http-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jetty-io-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jetty-security-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jetty-server-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jetty-servlet-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jetty-servlets-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jetty-util-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jetty-util-ajax-9.4.39.v20210325.jar:/opt/bitnami/kafka/bin/../libs/jline-3.12.1.jar:/opt/bitnami/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/bitnami/kafka/bin/../libs/kafka-clients-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-log4j-appender-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-metadata-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-raft-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-shell-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-examples-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-scala_2.12-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-streams-test-utils-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/kafka-tools-2.8.0.
jar:/opt/bitnami/kafka/bin/../libs/kafka_2.12-2.8.0-sources.jar:/opt/bitnami/kafka/bin/../libs/kafka_2.12-2.8.0.jar:/opt/bitnami/kafka/bin/../libs/log4j-1.2.17.jar:/opt/bitnami/kafka/bin/../libs/lz4-java-1.7.1.jar:/opt/bitnami/kafka/bin/../libs/maven-artifact-3.6.3.jar:/opt/bitnami/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/bitnami/kafka/bin/../libs/netty-buffer-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-codec-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-common-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-handler-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-resolver-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-native-epoll-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/netty-transport-native-unix-common-4.1.62.Final.jar:/opt/bitnami/kafka/bin/../libs/osgi-resource-locator-1.0.3.jar:/opt/bitnami/kafka/bin/../libs/paranamer-2.8.jar:/opt/bitnami/kafka/bin/../libs/plexus-utils-3.2.1.jar:/opt/bitnami/kafka/bin/../libs/reflections-0.9.12.jar:/opt/bitnami/kafka/bin/../libs/rocksdbjni-5.18.4.jar:/opt/bitnami/kafka/bin/../libs/scala-collection-compat_2.12-2.3.0.jar:/opt/bitnami/kafka/bin/../libs/scala-java8-compat_2.12-0.9.1.jar:/opt/bitnami/kafka/bin/../libs/scala-library-2.12.13.jar:/opt/bitnami/kafka/bin/../libs/scala-logging_2.12-3.9.2.jar:/opt/bitnami/kafka/bin/../libs/scala-reflect-2.12.13.jar:/opt/bitnami/kafka/bin/../libs/slf4j-api-1.7.30.jar:/opt/bitnami/kafka/bin/../libs/slf4j-log4j12-1.7.30.jar:/opt/bitnami/kafka/bin/../libs/snappy-java-1.1.8.1.jar:/opt/bitnami/kafka/bin/../libs/zookeeper-3.5.9.jar:/opt/bitnami/kafka/bin/../libs/zookeeper-jute-3.5.9.jar:/opt/bitnami/kafka/bin/../libs/zstd-jni-1.4.9-1.jar (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:os.version=5.4.72-microsoft-standard-WSL2 (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,800] INFO Client environment:user.name=? (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,801] INFO Client environment:user.home=? (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,801] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,801] INFO Client environment:os.memory.free=1011MB (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,801] INFO Client environment:os.memory.max=1024MB (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,801] INFO Client environment:os.memory.total=1024MB (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,802] INFO Initiating client connection, connectString=bullet-pubsub-zk sessionTimeout=18000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@40844aab (org.apache.zookeeper.ZooKeeper)
[2021-06-30 06:38:54,805] INFO jute.maxbuffer value is 4194304 Bytes (org.apache.zookeeper.ClientCnxnSocket)
[2021-06-30 06:38:54,809] INFO zookeeper.request.timeout value is 0. feature enabled= (org.apache.zookeeper.ClientCnxn)
[2021-06-30 06:38:54,810] INFO [ZooKeeperClient Kafka server] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2021-06-30 06:38:54,816] INFO Opening socket connection to server bullet-pubsub-zk/10.97.146.139:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2021-06-30 06:38:54,821] INFO Socket connection established, initiating session, client: /10.1.2.224:59832, server: bullet-pubsub-zk/10.97.146.139:2181 (org.apache.zookeeper.ClientCnxn)
[2021-06-30 06:38:54,832] INFO Session establishment complete on server bullet-pubsub-zk/10.97.146.139:2181, sessionid = 0x10005753b1d0005, negotiated timeout = 18000 (org.apache.zookeeper.ClientCnxn)
[2021-06-30 06:38:54,835] INFO [ZooKeeperClient Kafka server] Connected. (kafka.zookeeper.ZooKeeperClient)
[2021-06-30 06:38:54,926] INFO [feature-zk-node-event-process-thread]: Starting (kafka.server.FinalizedFeatureChangeListener$ChangeNotificationProcessorThread)
[2021-06-30 06:38:54,933] INFO Feature ZK node at path: /feature does not exist (kafka.server.FinalizedFeatureChangeListener)
[2021-06-30 06:38:54,934] INFO Cleared cache (kafka.server.FinalizedFeatureCache)
[2021-06-30 06:38:55,024] INFO Cluster ID = EmgULHbCREO_zH9-XpLmwg (kafka.server.KafkaServer)
[2021-06-30 06:38:55,031] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentClusterIdException: The Cluster ID EmgULHbCREO_zH9-XpLmwg doesn't match stored clusterId Some(TLc5YVRBTlC0oSpURr6OvA) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.
    at kafka.server.KafkaServer.startup(KafkaServer.scala:218)
    at kafka.Kafka$.main(Kafka.scala:109)
    at kafka.Kafka.main(Kafka.scala)
[2021-06-30 06:38:55,033] INFO shutting down (kafka.server.KafkaServer)

carrodher commented 3 years ago

According to https://github.com/bitnami/charts/tree/master/bitnami/kafka#zookeeper-chart-parameters, it seems the parameters are fine.

According to the error

[2021-06-30 06:38:55,031] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.common.InconsistentClusterIdException: The Cluster ID EmgULHbCREO_zH9-XpLmwg doesn't match stored clusterId Some(TLc5YVRBTlC0oSpURr6OvA) in meta.properties. The broker is trying to join the wrong cluster. Configured zookeeper.connect may be wrong.

it seems there is a discrepancy between the cluster ID stored in meta.properties and the one used to join the cluster. Can you try a different namespace or a different name for the deployment? It will also help to ensure there are no PVs/PVCs left over from previous deployments, in case some persisted information is conflicting with the new one.
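
As a minimal sketch of how to check this, the two helpers below read the `cluster.id` entry from a `meta.properties` file and compare it with the ID printed in the broker log (`Cluster ID = ...`). The function names are hypothetical, and the default path `/bitnami/kafka/data/meta.properties` is an assumption about the container layout; pass the real path if yours differs.

```shell
# Hypothetical helper: print the cluster ID a broker persisted in meta.properties.
# The default path is an assumption about the Bitnami image, not confirmed here.
stored_cluster_id() {
  meta_file="${1:-/bitnami/kafka/data/meta.properties}"
  # meta.properties is a Java properties file; print the value after "cluster.id="
  sed -n 's/^cluster\.id=//p' "$meta_file"
}

# Hypothetical helper: succeed only if the stored ID equals the expected one.
# $1 = cluster ID reported in the broker log, $2 = path to meta.properties
cluster_id_matches() {
  [ "$(stored_cluster_id "$2")" = "$1" ]
}
```

For the log above, `cluster_id_matches EmgULHbCREO_zH9-XpLmwg <path>` would fail if the volume still holds `TLc5YVRBTlC0oSpURr6OvA` from an earlier deployment.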

Also, can you enable more traces in the Kafka container to see if more useful information appears? You can do it by adding the following parameter to your pubsub block:

pubsub:
  image:
    debug: true

stackedsax commented 3 years ago

@FraBle As @carrodher suggests, make sure that you delete the PVs and PVCs when you delete the chart deployment. This was my problem all along. Once I cleared out the namespace entirely and retried from scratch, it became clear that this was the problem.

Thanks for posting so that I remembered to come back and share what I was doing wrong!

FraBle commented 3 years ago

For sure! Will try out some things tonight or tomorrow and report back.

FraBle commented 3 years ago

Yep, I can confirm the PVC and PV were still around, and reusing the same name associated them with the new helm release.

Related helm issue: https://github.com/helm/helm/issues/5156
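
A rough sketch of the cleanup, for anyone landing here: PVCs created from a StatefulSet's volume claim templates are not removed when the release is uninstalled, so they have to be deleted by hand. The helper below is hypothetical and only prints the claim names to review; the `data-<release>-kafka-<n>` pattern is an assumption about how the chart names its claims, so check it against `kubectl get pvc` first.

```shell
# Hypothetical helper: print the PVC names a Kafka StatefulSet would leave behind.
# Assumes claims are named data-<release>-kafka-<ordinal>; verify with `kubectl get pvc`.
kafka_pvc_names() {
  release="$1"
  replicas="$2"
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "data-${release}-kafka-${i}"
    i=$((i + 1))
  done
}

# Review the list, then delete the leftover claims once the release is uninstalled:
#   kubectl get pvc
#   kafka_pvc_names my-release 1 | xargs kubectl delete pvc
```

Deleting the claims discards the broker's stored data, so this only makes sense when the old data is not needed.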

arniesaha commented 2 years ago

We have been facing similar issues too: when the Kafka version is upgraded, and at times even when rescaling Kafka with the same external ZooKeeper config.

Dropping the Kafka PVC as a workaround does help bring it back up. But I'm wondering if there's a better solution, e.g. could meta.properties be deleted by the chart during restarts or upgrades, or handled in some other automated way?

Dropping PVCs doesn't seem like an elegant long-term fix with an external ZooKeeper config.

carrodher commented 2 years ago

I'm not totally sure there is an easy solution for this, since it would require persisting the whole config except the parts that are randomly generated after a restart. If you have a proper solution in mind, feel free to create a PR; we will be happy to review it.

danuamirudin commented 2 years ago

I solved this issue by adding

values:
  externalZookeeper: 
    servers: [ClusterIP-zookeeper:2181]