banzaicloud / koperator

Oh no! Yet another Apache Kafka operator for Kubernetes

a kafka pod is constantly created and destroyed #112

Closed xiongmaodada closed 5 years ago

xiongmaodada commented 5 years ago

Describe the bug

Install steps

Step 1: start a k8s cluster with Minikube:

minikube start --memory 4196 --cpus 2

Step 2: install ZooKeeper:

helm repo add banzaicloud-stable https://kubernetes-charts.banzaicloud.com/
helm install --name zookeeper-operator --namespace=zookeeper banzaicloud-stable/zookeeper-operator
kubectl create --namespace zookeeper -f - <<EOF
apiVersion: zookeeper.pravega.io/v1beta1
kind: ZookeeperCluster
metadata:
  name: example-zookeepercluster
  namespace: zookeeper
spec:
  replicas: 3
EOF

Step 3: patch Minikube so LoadBalancer services get an IP:

kubectl run minikube-lb-patch --replicas=1 --image=elsonrodriguez/minikube-lb-patch:0.1 --namespace=kube-system

Step 4: install Kafka:

helm repo add banzaicloud-stable https://kubernetes-charts.banzaicloud.com/
helm install --name=kafka-operator --namespace=kafka banzaicloud-stable/kafka-operator -f config/samples/example-prometheus-alerts.yaml
# Add your zookeeper svc name to the configuration
kubectl create -n kafka -f config/samples/example-secret.yaml
kubectl create -n kafka -f config/samples/banzaicloud_v1alpha1_kafkacluster.yaml

The bug

A Kafka pod's status is initially Init:0/3 and becomes Running after a while. Then, about 35 seconds later, another Kafka pod shows up in Init:0/3, and this repeats over and over.

The first pod kafka8w4h9 is in Init:0/3:

$ kubectl get po --all-namespaces
NAMESPACE     NAME                                               READY   STATUS     RESTARTS   AGE
kafka         envoy-849fdc9687-tnr7z                             1/1     Running    0          40m
kafka         kafka-cruisecontrol-7564f794d8-knrx2               1/1     Running    0          38m
kafka         kafka-operator-operator-0                          2/2     Running    0          42m
kafka         kafka-operator-prometheus-server-6bddb4cbb-br7x4   2/2     Running    0          42m
kafka         kafka8w4h9                                         0/1     Init:0/3   0          4s
kafka         kafkafs6wq                                         1/1     Running    0          39m
kafka         kafkatlm4k                                         1/1     Running    0          39m
kafka         kafkazpvqz                                         1/1     Running    0          39m
kube-system   coredns-5c98db65d4-6z9j6                           1/1     Running    0          130m
kube-system   coredns-5c98db65d4-8lwz9                           1/1     Running    0          130m
kube-system   etcd-minikube                                      1/1     Running    0          129m
kube-system   kube-addon-manager-minikube                        1/1     Running    0          129m
kube-system   kube-apiserver-minikube                            1/1     Running    0          129m
kube-system   kube-controller-manager-minikube                   1/1     Running    0          129m
kube-system   kube-proxy-jkqxz                                   1/1     Running    0          130m
kube-system   kube-scheduler-minikube                            1/1     Running    0          129m
kube-system   minikube-lb-patch-6f6db8bccc-jr6nz                 1/1     Running    0          113m
kube-system   storage-provisioner                                1/1     Running    0          130m
kube-system   tiller-deploy-75f6c87b87-44w5s                     1/1     Running    0          127m
zookeeper     example-zookeepercluster-0                         1/1     Running    0          125m
zookeeper     example-zookeepercluster-1                         1/1     Running    0          125m
zookeeper     example-zookeepercluster-2                         1/1     Running    0          124m
zookeeper     zookeeper-operator-65d86d6674-wjjgj                1/1     Running    0          126m

The first pod kafka8w4h9 is Running after a while:

$ kubectl get po --all-namespaces
NAMESPACE     NAME                                               READY   STATUS    RESTARTS   AGE
kafka         envoy-849fdc9687-tnr7z                             1/1     Running   0          40m
kafka         kafka-cruisecontrol-7564f794d8-knrx2               1/1     Running   0          39m
kafka         kafka-operator-operator-0                          2/2     Running   0          42m
kafka         kafka-operator-prometheus-server-6bddb4cbb-br7x4   2/2     Running   0          42m
kafka         kafka8w4h9                                         1/1     Running   0          24s
kafka         kafkafs6wq                                         1/1     Running   0          40m
kafka         kafkatlm4k                                         1/1     Running   0          40m
kafka         kafkazpvqz                                         1/1     Running   0          40m
kube-system   coredns-5c98db65d4-6z9j6                           1/1     Running   0          131m
kube-system   coredns-5c98db65d4-8lwz9                           1/1     Running   0          131m
kube-system   etcd-minikube                                      1/1     Running   0          130m
kube-system   kube-addon-manager-minikube                        1/1     Running   0          129m
kube-system   kube-apiserver-minikube                            1/1     Running   0          130m
kube-system   kube-controller-manager-minikube                   1/1     Running   0          129m
kube-system   kube-proxy-jkqxz                                   1/1     Running   0          131m
kube-system   kube-scheduler-minikube                            1/1     Running   0          129m
kube-system   minikube-lb-patch-6f6db8bccc-jr6nz                 1/1     Running   0          113m
kube-system   storage-provisioner                                1/1     Running   0          131m
kube-system   tiller-deploy-75f6c87b87-44w5s                     1/1     Running   0          127m
zookeeper     example-zookeepercluster-0                         1/1     Running   0          126m
zookeeper     example-zookeepercluster-1                         1/1     Running   0          125m
zookeeper     example-zookeepercluster-2                         1/1     Running   0          125m
zookeeper     zookeeper-operator-65d86d6674-wjjgj                1/1     Running   0          126m

The first pod kafka8w4h9 disappears and the pod kafkadbct9 is in Init:0/3:

$ kubectl get po --all-namespaces
NAMESPACE     NAME                                               READY   STATUS     RESTARTS   AGE
kafka         envoy-849fdc9687-tnr7z                             1/1     Running    0          40m
kafka         kafka-cruisecontrol-7564f794d8-knrx2               1/1     Running    0          39m
kafka         kafka-operator-operator-0                          2/2     Running    0          42m
kafka         kafka-operator-prometheus-server-6bddb4cbb-br7x4   2/2     Running    0          42m
kafka         kafkadbct9                                         0/1     Init:0/3   0          2s
kafka         kafkafs6wq                                         1/1     Running    0          40m
kafka         kafkatlm4k                                         1/1     Running    0          40m
kafka         kafkazpvqz                                         1/1     Running    0          40m
kube-system   coredns-5c98db65d4-6z9j6                           1/1     Running    0          131m
kube-system   coredns-5c98db65d4-8lwz9                           1/1     Running    0          131m
kube-system   etcd-minikube                                      1/1     Running    0          130m
kube-system   kube-addon-manager-minikube                        1/1     Running    0          130m
kube-system   kube-apiserver-minikube                            1/1     Running    0          130m
kube-system   kube-controller-manager-minikube                   1/1     Running    0          130m
kube-system   kube-proxy-jkqxz                                   1/1     Running    0          131m
kube-system   kube-scheduler-minikube                            1/1     Running    0          130m
kube-system   minikube-lb-patch-6f6db8bccc-jr6nz                 1/1     Running    0          113m
kube-system   storage-provisioner                                1/1     Running    0          131m
kube-system   tiller-deploy-75f6c87b87-44w5s                     1/1     Running    0          127m
zookeeper     example-zookeepercluster-0                         1/1     Running    0          126m
zookeeper     example-zookeepercluster-1                         1/1     Running    0          126m
zookeeper     example-zookeepercluster-2                         1/1     Running    0          125m
zookeeper     zookeeper-operator-65d86d6674-wjjgj                1/1     Running    0          126m

This process repeats constantly.

Log of the first pod kafka8w4h9:

$ kubectl -n kafka logs  kafka8w4h9 -f
[2019-08-28 10:09:21,360] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-08-28 10:09:27,751] INFO starting (kafka.server.KafkaServer)
[2019-08-28 10:09:27,753] INFO Connecting to zookeeper on example-zookeepercluster-client.zookeeper:2181 (kafka.server.KafkaServer)
[2019-08-28 10:09:27,961] INFO [ZooKeeperClient] Initializing a new session to example-zookeepercluster-client.zookeeper:2181. (kafka.zookeeper.ZooKeeperClient)
[2019-08-28 10:09:27,969] INFO Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,970] INFO Client environment:host.name=kafka-0.kafka-headless.kafka.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,970] INFO Client environment:java.version=1.8.0_191 (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,970] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,970] INFO Client environment:java.home=/usr/lib/jvm/java-1.8-openjdk/jre (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,970] INFO Client environment:java.class.path=/opt/kafka/libs/extensions/cruise-control-metrics-reporter.jar:/opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.5.jar:/opt/kafka/bin/../libs/compileScala.mapping:/opt/kafka/bin/../libs/compileScala.mapping.asc:/opt/kafka/bin/../libs/connect-api-2.1.0.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.1.0.jar:/opt/kafka/bin/../libs/connect-file-2.1.0.jar:/opt/kafka/bin/../libs/connect-json-2.1.0.jar:/opt/kafka/bin/../libs/connect-runtime-2.1.0.jar:/opt/kafka/bin/../libs/connect-transforms-2.1.0.jar:/opt/kafka/bin/../libs/extensions:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.7.jar:/opt/kafka/bin/../libs/jackson-core-2.9.7.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.7.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.7.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.7.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-client-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-http-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-io-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-security-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-server-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jetty-util-9.4.12.v20180830.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients-2.1.0.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.1.0.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.1.0.jar:/opt/kafka/bin/../libs/kafka-tools-2.1.0.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-2.1.0.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.5.4.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.14.2.jar:/opt/kafka/bin/../libs/scala-library-2.12.7.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/sca
la-reflect-2.12.7.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.10.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka/bin/../libs/zstd-jni-1.3.5-4.jar:/opt/jmx-exporter/jmx_prometheus.jar (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,971] INFO Client environment:java.library.path=/usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64/server:/usr/lib/jvm/java-1.8-openjdk/jre/lib/amd64:/usr/lib/jvm/java-1.8-openjdk/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,971] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,971] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,971] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,972] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,972] INFO Client environment:os.version=4.15.0 (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,972] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,973] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:27,973] INFO Client environment:user.dir=/ (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:28,050] INFO Initiating client connection, connectString=example-zookeepercluster-client.zookeeper:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@37271612 (org.apache.zookeeper.ZooKeeper)
[2019-08-28 10:09:28,066] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-08-28 10:09:28,160] INFO Opening socket connection to server example-zookeepercluster-client.zookeeper/10.108.171.112:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-08-28 10:09:28,251] INFO Socket connection established to example-zookeepercluster-client.zookeeper/10.108.171.112:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-08-28 10:09:28,261] INFO Session establishment complete on server example-zookeepercluster-client.zookeeper/10.108.171.112:2181, sessionid = 0x300000744c401d4, negotiated timeout = 6000 (org.apache.zookeeper.ClientCnxn)
[2019-08-28 10:09:28,267] INFO [ZooKeeperClient] Connected. (kafka.zookeeper.ZooKeeperClient)
[2019-08-28 10:09:30,358] INFO Cluster ID = s5kRXw_MQFGtds5EU2xWbg (kafka.server.KafkaServer)
[2019-08-28 10:09:30,451] WARN No meta.properties file under dir /kafka-logs/kafka/meta.properties (kafka.server.BrokerMetadataCheckpoint)
[2019-08-28 10:09:31,158] INFO KafkaConfig values:
        advertised.host.name = null
        advertised.listeners = EXTERNAL://10.108.140.234:19090,SSL://kafka-0.kafka-headless.kafka.svc.cluster.local:29092
        advertised.port = null
        alter.config.policy.class.name = null
        alter.log.dirs.replication.quota.window.num = 11
        alter.log.dirs.replication.quota.window.size.seconds = 1
        authorizer.class.name =
        auto.create.topics.enable = false
        auto.leader.rebalance.enable = true
        background.threads = 10
        broker.id = 0
        broker.id.generation.enable = true
        broker.rack =
        client.quota.callback.class = null
        compression.type = producer
        connection.failed.authentication.delay.ms = 100
        connections.max.idle.ms = 600000
        controlled.shutdown.enable = true
        controlled.shutdown.max.retries = 3
        controlled.shutdown.retry.backoff.ms = 5000
        controller.socket.timeout.ms = 30000
        create.topic.policy.class.name = null
        default.replication.factor = 1
        delegation.token.expiry.check.interval.ms = 3600000
        delegation.token.expiry.time.ms = 86400000
        delegation.token.master.key = null
        delegation.token.max.lifetime.ms = 604800000
        delete.records.purgatory.purge.interval.requests = 1
        delete.topic.enable = true
        fetch.purgatory.purge.interval.requests = 1000
        group.initial.rebalance.delay.ms = 3000
        group.max.session.timeout.ms = 300000
        group.min.session.timeout.ms = 6000
        host.name =
        inter.broker.listener.name = null
        inter.broker.protocol.version = 2.1-IV2
        kafka.metrics.polling.interval.secs = 10
        kafka.metrics.reporters = []
        leader.imbalance.check.interval.seconds = 300
        leader.imbalance.per.broker.percentage = 10
        listener.security.protocol.map = SSL:SSL,EXTERNAL:SSL
        listeners = SSL://:29092,EXTERNAL://:9094
        log.cleaner.backoff.ms = 15000
        log.cleaner.dedupe.buffer.size = 134217728
        log.cleaner.delete.retention.ms = 86400000
        log.cleaner.enable = true
        log.cleaner.io.buffer.load.factor = 0.9
        log.cleaner.io.buffer.size = 524288
        log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
        log.cleaner.min.cleanable.ratio = 0.5
        log.cleaner.min.compaction.lag.ms = 0
        log.cleaner.threads = 1
        log.cleanup.policy = [delete]
        log.dir = /tmp/kafka-logs
        log.dirs = /kafka-logs/kafka
        log.flush.interval.messages = 9223372036854775807
        log.flush.interval.ms = null
        log.flush.offset.checkpoint.interval.ms = 60000
        log.flush.scheduler.interval.ms = 9223372036854775807
        log.flush.start.offset.checkpoint.interval.ms = 60000
        log.index.interval.bytes = 4096
        log.index.size.max.bytes = 10485760
        log.message.downconversion.enable = true
        log.message.format.version = 2.1-IV2
        log.message.timestamp.difference.max.ms = 9223372036854775807
        log.message.timestamp.type = CreateTime
        log.preallocate = false
        log.retention.bytes = -1
        log.retention.check.interval.ms = 300000
        log.retention.hours = 168
        log.retention.minutes = null
        log.retention.ms = null
        log.roll.hours = 168
        log.roll.jitter.hours = 0
        log.roll.jitter.ms = null
        log.roll.ms = null
        log.segment.bytes = 1073741824
        log.segment.delete.delay.ms = 60000
        max.connections.per.ip = 2147483647
        max.connections.per.ip.overrides =
        max.incremental.fetch.session.cache.slots = 1000
        message.max.bytes = 1000012
        metric.reporters = [com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter]
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        min.insync.replicas = 1
        num.io.threads = 8
        num.network.threads = 3
        num.partitions = 1
        num.recovery.threads.per.data.dir = 1
        num.replica.alter.log.dirs.threads = null
        num.replica.fetchers = 1
        offset.metadata.max.bytes = 4096
        offsets.commit.required.acks = -1
        offsets.commit.timeout.ms = 5000
        offsets.load.buffer.size = 5242880
        offsets.retention.check.interval.ms = 600000
        offsets.retention.minutes = 10080
        offsets.topic.compression.codec = 0
        offsets.topic.num.partitions = 50
        offsets.topic.replication.factor = 3
        offsets.topic.segment.bytes = 104857600
        password.encoder.cipher.algorithm = AES/CBC/PKCS5Padding
        password.encoder.iterations = 4096
        password.encoder.key.length = 128
        password.encoder.keyfactory.algorithm = null
        password.encoder.old.secret = null
        password.encoder.secret = null
        port = 9092
        principal.builder.class = null
        producer.purgatory.purge.interval.requests = 1000
        queued.max.request.bytes = -1
        queued.max.requests = 500
        quota.consumer.default = 9223372036854775807
        quota.producer.default = 9223372036854775807
        quota.window.num = 11
        quota.window.size.seconds = 1
        replica.fetch.backoff.ms = 1000
        replica.fetch.max.bytes = 1048576
        replica.fetch.min.bytes = 1
        replica.fetch.response.max.bytes = 10485760
        replica.fetch.wait.max.ms = 500
        replica.high.watermark.checkpoint.interval.ms = 5000
        replica.lag.time.max.ms = 10000
        replica.socket.receive.buffer.bytes = 65536
        replica.socket.timeout.ms = 30000
        replication.quota.window.num = 11
        replication.quota.window.size.seconds = 1
        request.timeout.ms = 30000
        reserved.broker.max.id = 1000
        sasl.client.callback.handler.class = null
        sasl.enabled.mechanisms = [GSSAPI]
        sasl.jaas.config = null
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.min.time.before.relogin = 60000
        sasl.kerberos.principal.to.local.rules = [DEFAULT]
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        sasl.kerberos.ticket.renew.window.factor = 0.8
        sasl.login.callback.handler.class = null
        sasl.login.class = null
        sasl.login.refresh.buffer.seconds = 300
        sasl.login.refresh.min.period.seconds = 60
        sasl.login.refresh.window.factor = 0.8
        sasl.login.refresh.window.jitter = 0.05
        sasl.mechanism.inter.broker.protocol = GSSAPI
        sasl.server.callback.handler.class = null
        security.inter.broker.protocol = SSL
        socket.receive.buffer.bytes = 102400
        socket.request.max.bytes = 104857600
        socket.send.buffer.bytes = 102400
        ssl.cipher.suites = []
        ssl.client.auth = required
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.endpoint.identification.algorithm = https
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = /var/run/secrets/java.io/keystores/kafka.server.keystore.jks
        ssl.keystore.password = [hidden]
        ssl.keystore.type = JKS
        ssl.protocol = TLS
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = /var/run/secrets/java.io/keystores/kafka.server.truststore.jks
        ssl.truststore.password = [hidden]
        ssl.truststore.type = JKS
        transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
        transaction.max.timeout.ms = 900000
        transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
        transaction.state.log.load.buffer.size = 5242880
        transaction.state.log.min.isr = 2
        transaction.state.log.num.partitions = 50
        transaction.state.log.replication.factor = 3
        transaction.state.log.segment.bytes = 104857600
        transactional.id.expiration.ms = 604800000
        unclean.leader.election.enable = false
        zookeeper.connect = example-zookeepercluster-client.zookeeper:2181
        zookeeper.connection.timeout.ms = null
        zookeeper.max.in.flight.requests = 10
        zookeeper.session.timeout.ms = 6000
        zookeeper.set.acl = false
        zookeeper.sync.time.ms = 2000
 (kafka.server.KafkaConfig)
[2019-08-28 10:09:31,352] INFO KafkaConfig values:
        advertised.host.name = null
        advertised.listeners = EXTERNAL://10.108.140.234:19090,SSL://kafka-0.kafka-headless.kafka.svc.cluster.local:29092
        advertised.port = null
        alter.config.policy.class.name = null
        alter.log.dirs.replication.quota.window.num = 11
        alter.log.dirs.replication.quota.window.size.seconds = 1
        authorizer.class.name =
        auto.create.topics.enable = false
        auto.leader.rebalance.enable = true
        background.threads = 10
        broker.id = 0
        broker.id.generation.enable = true
        broker.rack =
        client.quota.callback.class = null
        compression.type = producer
        connection.failed.authentication.delay.ms = 100
        connections.max.idle.ms = 600000
        controlled.shutdown.enable = true
        controlled.shutdown.max.retries = 3
        controlled.shutdown.retry.backoff.ms = 5000
        controller.socket.timeout.ms = 30000
        create.topic.policy.class.name = null
        default.replication.factor = 1
        delegation.token.expiry.check.interval.ms = 3600000
        delegation.token.expiry.time.ms = 86400000
        delegation.token.master.key = null
        delegation.token.max.lifetime.ms = 604800000
        delete.records.purgatory.purge.interval.requests = 1
        delete.topic.enable = true
        fetch.purgatory.purge.interval.requests = 1000
        group.initial.rebalance.delay.ms = 3000
        group.max.session.timeout.ms = 300000
        group.min.session.timeout.ms = 6000
        host.name =
        inter.broker.listener.name = null
        inter.broker.protocol.version = 2.1-IV2
        kafka.metrics.polling.interval.secs = 10
        kafka.metrics.reporters = []
        leader.imbalance.check.interval.seconds = 300
        leader.imbalance.per.broker.percentage = 10
        listener.security.protocol.map = SSL:SSL,EXTERNAL:SSL
        listeners = SSL://:29092,EXTERNAL://:9094
        log.cleaner.backoff.ms = 15000
        log.cleaner.dedupe.buffer.size = 134217728
        log.cleaner.delete.retention.ms = 86400000
        log.cleaner.enable = true
        log.cleaner.io.buffer.load.factor = 0.9
        log.cleaner.io.buffer.size = 524288
        log.cleaner.io.max.bytes.per.second = 1.7976931348623157E308
        log.cleaner.min.cleanable.ratio = 0.5
        log.cleaner.min.compaction.lag.ms = 0
        log.cleaner.threads = 1
        log.cleanup.policy = [delete]
        log.dir = /tmp/kafka-logs
        log.dirs = /kafka-logs/kafka
        log.flush.interval.messages = 9223372036854775807
        log.flush.interval.ms = null
        log.flush.offset.checkpoint.interval.ms = 60000
        log.flush.scheduler.interval.ms = 9223372036854775807
        log.flush.start.offset.checkpoint.interval.ms = 60000
        log.index.interval.bytes = 4096
        log.index.size.max.bytes = 10485760
        log.message.downconversion.enable = true
        log.message.format.version = 2.1-IV2
        log.message.timestamp.difference.max.ms = 9223372036854775807
        log.message.timestamp.type = CreateTime
        log.preallocate = false
        log.retention.bytes = -1
        log.retention.check.interval.ms = 300000
        log.retention.hours = 168
        log.retention.minutes = null
        log.retention.ms = null
        log.roll.hours = 168
        log.roll.jitter.hours = 0
        log.roll.jitter.ms = null
        log.roll.ms = null
        log.segment.bytes = 1073741824
        log.segment.delete.delay.ms = 60000
        max.connections.per.ip = 2147483647
        max.connections.per.ip.overrides =
        max.incremental.fetch.session.cache.slots = 1000
        message.max.bytes = 1000012
        metric.reporters = [com.linkedin.kafka.cruisecontrol.metricsreporter.CruiseControlMetricsReporter]
        metrics.num.samples = 2
        metrics.recording.level = INFO
        metrics.sample.window.ms = 30000
        min.insync.replicas = 1
        num.io.threads = 8
        num.network.threads = 3
        num.partitions = 1
        num.recovery.threads.per.data.dir = 1
        num.replica.alter.log.dirs.threads = null
        num.replica.fetchers = 1
        offset.metadata.max.bytes = 4096
        offsets.commit.required.acks = -1
        offsets.commit.timeout.ms = 5000
        offsets.load.buffer.size = 5242880
        offsets.retention.check.interval.ms = 600000
        offsets.retention.minutes = 10080
        offsets.topic.compression.codec = 0
        offsets.topic.num.partitions = 50
        offsets.topic.replication.factor = 3
        offsets.topic.segment.bytes = 104857600
        password.encoder.cipher.algorithm = AES/CBC/PKCS5Padding
        password.encoder.iterations = 4096
        password.encoder.key.length = 128
        password.encoder.keyfactory.algorithm = null
        password.encoder.old.secret = null
        password.encoder.secret = null
        port = 9092
        principal.builder.class = null
        producer.purgatory.purge.interval.requests = 1000
        queued.max.request.bytes = -1
        queued.max.requests = 500
        quota.consumer.default = 9223372036854775807
        quota.producer.default = 9223372036854775807
        quota.window.num = 11
        quota.window.size.seconds = 1
        replica.fetch.backoff.ms = 1000
        replica.fetch.max.bytes = 1048576
        replica.fetch.min.bytes = 1
        replica.fetch.response.max.bytes = 10485760
        replica.fetch.wait.max.ms = 500
        replica.high.watermark.checkpoint.interval.ms = 5000
        replica.lag.time.max.ms = 10000
        replica.socket.receive.buffer.bytes = 65536
        replica.socket.timeout.ms = 30000
        replication.quota.window.num = 11
        replication.quota.window.size.seconds = 1
        request.timeout.ms = 30000
        reserved.broker.max.id = 1000
        sasl.client.callback.handler.class = null
        sasl.enabled.mechanisms = [GSSAPI]
        sasl.jaas.config = null
        sasl.kerberos.kinit.cmd = /usr/bin/kinit
        sasl.kerberos.min.time.before.relogin = 60000
        sasl.kerberos.principal.to.local.rules = [DEFAULT]
        sasl.kerberos.service.name = null
        sasl.kerberos.ticket.renew.jitter = 0.05
        sasl.kerberos.ticket.renew.window.factor = 0.8
        sasl.login.callback.handler.class = null
        sasl.login.class = null
        sasl.login.refresh.buffer.seconds = 300
        sasl.login.refresh.min.period.seconds = 60
        sasl.login.refresh.window.factor = 0.8
        sasl.login.refresh.window.jitter = 0.05
        sasl.mechanism.inter.broker.protocol = GSSAPI
        sasl.server.callback.handler.class = null
        security.inter.broker.protocol = SSL
        socket.receive.buffer.bytes = 102400
        socket.request.max.bytes = 104857600
        socket.send.buffer.bytes = 102400
        ssl.cipher.suites = []
        ssl.client.auth = required
        ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
        ssl.endpoint.identification.algorithm = https
        ssl.key.password = null
        ssl.keymanager.algorithm = SunX509
        ssl.keystore.location = /var/run/secrets/java.io/keystores/kafka.server.keystore.jks
        ssl.keystore.password = [hidden]
        ssl.keystore.type = JKS
        ssl.protocol = TLS
        ssl.provider = null
        ssl.secure.random.implementation = null
        ssl.trustmanager.algorithm = PKIX
        ssl.truststore.location = /var/run/secrets/java.io/keystores/kafka.server.truststore.jks
        ssl.truststore.password = [hidden]
        ssl.truststore.type = JKS
        transaction.abort.timed.out.transaction.cleanup.interval.ms = 60000
        transaction.max.timeout.ms = 900000
        transaction.remove.expired.transaction.cleanup.interval.ms = 3600000
        transaction.state.log.load.buffer.size = 5242880
        transaction.state.log.min.isr = 2
        transaction.state.log.num.partitions = 50
        transaction.state.log.replication.factor = 3
        transaction.state.log.segment.bytes = 104857600
        transactional.id.expiration.ms = 604800000
        unclean.leader.election.enable = false
        zookeeper.connect = example-zookeepercluster-client.zookeeper:2181
        zookeeper.connection.timeout.ms = null
        zookeeper.max.in.flight.requests = 10
        zookeeper.session.timeout.ms = 6000
        zookeeper.set.acl = false
        zookeeper.sync.time.ms = 2000
 (kafka.server.KafkaConfig)
[2019-08-28 10:09:31,562] INFO [ThrottledChannelReaper-Produce]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-08-28 10:09:31,568] INFO [ThrottledChannelReaper-Request]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-08-28 10:09:31,568] INFO [ThrottledChannelReaper-Fetch]: Starting (kafka.server.ClientQuotaManager$ThrottledChannelReaper)
[2019-08-28 10:09:32,054] INFO Loading logs. (kafka.log.LogManager)
[2019-08-28 10:09:32,253] INFO Logs loading complete in 199 ms. (kafka.log.LogManager)
[2019-08-28 10:09:32,562] INFO Starting log cleanup with a period of 300000 ms. (kafka.log.LogManager)
[2019-08-28 10:09:32,749] INFO Starting log flusher with a default period of 9223372036854775807 ms. (kafka.log.LogManager)
baluchicken commented 5 years ago

@xiongmaodada I will try to reproduce your error; could you please also share the operator's log? Thanks.

xiongmaodada commented 5 years ago

Thank you for your quick reply.

kubectl logs kafka-operator-operator-0 -c manager -n kafka -f

{"level":"info","ts":1566990081.2534606,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990081.4579751,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990081.4906683,"logger":"controller","msg":"resource created","Request.Namespace":"kafka","Request.Name":"kafka","component":"kafka","kind":"*v1.Pod"}
{"level":"info","ts":1566990081.5584164,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990083.35322,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990098.9545534,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990099.957768,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990100.9538174,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990102.053241,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990121.5538023,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990121.753047,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990121.7804203,"logger":"controller","msg":"resource created","Request.Namespace":"kafka","Request.Name":"kafka","component":"kafka","kind":"*v1.Pod"}
{"level":"info","ts":1566990121.9516146,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990123.6931918,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990137.050859,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990138.0411665,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990139.1509326,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
{"level":"info","ts":1566990140.1538386,"logger":"controller","msg":"Reconciling KafkaCluster","Request.Namespace":"kafka","Request.Name":"kafka"}
baluchicken commented 5 years ago

@xiongmaodada I managed to reproduce your error. The first broker gets OOMKilled.
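
You can confirm this on your side by checking the termination reason of the cycling broker pod before the operator replaces it. A hedged sketch, using the pod name from your listing above; the jsonpath assumes the container has already exited but the pod has not been deleted yet:

kubectl -n kafka get pod kafka8w4h9 -o jsonpath='{.status.containerStatuses[0].state.terminated.reason}'
# prints OOMKilled when the container exceeded its memory limit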

The example configuration used (kubectl create -n kafka -f config/samples/banzaicloud_v1alpha1_kafkacluster.yaml) limits the first broker's container to only 300M of memory, which is simply not enough. I will create a PR that comments out the referenced lines. (They are there to show all of the available configuration options.) Please remove the referenced block from the CR.
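
For reference, the block to remove (or raise) is the per-broker memory limit in the sample CR. A rough sketch of what to look for; the exact field names may differ slightly between operator versions:

  resourceRequirements:
    limits:
      memory: 300Mi
    requests:
      memory: 300Mi

Deleting the block lets the broker fall back to the defaults, or you can raise the memory limit to something Kafka can realistically run with.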

I would also suggest increasing your Minikube resources to at least 4 CPUs and 6 GB of RAM.
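
For example (standard Minikube flags; on an existing VM you may need to delete and recreate it for the new sizes to take effect):

minikube delete
minikube start --cpus 4 --memory 6144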

xiongmaodada commented 5 years ago

@baluchicken Thank you, I will try it.

Is there a document describing how to test the Kafka cluster above, for example a "send and receive messages" section? I don't know how to connect to the Kafka cluster from outside the Minikube k8s cluster.

baluchicken commented 5 years ago

We have something called Spotguides. You can read more about the concept here. We have a Kafka Spotguide which uses this Operator.

The Spotguide contains documentation tailored to your configuration.

I just copied the relevant part for you:

kubectl create -n kafka -f - <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: kafka-test
spec:
  containers:
  - name: kafka-test
    image: solsson/kafkacat
    # Just spin & wait forever
    command: [ "/bin/bash", "-c", "--" ]
    args: [ "while true; do sleep 3000; done;" ]
    volumeMounts:
    - name: sslcerts
      mountPath: "/ssl/certs"
  volumes:
  - name: sslcerts
    secret:
      secretName: test-kafka-operator
EOF

Then exec into the container and produce and consume some messages:

kubectl exec -it -n kafka kafka-test bash

Produce some messages:

kafkacat -P -b kafka-headless:29092 -t kafka-test \
-X security.protocol=SSL \
-X ssl.key.location=/ssl/certs/clientKey \
-X ssl.certificate.location=/ssl/certs/clientCert \
-X ssl.ca.location=/ssl/certs/caCert

Consume them:

kafkacat -C -b kafka-headless:29092 -t kafka-test \
-X security.protocol=SSL \
-X ssl.key.location=/ssl/certs/clientKey \
-X ssl.certificate.location=/ssl/certs/clientCert \
-X ssl.ca.location=/ssl/certs/caCert
xiongmaodada commented 5 years ago

@baluchicken I got it.

How can I create a topic from outside the Minikube k8s cluster? For example, bin/kafka-topics.sh is available on another machine outside the Minikube k8s cluster; how do I create a topic with the bin/kafka-topics.sh command?

/bin/kafka-topics.sh --create --zookeeper ip:port --replication-factor 1 --partitions 1 --topic my-kafka-topic

What is the ZooKeeper ip:port here?

Producer:

/bin/kafka-console-producer.sh --broker-list nodeip:port --topic my-kafka-topic

And what should nodeip:port be here?

baluchicken commented 5 years ago

Unfortunately, ZK is not accessible from outside the cluster. It is provisioned by a third-party operator which, as far as I know, does not support this feature yet.

You can use the following command to create topics from inside the cluster.

kubectl -n kafka run kafka-topics -it --image=wurstmeister/kafka:2.12-2.1.0 --rm=true --restart=Never -- /opt/kafka/bin/kafka-topics.sh --zookeeper example-zookeepercluster-client.zookeeper:2181 --topic my-topic --create --partitions 1 --replication-factor 1
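
To verify the topic was created, a similar one-off pod can describe it (same image and ZooKeeper address as above):

kubectl -n kafka run kafka-topics-describe -it --image=wurstmeister/kafka:2.12-2.1.0 --rm=true --restart=Never -- /opt/kafka/bin/kafka-topics.sh --zookeeper example-zookeepercluster-client.zookeeper:2181 --topic my-topic --describe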
xiongmaodada commented 5 years ago

@baluchicken Thanks a lot, it works!

I have another question: how do I use bin/kafka-console-producer.sh or bin/kafka-console-consumer.sh to send and receive messages?

baluchicken commented 5 years ago

@xiongmaodada I just created a short doc about how to produce/consume messages on a freshly deployed Kafka cluster. Regarding the Java producer/consumer: because your cluster is using SSL, I recommend following the official documentation on keystore/truststore creation.
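
In short, a rough sketch for the console clients, assuming you extract clientKey, clientCert, and caCert from the same secret used in the kafkacat example above (paths, passwords, and file names below are placeholders):

# package the client key and certificate into a JKS keystore
openssl pkcs12 -export -in clientCert -inkey clientKey -out client.p12 -name client -passout pass:changeit
keytool -importkeystore -srckeystore client.p12 -srcstoretype PKCS12 -srcstorepass changeit -destkeystore client.keystore.jks -deststorepass changeit
# trust the cluster CA
keytool -import -file caCert -alias CARoot -keystore client.truststore.jks -storepass changeit -noprompt

Then point the console tools at a client-ssl.properties file containing:

security.protocol=SSL
ssl.keystore.location=client.keystore.jks
ssl.keystore.password=changeit
ssl.truststore.location=client.truststore.jks
ssl.truststore.password=changeit

and run them from a pod inside the cluster:

kafka-console-producer.sh --broker-list kafka-headless:29092 --topic my-topic --producer.config client-ssl.properties
kafka-console-consumer.sh --bootstrap-server kafka-headless:29092 --topic my-topic --consumer.config client-ssl.properties --from-beginning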

xiongmaodada commented 5 years ago

@baluchicken That's just what I need. Thank you very much!

bechhansen commented 3 years ago

Hi @xiongmaodada

Did you ever find the solution for this issue?

When enabling Prometheus annotations for the Kafka nodes (using the operator), one of my nodes becomes unstable and is terminated after ~30-40 seconds. I don't think this is a resource issue, as I use the default resource requirements/limits, which look okay.

It's only one node that becomes unstable, and it looks like the operator is gracefully terminating the node.

My issue: https://github.com/banzaicloud/koperator/issues/659