Yolean / kubernetes-kafka

Kafka cluster as Kubernetes StatefulSet, plain manifests and config
Apache License 2.0

Readiness probe failed for kafka #279

Open selkabli opened 5 years ago

selkabli commented 5 years ago

Hi, this is my first time using Kafka, so maybe I'm missing something. Can you please help?

NAME          READY   STATUS             RESTARTS   AGE
pod/kafka-0   1/1     Running            0          50m
pod/kafka-1   1/1     Running            0          50m
pod/kafka-2   0/1     CrashLoopBackOff   6          12m
pod/pzoo-0    1/1     Running            0          57m
pod/pzoo-1    1/1     Running            0          57m
pod/pzoo-2    1/1     Running            0          57m
pod/zoo-0     1/1     Running            0          56m
pod/zoo-1     1/1     Running            0          56m

NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/bootstrap   ClusterIP   10.233.9.140    <none>        9092/TCP            51m
service/broker      ClusterIP   None            <none>        9092/TCP            52m
service/pzoo        ClusterIP   None            <none>        2888/TCP,3888/TCP   59m
service/zoo         ClusterIP   None            <none>        2888/TCP,3888/TCP   58m
service/zookeeper   ClusterIP   10.233.35.111   <none>        2181/TCP            58m

NAME                     READY   AGE
statefulset.apps/kafka   2/3     50m
statefulset.apps/pzoo    3/3     57m
statefulset.apps/zoo     2/2     56m

[root@node1 ~]# kubectl get events -n kafka | grep Warn | grep pod/kafka-2
45m         Warning   Unhealthy               pod/kafka-2                          Readiness probe failed: dial tcp 10.233.90.28:9092: connect: connection refused
41m         Warning   BackOff                 pod/kafka-2                          Back-off restarting failed container
32m         Warning   Unhealthy               pod/kafka-2                          Readiness probe failed: dial tcp 10.233.90.29:9092: connect: connection refused
17m         Warning   BackOff                 pod/kafka-2                          Back-off restarting failed container
7m50s       Warning   Unhealthy               pod/kafka-2                          Readiness probe failed: dial tcp 10.233.90.30:9092: connect: connection refused
2m50s       Warning   BackOff                 pod/kafka-2                          Back-off restarting failed container
[root@node1 ~]# kubectl logs kafka-2 -n kafka
[2019-06-21 23:51:06,385] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-06-21 23:51:07,197] INFO starting (kafka.server.KafkaServer)
[2019-06-21 23:51:07,198] INFO Connecting to zookeeper on zookeeper:2181 (kafka.server.KafkaServer)
[2019-06-21 23:51:07,226] INFO [ZooKeeperClient] Initializing a new session to zookeeper:2181. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:07,232] INFO Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:host.name=kafka-2.broker.kafka.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.version=11.0.2 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.home=/usr/lib/jvm/jdk-11 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.class.path=/opt/kafka/libs/extensions/*:/opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka/bin/../libs/connect-api-2.2.1.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka/bin/../libs/connect-file-2.2.1.jar:/opt/kafka/bin/../libs/connect-json-2.2.1.jar:/opt/kafka/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka/bin/../libs/extensions:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-c
lient-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka/bin/../libs/scala-library-2.12.8.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.11.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka/bin/../libs/zstd-jni-1.3.8-1.jar (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,232] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:os.version=3.10.0-957.12.1.el7.x86_64 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,233] INFO Client environment:user.dir=/opt/kafka (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,234] INFO Initiating client connection, connectString=zookeeper:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@561868a0 (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:07,251] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:13,254] INFO [ZooKeeperClient] Closing. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:27,265] INFO Opening socket connection to server zookeeper:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-21 23:51:27,377] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2019-06-21 23:51:27,380] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2019-06-21 23:51:27,382] INFO [ZooKeeperClient] Closed. (kafka.zookeeper.ZooKeeperClient)
[2019-06-21 23:51:27,387] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
        at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:242)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
        at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:238)
        at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:96)
        at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1825)
        at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:361)
        at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:385)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:205)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
        at kafka.Kafka$.main(Kafka.scala:75)
        at kafka.Kafka.main(Kafka.scala)
[2019-06-21 23:51:27,390] INFO shutting down (kafka.server.KafkaServer)
[2019-06-21 23:51:27,403] INFO shut down completed (kafka.server.KafkaServer)
[2019-06-21 23:51:27,404] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-06-21 23:51:27,407] INFO shutting down (kafka.server.KafkaServer)

solsson commented 5 years ago

Looks like two Kafka pods succeed and one fails. It could be https://github.com/Yolean/kubernetes-kafka/commit/463e1c75424c5daf993710c1858df9782c0ed77c though that would be strange, because there are 5 ZooKeeper pods to reach for 3 Kafka brokers. Does everything but kafka-2 stay ready, or are there other events? Do the zookeeper services have the expected endpoints?
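
One way to answer the endpoints question is to compare the ZooKeeper pod IPs with the addresses behind the `zookeeper` service. The sketch below is only an illustration: `missing_endpoints` is a hypothetical helper name, and the `kafka` namespace and `app=zookeeper` label in the usage comments are assumptions that may need adjusting to your manifests.

```shell
#!/bin/sh
# Print every IP from "$1" (expected pod IPs) that is absent from "$2"
# (endpoint IPs). Both arguments are space-separated lists.
missing_endpoints() {
  for ip in $1; do
    case " $2 " in
      *" $ip "*) ;;        # pod IP is listed as an endpoint
      *) echo "$ip" ;;     # pod IP is not behind the service
    esac
  done
}

# Usage against a live cluster (namespace and label are assumptions):
#   pods=$(kubectl get pods -n kafka -l app=zookeeper \
#            -o jsonpath='{.items[*].status.podIP}')
#   eps=$(kubectl get endpoints zookeeper -n kafka \
#            -o jsonpath='{.subsets[*].addresses[*].ip}')
#   missing_endpoints "$pods" "$eps"
```

An empty result would suggest that every ZooKeeper pod is registered behind the service, so the problem probably lies elsewhere.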

Please use ``` when you post command output. It makes it a lot more readable. See https://guides.github.com/features/mastering-markdown/

selkabli commented 5 years ago

I changed the ZooKeeper config to maxClientCnxns=2, but the same issue still persists.

[root@node1 ~]# kubectl get svc -n kafka
NAME        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
bootstrap   ClusterIP   10.233.9.140    <none>        9092/TCP            13h
broker      ClusterIP   None            <none>        9092/TCP            13h
pzoo        ClusterIP   None            <none>        2888/TCP,3888/TCP   13h
zoo         ClusterIP   None            <none>        2888/TCP,3888/TCP   13h
zookeeper   ClusterIP   10.233.35.111   <none>        2181/TCP            13h
[root@node1 ~]# kubectl describe svc zookeeper -n kafka
Name:              zookeeper
Namespace:         kafka
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"zookeeper","namespace":"kafka"},"spec":{"ports":[{"name":"client"...
Selector:          app=zookeeper
Type:              ClusterIP
IP:                10.233.35.111
Port:              client  2181/TCP
TargetPort:        2181/TCP
Endpoints:         10.233.90.24:2181,10.233.90.26:2181,10.233.92.33:2181 + 2 more...
Session Affinity:  None
Events:            <none>
[root@node1 ~]# kubectl get pods -n kafka -o wide
NAME      READY   STATUS             RESTARTS   AGE   IP             NODE    NOMINATED NODE   READINESS GATES
kafka-0   1/1     Running            0          13h   10.233.92.35   node3   <none>           <none>
kafka-1   1/1     Running            0          13h   10.233.96.34   node2   <none>           <none>
kafka-2   0/1     CrashLoopBackOff   14         52m   10.233.90.31   node1   <none>           <none>
pzoo-0    1/1     Running            0          13h   10.233.96.30   node2   <none>           <none>
pzoo-1    1/1     Running            1          13h   10.233.92.33   node3   <none>           <none>
pzoo-2    1/1     Running            1          13h   10.233.90.24   node1   <none>           <none>
zoo-0     1/1     Running            0          13h   10.233.96.32   node2   <none>           <none>
zoo-1     1/1     Running            1          13h   10.233.90.26   node1   <none>           <none>

solsson commented 5 years ago

I'm puzzled. At this point I can't come up with a single hypothesis to test. Something might come to mind later, but my only advice now is to dig around and do different experiments that involve killing pods.

Edit: zookeeper logs could possibly provide clues.
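
A rough sketch of how the ZooKeeper logs could be skimmed for clues, assuming the pod names and namespace shown earlier in this thread. `warn_or_error` is a hypothetical helper name; it simply keeps lines logged at WARN or ERROR level.

```shell
#!/bin/sh
# Read log text on stdin and keep only lines logged at WARN or ERROR level.
# "|| true" keeps the exit status zero when nothing matches.
warn_or_error() {
  grep -E ' (WARN|ERROR) ' || true
}

# Usage against a live cluster (pod names and namespace are assumptions):
#   for pod in pzoo-0 pzoo-1 pzoo-2 zoo-0 zoo-1; do
#     echo "== $pod"
#     kubectl logs "$pod" -n kafka | warn_or_error
#   done
```

For a container that has restarted, `kubectl logs --previous` shows the log of the previous run, which is often where the interesting lines are.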

amateu commented 5 years ago

@solsson I hit the same error after moving Kafka and ZooKeeper from the kafka namespace to another namespace. When Kafka starts (after init-config), it reports this error:

[2019-06-26 05:52:11,975] INFO Registered kafka:type=kafka.Log4jController MBean (kafka.utils.Log4jControllerRegistration$)
[2019-06-26 05:52:12,472] INFO starting (kafka.server.KafkaServer)
[2019-06-26 05:52:12,472] INFO Connecting to zookeeper on zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local:2181 (kafka.server.KafkaServer)
[2019-06-26 05:52:12,492] INFO [ZooKeeperClient] Initializing a new session to zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local:2181. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:12,497] INFO Client environment:zookeeper.version=3.4.13-2d71af4dbe22557fda74f9a9b4309b15a7487f03, built on 06/29/2018 00:39 GMT (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:host.name=kafka-0.kafka-cluster.kafka.svc.cluster.local (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.version=11.0.2 (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.vendor=Oracle Corporation (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.home=/usr/lib/jvm/jdk-11 (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client 
environment:java.class.path=/opt/kafka/libs/extensions/*:/opt/kafka/bin/../libs/activation-1.1.1.jar:/opt/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/kafka/bin/../libs/argparse4j-0.7.0.jar:/opt/kafka/bin/../libs/audience-annotations-0.5.0.jar:/opt/kafka/bin/../libs/commons-lang3-3.8.1.jar:/opt/kafka/bin/../libs/connect-api-2.2.1.jar:/opt/kafka/bin/../libs/connect-basic-auth-extension-2.2.1.jar:/opt/kafka/bin/../libs/connect-file-2.2.1.jar:/opt/kafka/bin/../libs/connect-json-2.2.1.jar:/opt/kafka/bin/../libs/connect-runtime-2.2.1.jar:/opt/kafka/bin/../libs/connect-transforms-2.2.1.jar:/opt/kafka/bin/../libs/extensions:/opt/kafka/bin/../libs/guava-20.0.jar:/opt/kafka/bin/../libs/hk2-api-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-locator-2.5.0-b42.jar:/opt/kafka/bin/../libs/hk2-utils-2.5.0-b42.jar:/opt/kafka/bin/../libs/jackson-annotations-2.9.8.jar:/opt/kafka/bin/../libs/jackson-core-2.9.8.jar:/opt/kafka/bin/../libs/jackson-databind-2.9.8.jar:/opt/kafka/bin/../libs/jackson-datatype-jdk8-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-base-2.9.8.jar:/opt/kafka/bin/../libs/jackson-jaxrs-json-provider-2.9.8.jar:/opt/kafka/bin/../libs/jackson-module-jaxb-annotations-2.9.8.jar:/opt/kafka/bin/../libs/javassist-3.22.0-CR2.jar:/opt/kafka/bin/../libs/javax.annotation-api-1.2.jar:/opt/kafka/bin/../libs/javax.inject-1.jar:/opt/kafka/bin/../libs/javax.inject-2.5.0-b42.jar:/opt/kafka/bin/../libs/javax.servlet-api-3.1.0.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.1.jar:/opt/kafka/bin/../libs/javax.ws.rs-api-2.1.jar:/opt/kafka/bin/../libs/jaxb-api-2.3.0.jar:/opt/kafka/bin/../libs/jersey-client-2.27.jar:/opt/kafka/bin/../libs/jersey-common-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-2.27.jar:/opt/kafka/bin/../libs/jersey-container-servlet-core-2.27.jar:/opt/kafka/bin/../libs/jersey-hk2-2.27.jar:/opt/kafka/bin/../libs/jersey-media-jaxb-2.27.jar:/opt/kafka/bin/../libs/jersey-server-2.27.jar:/opt/kafka/bin/../libs/jetty-client-9.4.14.v20181114.jar:/opt/kafka/
bin/../libs/jetty-continuation-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-http-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-io-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-security-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-server-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlet-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-servlets-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jetty-util-9.4.14.v20181114.jar:/opt/kafka/bin/../libs/jopt-simple-5.0.4.jar:/opt/kafka/bin/../libs/kafka-clients-2.2.1.jar:/opt/kafka/bin/../libs/kafka-log4j-appender-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-examples-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-scala_2.12-2.2.1.jar:/opt/kafka/bin/../libs/kafka-streams-test-utils-2.2.1.jar:/opt/kafka/bin/../libs/kafka-tools-2.2.1.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1-sources.jar:/opt/kafka/bin/../libs/kafka_2.12-2.2.1.jar:/opt/kafka/bin/../libs/log4j-1.2.17.jar:/opt/kafka/bin/../libs/lz4-java-1.5.0.jar:/opt/kafka/bin/../libs/maven-artifact-3.6.0.jar:/opt/kafka/bin/../libs/metrics-core-2.2.0.jar:/opt/kafka/bin/../libs/osgi-resource-locator-1.0.1.jar:/opt/kafka/bin/../libs/plexus-utils-3.1.0.jar:/opt/kafka/bin/../libs/reflections-0.9.11.jar:/opt/kafka/bin/../libs/rocksdbjni-5.15.10.jar:/opt/kafka/bin/../libs/scala-library-2.12.8.jar:/opt/kafka/bin/../libs/scala-logging_2.12-3.9.0.jar:/opt/kafka/bin/../libs/scala-reflect-2.12.8.jar:/opt/kafka/bin/../libs/slf4j-api-1.7.25.jar:/opt/kafka/bin/../libs/slf4j-log4j12-1.7.25.jar:/opt/kafka/bin/../libs/snappy-java-1.1.7.2.jar:/opt/kafka/bin/../libs/validation-api-1.1.0.Final.jar:/opt/kafka/bin/../libs/zkclient-0.11.jar:/opt/kafka/bin/../libs/zookeeper-3.4.13.jar:/opt/kafka/bin/../libs/zstd-jni-1.3.8-1.jar (org.apache.zookeeper.ZooKeeper) [2019-06-26 05:52:12,497] INFO Client environment:java.library.path=/usr/java/packages/lib:/usr/lib64:/lib64:/lib:/usr/lib (org.apache.zookeeper.ZooKeeper) [2019-06-26 
05:52:12,497] INFO Client environment:java.io.tmpdir=/tmp (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:java.compiler=<NA> (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:os.name=Linux (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:os.arch=amd64 (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:os.version=5.1.9-050109-generic (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:user.name=root (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:user.home=/root (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,497] INFO Client environment:user.dir=/opt/kafka (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,498] INFO Initiating client connection, connectString=zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local:2181,zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local:2181 sessionTimeout=6000 watcher=kafka.zookeeper.ZooKeeperClient$ZooKeeperClientWatcher$@6138e79a (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:12,509] INFO [ZooKeeperClient] Waiting until connected. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:12,517] INFO Opening socket connection to server zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:12,524] INFO Socket connection established to zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:12,527] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:13,251] INFO Opening socket connection to server zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:13,252] INFO Socket connection established to zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:13,252] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:14,058] INFO Opening socket connection to server zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:14,059] INFO Socket connection established to zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:14,059] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:15,985] INFO Opening socket connection to server zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:15,985] INFO Socket connection established to zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:15,986] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:16,766] INFO Opening socket connection to server zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:16,767] INFO Socket connection established to zk-cluster-0.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.79:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:16,768] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:17,208] INFO Opening socket connection to server zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:17,209] INFO Socket connection established to zk-cluster-2.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.81:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:17,210] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,384] INFO Opening socket connection to server zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181. Will not attempt to authenticate using SASL (unknown error) (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,384] INFO Socket connection established to zk-cluster-1.zk-cli.zhihuiaj.svc.cluster.local/10.244.1.80:2181, initiating session (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,385] INFO Unable to read additional data from server sessionid 0x0, likely server has closed socket, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:18,512] INFO [ZooKeeperClient] Closing. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:19,265] INFO Session: 0x0 closed (org.apache.zookeeper.ZooKeeper)
[2019-06-26 05:52:19,269] INFO EventThread shut down for session: 0x0 (org.apache.zookeeper.ClientCnxn)
[2019-06-26 05:52:19,270] INFO [ZooKeeperClient] Closed. (kafka.zookeeper.ZooKeeperClient)
[2019-06-26 05:52:19,278] ERROR Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
kafka.zookeeper.ZooKeeperClientTimeoutException: Timed out waiting for connection while in state: CONNECTING
        at kafka.zookeeper.ZooKeeperClient.$anonfun$waitUntilConnected$3(ZooKeeperClient.scala:242)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251)
        at kafka.zookeeper.ZooKeeperClient.waitUntilConnected(ZooKeeperClient.scala:238)
        at kafka.zookeeper.ZooKeeperClient.<init>(ZooKeeperClient.scala:96)
        at kafka.zk.KafkaZkClient$.apply(KafkaZkClient.scala:1825)
        at kafka.server.KafkaServer.createZkClient$1(KafkaServer.scala:361)
        at kafka.server.KafkaServer.initZkClient(KafkaServer.scala:385)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:205)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38)
        at kafka.Kafka$.main(Kafka.scala:75)
        at kafka.Kafka.main(Kafka.scala)
[2019-06-26 05:52:19,281] INFO shutting down (kafka.server.KafkaServer)
[2019-06-26 05:52:19,291] INFO shut down completed (kafka.server.KafkaServer)
[2019-06-26 05:52:19,291] ERROR Exiting Kafka. (kafka.server.KafkaServerStartable)
[2019-06-26 05:52:19,294] INFO shutting down (kafka.server.KafkaServer)

solsson commented 5 years ago

@amateu It looks like yours is a custom setup with ExternalName for zookeeper. Why don't you edit zookeeper.connect in Kafka's config instead? In addition you seem to have quite specific RBAC in your cluster and you probably need to customize the RBAC resources.

With @selkabli's issue what is most interesting is that only kafka-2 fails. I think in your setup @amateu all brokers will fail.

amateu commented 5 years ago

@solsson Yes, all brokers fail. I thought the cause really was RBAC, so I tried creating RBAC resources in my own project namespace to deploy ZooKeeper and Kafka there instead of in the kafka namespace. The ZooKeeper connection still timed out. So I deployed ZooKeeper and Kafka in another clean test environment without RBAC, and the ZooKeeper connection still timed out with the same error as before. Finally, I changed the ZooKeeper YAML, and now the ZooKeeper and Kafka clusters are both healthy. I still can't find the specific reason for the previous problem. As for @selkabli's issue, I think he might have used hostNetwork: true.

selkabli commented 5 years ago

@solsson The problem happens only on node1, which is the master of my cluster. Any clues why?

The taint has already been removed from the master, so it's not related to the taint.
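
To see at a glance which nodes host the failing pods, the `kubectl get pods -o wide` output can be filtered. This is a rough sketch that assumes the column layout shown earlier in this thread (NAME, READY, STATUS, RESTARTS, AGE, IP, NODE); `not_ready_pods` is a made-up helper name. Newer kubectl versions may print extra text in the RESTARTS column, which would shift the fields.

```shell
#!/bin/sh
# Read `kubectl get pods -o wide` output on stdin and print
# "pod node status" for every pod whose READY column is not n/n.
not_ready_pods() {
  awk 'NR > 1 {
    split($2, ready, "/")            # READY column, e.g. "0/1"
    if (ready[1] != ready[2]) print $1, $7, $3
  }'
}

# Usage against a live cluster (namespace is an assumption):
#   kubectl get pods -n kafka -o wide | not_ready_pods
```

If every line of the result names the same node, that points at something node-specific (networking, storage, host configuration) rather than at Kafka or ZooKeeper themselves.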

solsson commented 5 years ago

That's an important observation. I haven't tried running on a master. I have no clue why the ZooKeeper connection would fail from there.
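
One experiment that might narrow this down is to start a throwaway pod pinned to node1 and try to reach ZooKeeper from it. The sketch below only builds the `--overrides` JSON. `node_override` and the pod name `zk-probe` are hypothetical, and the busybox image and the `ruok` four-letter command in the usage comment are assumptions (some ZooKeeper versions restrict four-letter commands).

```shell
#!/bin/sh
# Build the JSON for `kubectl run --overrides` that pins a one-off pod to
# the node named in "$1". Setting spec.nodeName bypasses the scheduler.
node_override() {
  printf '{"apiVersion":"v1","spec":{"nodeName":"%s"}}' "$1"
}

# Usage against a live cluster (image, pod name and namespace are assumptions):
#   kubectl run zk-probe -n kafka --rm -it --restart=Never --image=busybox \
#     --overrides="$(node_override node1)" \
#     -- sh -c 'echo ruok | nc zookeeper 2181'
```

A healthy ZooKeeper server that accepts the command replies `imok`. Running the same probe pinned to node2 or node3 gives a baseline to compare against.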

weiwongfaye commented 4 years ago

I'm having the same issue as @selkabli. I am deploying on a bare-metal k8s cluster with local persistent volumes. 1 broker (out of 3) always fails to start correctly.

weiwongfaye commented 4 years ago

Never mind, it seems the PV on one of the nodes had a problem, which caused this. I moved the PV to another node and now it works fine.
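
For anyone checking storage in a similar situation, a first step could be to look for claims that are not bound. This is only a sketch: `unbound_claims` is a hypothetical helper name, and it assumes the default `kubectl get pvc` layout where NAME is the first column and STATUS the second. A bound claim can still sit on a faulty volume, so this catches only one kind of problem.

```shell
#!/bin/sh
# Read `kubectl get pvc` output on stdin and print "claim status" for
# every claim whose STATUS column is not "Bound".
unbound_claims() {
  awk 'NR > 1 && $2 != "Bound" { print $1, $2 }'
}

# Usage against a live cluster (namespace is an assumption):
#   kubectl get pvc -n kafka | unbound_claims
#   kubectl describe pod kafka-2 -n kafka   # events often mention volume errors
```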