confluentinc/cp-helm-charts

The Confluent Platform Helm charts enable you to deploy Confluent Platform services on Kubernetes for development, test, and proof of concept environments.
https://cnfl.io/getting-started-kafka-kubernetes
Apache License 2.0

Kafka Fails with pod has unbound immediate PersistentVolumeClaims #225

Closed mekaam closed 5 years ago

mekaam commented 5 years ago

Hello, I followed the steps at https://docs.confluent.io/current/installation/installing_cp/cp-helm-charts/docs/index.html to install Kafka with the Helm charts on a three-node Kubernetes 1.13 cluster. After the installation I see this:

```
[root@master ~]# kubectl get pods
NAME                                                   READY   STATUS    RESTARTS   AGE
my-confluent-oss-canary                                0/1     Error     0          2d17h
my-confluent-oss-cp-kafka-0                            0/2     Pending   0          2d17h
my-confluent-oss-cp-kafka-connect-d478b7b8d-6dwbz      1/2     Error     22         2d17h
my-confluent-oss-cp-kafka-rest-6ff5c6dd94-wmrdh        2/2     Running   20         2d17h
my-confluent-oss-cp-ksql-server-7d754b894d-xjdmr       2/2     Running   23         2d17h
my-confluent-oss-cp-schema-registry-7ddd6d7c7c-mzgq6   2/2     Running   23         2d17h
my-confluent-oss-cp-zookeeper-0                        0/2     Pending   0
```

When I start to look into the logs of my-confluent-oss-canary:

```
[2019-01-10 01:53:45,978] WARN Session 0x0 for server my-confluent-oss-cp-zookeeper-headless:2181, unexpected error, closing socket connection and attempting reconnect (org.apache.zookeeper.ClientCnxn)
java.nio.channels.UnresolvedAddressException
        at sun.nio.ch.Net.checkAddress(Net.java:101)
        at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:622)
        at org.apache.zookeeper.ClientCnxnSocketNIO.registerAndConnect(ClientCnxnSocketNIO.java:277)
        at org.apache.zookeeper.ClientCnxnSocketNIO.connect(ClientCnxnSocketNIO.java:287)
        at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1021)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1064)
```
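As a side note, the UnresolvedAddressException from the canary is consistent with the ZooKeeper pods never being scheduled: the headless Service has no addresses for clients to resolve. One rough way to confirm that, assuming the release sits in the default namespace as shown above:

```sh
# List the endpoints behind the ZooKeeper headless Service; while the
# ZooKeeper pods are Pending this shows no addresses, which is why clients
# fail with UnresolvedAddressException.
kubectl get endpoints my-confluent-oss-cp-zookeeper-headless

# Show the ZooKeeper pods themselves (labels taken from the pod description below).
kubectl get pods -l app=cp-zookeeper,release=my-confluent-oss
```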

So I tried to look into the ZooKeeper issue first.

```
[root@master data]# kubectl describe pod my-confluent-oss-cp-zookeeper-0
Name:               my-confluent-oss-cp-zookeeper-0
Namespace:          default
Priority:           0
PriorityClassName:
Node:
Labels:             app=cp-zookeeper
                    controller-revision-hash=my-confluent-oss-cp-zookeeper-56c85d498c
                    release=my-confluent-oss
                    statefulset.kubernetes.io/pod-name=my-confluent-oss-cp-zookeeper-0
Annotations:        prometheus.io/port: 5556
                    prometheus.io/scrape: true
Status:             Pending
IP:
Controlled By:      StatefulSet/my-confluent-oss-cp-zookeeper
Containers:
  prometheus-jmx-exporter:
    Image:      solsson/kafka-prometheus-jmx-exporter@sha256:6f82e2b0464f50da8104acd7363fb9b995001ddff77d248379f8788e78946143
    Port:       5556/TCP
    Host Port:  0/TCP
    Command:
      java
      -XX:+UnlockExperimentalVMOptions
      -XX:+UseCGroupMemoryLimitForHeap
      -XX:MaxRAMFraction=1
      -XshowSettings:vm
      -jar
      jmx_prometheus_httpserver.jar
      5556
      /etc/jmx-zookeeper/jmx-zookeeper-prometheus.yml
    Environment:
    Mounts:
      /etc/jmx-zookeeper from jmx-config (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6zwz4 (ro)
  cp-zookeeper-server:
    Image:       confluentinc/cp-zookeeper:5.0.1
    Ports:       2181/TCP, 2888/TCP, 3888/TCP, 5555/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP, 0/TCP
    Command:
      bash
      -c
      ZOOKEEPER_SERVER_ID=$((${HOSTNAME##*-}+1)) && /etc/confluent/docker/run
    Environment:
      KAFKA_HEAP_OPTS:                        -Xms512M -Xmx512M
      KAFKA_JMX_PORT:                         5555
      ZOOKEEPER_TICK_TIME:                    2000
      ZOOKEEPER_SYNC_LIMIT:                   5
      ZOOKEEPER_INIT_LIMIT:                   10
      ZOOKEEPER_MAX_CLIENT_CNXNS:             60
      ZOOKEEPER_AUTOPURGE_SNAP_RETAIN_COUNT:  3
      ZOOKEEPER_AUTOPURGE_PURGE_INTERVAL:     24
      ZOOKEEPER_CLIENT_PORT:                  2181
      ZOOKEEPER_SERVERS:                      my-confluent-oss-cp-zookeeper-0.my-confluent-oss-cp-zookeeper-headless.default:2888:3888;my-confluent-oss-cp-zookeeper-1.my-confluent-oss-cp-zookeeper-headless.default:2888:3888;my-confluent-oss-cp-zookeeper-2.my-confluent-oss-cp-zookeeper-headless.default:2888:3888
      ZOOKEEPER_SERVER_ID:                    my-confluent-oss-cp-zookeeper-0 (v1:metadata.name)
    Mounts:
      /var/lib/zookeeper/data from datadir (rw)
      /var/lib/zookeeper/log from datalogdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-6zwz4 (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  datadir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datadir-my-confluent-oss-cp-zookeeper-0
    ReadOnly:   false
  datalogdir:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  datalogdir-my-confluent-oss-cp-zookeeper-0
    ReadOnly:   false
  jmx-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      my-confluent-oss-cp-zookeeper-jmx-configmap
    Optional:  false
  default-token-6zwz4:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-6zwz4
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  2d17h (x3 over 2d17h)  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Warning  FailedScheduling  10m (x3 over 57m)      default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
  Warning  FailedScheduling  3m16s                  default-scheduler  pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
[root@master data]#
```

Here is my persistent storage configuration:

```
[root@master ~]# more pv-volume.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 25Gi
  accessModes:
```

```
[root@master ~]# kubectl create -f pv-volume.yaml
persistentvolume/task-pv-volume created
[root@master ~]# kubectl get pv task-pv-volume
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
task-pv-volume   25Gi       RWO            Retain           Available           manual                  20s
```

My Kubernetes skills are still new, so I would really appreciate your insights.

mekaam commented 5 years ago

When checking the PVCs, they are all Pending:

```
[root@master ~]# kubectl get pv
NAME             CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS      CLAIM   STORAGECLASS   REASON   AGE
task-pv-volume   25Gi       RWO            Retain           Available           manual                  13s
[root@master ~]# kubectl get pvc
NAME                                         STATUS    VOLUME   CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-0-my-confluent-oss-cp-kafka-0        Pending                                                     2d19h
datadir-my-confluent-oss-cp-zookeeper-0      Pending                                                     2d19h
datalogdir-my-confluent-oss-cp-zookeeper-0   Pending                                                     2d19h
```

```
[root@master ~]# kubectl describe pvc datadir-0-my-confluent-oss-cp-kafka-0
Name:          datadir-0-my-confluent-oss-cp-kafka-0
Namespace:     default
StorageClass:
Status:        Pending
Volume:
Labels:        app=cp-kafka
               release=my-confluent-oss
Annotations:
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode:    Filesystem
Events:
  Type    Reason         Age                     From                         Message
  ----    ------         ----                    ----                         -------
  Normal  FailedBinding  2d19h (x24 over 2d19h)  persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
  Normal  FailedBinding  117m (x201 over 167m)   persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
  Normal  FailedBinding  98m (x61 over 113m)     persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
  Normal  FailedBinding  4m20s (x101 over 29m)   persistentvolume-controller  no persistent volumes available for this claim and no storage class is set
Mounted By:  my-confluent-oss-cp-kafka-0
[root@master ~]#
```

```
[root@master ~]# kubectl describe pv task-pv-volume
Name:            task-pv-volume
Labels:          type=local
Annotations:
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    manual
Status:          Available
Claim:
Reclaim Policy:  Retain
Access Modes:    RWO
VolumeMode:      Filesystem
Capacity:        25Gi
Node Affinity:
Message:
Source:
    Type:          HostPath (bare host directory volume)
    Path:          /mnt/data
    HostPathType:
Events:
```
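The output above also shows why nothing binds: the PV carries `storageClassName: manual`, while the chart's PVCs request no storage class at all (the blank `StorageClass:` field in the PVC description), and a claim only binds to a volume whose class matches. For a purely local, development-only workaround, one could pre-create one hostPath PV per Pending PVC with no `storageClassName` set; the name, size, and path in the sketch below are illustrative assumptions, not values from this thread:

```yaml
# Dev-only sketch: a hostPath PersistentVolume with no storageClassName, so it
# can bind to a chart PVC that also requests no storage class. One PV like this
# is needed per Pending PVC listed by `kubectl get pvc` above, each with its
# own name and host path.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv-0              # assumed name
spec:
  capacity:
    storage: 25Gi               # must be at least the size the PVC requests
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/data/pv-0        # assumed path; the directory must exist on the node
```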

andyrhee commented 5 years ago

Looks like a StorageClass issue. What do you get when you run `kubectl get storageClass manual -o yaml`? Are you installing this on AWS or another cloud? If so, `kubectl get storageClass` should show at least the default one.

mekaam commented 5 years ago

Thanks a lot @andyrhee.

```
[root@master ~]# kubectl get storageClass
No resources found.
```

This is a cluster created on Google Cloud VMs, but through kubeadm.

andyrhee commented 5 years ago

Hi @amekawy-eu,

Sorry, I got sick and couldn't get back to you quickly. I'm not using Google Cloud, but I think you can use one of the GKE examples here: https://github.com/Yolean/kubernetes-kafka/tree/master/configure

For example, run the following command to create a StorageClass named kafka-broker:

```
kubectl create -f configure/gke-storageclass-broker-pd.yml
```
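For reference, that file defines a StorageClass named kafka-broker backed by GCE persistent disks, roughly along the lines of the sketch below (paraphrased from memory of the linked repo, so check the actual file; note also that the kubernetes.io/gce-pd provisioner only works if the kubeadm cluster was set up with the GCE cloud provider enabled):

```yaml
# Approximate contents of configure/gke-storageclass-broker-pd.yml (an
# assumption; see the Yolean/kubernetes-kafka repo for the authoritative file).
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: kafka-broker
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard             # standard (non-SSD) persistent disk
```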

Then use the StorageClass name (kafka-broker) as the storage class value in the chart's values.yaml, along the lines of the sketch below.
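The exact key names in the override below are taken from memory of the chart's values.yaml and may differ between chart versions, so verify them against the version you installed:

```yaml
# values.yaml overrides pointing the persistent volume claims at the new
# kafka-broker StorageClass (key names are an assumption; check your chart).
cp-zookeeper:
  persistence:
    enabled: true
    dataDirStorageClass: kafka-broker
    dataLogDirStorageClass: kafka-broker
cp-kafka:
  persistence:
    enabled: true
    storageClass: kafka-broker
```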

Hope this helps!


mekaam commented 5 years ago

Sorry @andyrhee for taking so long, and I hope you recover soon. I am working on it and will update; I'm currently facing issues with the cluster. Thanks a lot.

mekaam commented 5 years ago

Thanks a lot @andyrhee, it worked pretty well 👍 Thanks for your great help.