Steps performed:
1) Create a 3-node GCS setup using vagrant.
2) Create 500 PVCs (sequentially).
3) Delete the 500 PVCs (sequentially).

However, only the PVCs are deleted; the corresponding PVs and the gluster volumes on the backend are not. The system was left idle for some time (more than 2 hours) before running the glustercli commands shown below.
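For reference, a minimal sketch of the sequential create/delete loop. The PVC names, namespace, size, and the storage class name "glusterfs-csi" are assumptions for illustration, not taken from the original setup:

# Hypothetical reproduction sketch: create, then delete, 500 PVCs one at a time.
# Adjust the storage class and namespace to match the cluster.
for i in $(seq 1 500); do
  cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-$i
  namespace: default
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: glusterfs-csi
  resources:
    requests:
      storage: 1Gi
EOF
done

# Sequential delete of the same PVCs.
for i in $(seq 1 500); do
  kubectl -n default delete pvc pvc-$i
done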
The problem is with etcd. You've lost 2 of the 3 etcd pods:
etcd-8pfnbtgtn4 0/1 Running 0 14h
etcd-jd6sh9j497 0/1 Completed 0 19h
etcd-operator-77bfcd6595-pbvsf 1/1 Running 1 19h
etcd-qktc7ckpd4 0/1 Completed 0 19h
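To confirm the etcd cluster state, one option is to query etcd from the surviving pod. This is only a sketch: it assumes the etcd image ships etcdctl and that the client service is http://etcd-client.gcs:2379 (the endpoint glusterd2 is configured with via GD2_ETCDENDPOINTS below):

# Check endpoint health and membership from the one etcd pod that is still running.
kubectl -n gcs exec etcd-8pfnbtgtn4 -- sh -c \
  'ETCDCTL_API=3 etcdctl --endpoints=http://etcd-client.gcs:2379 endpoint health'
kubectl -n gcs exec etcd-8pfnbtgtn4 -- sh -c \
  'ETCDCTL_API=3 etcdctl --endpoints=http://etcd-client.gcs:2379 member list'

# The etcd-operator logs usually show why the lost members were not replaced.
kubectl -n gcs logs etcd-operator-77bfcd6595-pbvsf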
glustercli commands are failing because of an unhealthy gluster pod.
[root@gluster-kube1-0 glusterd2]# glustercli peer list
Failed to get Peers list

Response headers:
X-Gluster-Peer-Id: 0aea396e-4401-4bd7-9b40-66662b521112
X-Request-Id: cd7311f2-073b-48e8-98c8-f1f488fcdbed
X-Gluster-Cluster-Id: ef45b7f7-9d59-47cc-b1cb-cef3c643cb97

Response body:
context deadline exceeded

[root@gluster-kube1-0 glusterd2]# glustercli volume list
Error getting volumes list
Get http://gluster-kube1-0.glusterd2.gcs:24007/v1/volumes: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
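To separate an etcd outage from a dead glusterd2 process, a quick check is to hit each pod's glusterd2 REST endpoint directly, using the same port and /ping path as the liveness probe shown further down. A sketch, run from any host that can resolve the glusterd2 service names:

# Probe each glusterd2 pod's REST endpoint with a short timeout.
# A refused connection means the glusterd2 process itself is down,
# not just its etcd backend.
for pod in gluster-kube1-0 gluster-kube2-0 gluster-kube3-0; do
  curl -sS --max-time 5 "http://${pod}.glusterd2.gcs:24007/ping" \
    && echo " <- ${pod} OK" \
    || echo "${pod} not responding"
done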
[vagrant@kube1 ~]$ kubectl -n gcs get pods
NAME                                   READY   STATUS      RESTARTS   AGE
alertmanager-alert-0                   2/2     Running     0          19h
alertmanager-alert-1                   2/2     Running     0          19h
anthill-58b9b9b6f-lcthr                1/1     Running     0          19h
csi-attacher-glusterfsplugin-0         2/2     Running     0          19h
csi-nodeplugin-glusterfsplugin-5t8wz   2/2     Running     0          19h
csi-nodeplugin-glusterfsplugin-7hrnl   2/2     Running     0          19h
csi-nodeplugin-glusterfsplugin-nblhg   2/2     Running     0          19h
csi-provisioner-glusterfsplugin-0      4/4     Running     1          19h
etcd-8pfnbtgtn4                        0/1     Running     0          14h
etcd-jd6sh9j497                        0/1     Completed   0          19h
etcd-operator-77bfcd6595-pbvsf         1/1     Running     1          19h
etcd-qktc7ckpd4                        0/1     Completed   0          19h
gluster-kube1-0                        1/1     Running     2          19h
gluster-kube2-0                        1/1     Running     199        19h
gluster-kube3-0                        1/1     Running     3          19h
gluster-mixins-88b4k                   0/1     Completed   0          19h
grafana-9df95dfb5-zsgkv                1/1     Running     0          19h
kube-state-metrics-86bc74fd4c-t7j2b    4/4     Running     0          19h
node-exporter-2vbvn                    2/2     Running     0          19h
node-exporter-dnpmh                    2/2     Running     0          19h
node-exporter-lkpqs                    2/2     Running     0          19h
prometheus-operator-6c4b6cfc76-dq448   1/1     Running     0          19h
prometheus-prometheus-0                2/3     Running     10         19h
prometheus-prometheus-1                3/3     Running     2          19h
[vagrant@kube1 ~]$ kubectl -n gcs describe pods gluster-kube2-0
Name:               gluster-kube2-0
Namespace:          gcs
Priority:           0
PriorityClassName:
Node: kube2/192.168.121.18
Start Time: Mon, 21 Jan 2019 11:20:34 +0000
Labels: app.kubernetes.io/component=glusterfs
app.kubernetes.io/name=glusterd2
app.kubernetes.io/part-of=gcs
controller-revision-hash=gluster-kube2-64b5bf4cc4
statefulset.kubernetes.io/pod-name=gluster-kube2-0
Annotations:
Status: Running
IP: 10.233.65.5
Controlled By: StatefulSet/gluster-kube2
Containers:
glusterd2:
Container ID: docker://c4d9230e69653a47b230477b571c2d6e481ebdec64cede390daf8bed01c67418
Image: docker.io/gluster/glusterd2-nightly
Image ID: docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:0bfea4b75288dc269f34648397e2d837f2a7b5aec71ec3c190d5856de41d55a8
Port:
Host Port:
State: Running
Started: Tue, 22 Jan 2019 06:36:10 +0000
Ready: True
Restart Count: 199
Liveness: http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
Environment:
GD2_ETCDENDPOINTS: http://etcd-client.gcs:2379
GD2_CLUSTER_ID: ef45b7f7-9d59-47cc-b1cb-cef3c643cb97
GD2_CLIENTADDRESS: gluster-kube2-0.glusterd2.gcs:24007
GD2_ENDPOINTS: http://gluster-kube2-0.glusterd2.gcs:24007
GD2_PEERADDRESS: gluster-kube2-0.glusterd2.gcs:24008
GD2_RESTAUTH: false
Mounts:
/dev from gluster-dev (rw)
/run/lvm from gluster-lvm (rw)
/sys/fs/cgroup from gluster-cgroup (ro)
/usr/lib/modules from gluster-kmods (ro)
/var/lib/glusterd2 from glusterd2-statedir (rw)
/var/log/glusterd2 from glusterd2-logdir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-66gxg (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
  gluster-dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
  gluster-cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:
  gluster-lvm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/lvm
    HostPathType:
  gluster-kmods:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib/modules
    HostPathType:
  glusterd2-statedir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/glusterd2
    HostPathType:  DirectoryOrCreate
  glusterd2-logdir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/glusterd2
    HostPathType:  DirectoryOrCreate
  default-token-66gxg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-66gxg
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                  From            Message
  Warning  Unhealthy  2m3s (x589 over 9h)  kubelet, kube2  Liveness probe failed: Get http://10.233.65.5:24007/ping: dial tcp 10.233.65.5:24007: connect: connection refused
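Given the 199 restarts and the repeatedly failing liveness probe on gluster-kube2-0, the glusterd2 logs from before the last restart would help pin down why the process keeps dying. As a sketch (the exact log file name under /var/log/glusterd2 is an assumption):

# Logs from the previous (killed) glusterd2 container instance.
kubectl -n gcs logs gluster-kube2-0 --previous

# The pod also bind-mounts /var/log/glusterd2 from the host (glusterd2-logdir above),
# so the full log history survives restarts; the file name may differ.
kubectl -n gcs exec gluster-kube2-0 -- ls /var/log/glusterd2
kubectl -n gcs exec gluster-kube2-0 -- tail -n 100 /var/log/glusterd2/glusterd2.log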
glusterd2 version: v6.0-dev.114.gitd51f60b
Attached: gluster-provisioner and csi-provisioner logs: gluster-provisioner-logs.txt, csi-provisioner logs.txt