Steps performed:
1) Create a 3-node GCS setup using vagrant.
2) Create 500 PVCs (sequentially).
3) Delete the 500 PVCs (sequentially).

However, only the PVCs are deleted; the corresponding PVs and the gluster volumes on the backend are not. The system was left idle for some time (more than 2 hours) before running the glustercli commands shown below.
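For reference, a minimal sketch of the sequential create/delete loop. The PVC names, namespace, size, and the storage class name "glusterfs-csi" are assumptions for illustration, not taken from the original setup:

# Hypothetical reproduction sketch: create, then delete, 500 PVCs one at a time.
# Adjust the storage class and namespace to match the cluster.
for i in $(seq 1 500); do
  cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-$i
  namespace: default
spec:
  accessModes: ["ReadWriteMany"]
  storageClassName: glusterfs-csi
  resources:
    requests:
      storage: 1Gi
EOF
done

# Sequential delete of the same PVCs.
for i in $(seq 1 500); do
  kubectl -n default delete pvc pvc-$i
done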
The problem is with etcd. You've lost 2 of the 3 etcd pods:
etcd-8pfnbtgtn4 0/1 Running 0 14h
etcd-jd6sh9j497 0/1 Completed 0 19h
etcd-operator-77bfcd6595-pbvsf 1/1 Running 1 19h
etcd-qktc7ckpd4 0/1 Completed 0 19h
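To confirm the etcd cluster state, one option is to query etcd from the surviving pod. This is only a sketch: it assumes the etcd image ships etcdctl and that the client service is http://etcd-client.gcs:2379 (the endpoint glusterd2 is configured with via GD2_ETCDENDPOINTS below):

# Check endpoint health and membership from the one etcd pod that is still running.
kubectl -n gcs exec etcd-8pfnbtgtn4 -- sh -c \
  'ETCDCTL_API=3 etcdctl --endpoints=http://etcd-client.gcs:2379 endpoint health'
kubectl -n gcs exec etcd-8pfnbtgtn4 -- sh -c \
  'ETCDCTL_API=3 etcdctl --endpoints=http://etcd-client.gcs:2379 member list'

# The etcd-operator logs usually show why the lost members were not replaced.
kubectl -n gcs logs etcd-operator-77bfcd6595-pbvsf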
glustercli commands are failing because of an unhealthy gluster pod.
[root@gluster-kube1-0 glusterd2]# glustercli peer list
Failed to get Peers list

Response headers:
X-Gluster-Peer-Id: 0aea396e-4401-4bd7-9b40-66662b521112
X-Request-Id: cd7311f2-073b-48e8-98c8-f1f488fcdbed
X-Gluster-Cluster-Id: ef45b7f7-9d59-47cc-b1cb-cef3c643cb97

Response body:
context deadline exceeded

[root@gluster-kube1-0 glusterd2]# glustercli volume list
Error getting volumes list
Get http://gluster-kube1-0.glusterd2.gcs:24007/v1/volumes: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
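To separate an etcd outage from a dead glusterd2 process, a quick check is to hit each pod's glusterd2 REST endpoint directly, using the same port and /ping path as the liveness probe shown further down. A sketch, run from any host that can resolve the glusterd2 service names:

# Probe each glusterd2 pod's REST endpoint with a short timeout.
# A refused connection means the glusterd2 process itself is down,
# not just its etcd backend.
for pod in gluster-kube1-0 gluster-kube2-0 gluster-kube3-0; do
  curl -sS --max-time 5 "http://${pod}.glusterd2.gcs:24007/ping" \
    && echo " <- ${pod} OK" \
    || echo "${pod} not responding"
done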
[vagrant@kube1 ~]$ kubectl -n gcs get pods
NAME                                   READY   STATUS      RESTARTS   AGE
alertmanager-alert-0                   2/2     Running     0          19h
alertmanager-alert-1                   2/2     Running     0          19h
anthill-58b9b9b6f-lcthr                1/1     Running     0          19h
csi-attacher-glusterfsplugin-0         2/2     Running     0          19h
csi-nodeplugin-glusterfsplugin-5t8wz   2/2     Running     0          19h
csi-nodeplugin-glusterfsplugin-7hrnl   2/2     Running     0          19h
csi-nodeplugin-glusterfsplugin-nblhg   2/2     Running     0          19h
csi-provisioner-glusterfsplugin-0      4/4     Running     1          19h
etcd-8pfnbtgtn4                        0/1     Running     0          14h
etcd-jd6sh9j497                        0/1     Completed   0          19h
etcd-operator-77bfcd6595-pbvsf         1/1     Running     1          19h
etcd-qktc7ckpd4                        0/1     Completed   0          19h
gluster-kube1-0                        1/1     Running     2          19h
gluster-kube2-0                        1/1     Running     199        19h
gluster-kube3-0                        1/1     Running     3          19h
gluster-mixins-88b4k                   0/1     Completed   0          19h
grafana-9df95dfb5-zsgkv                1/1     Running     0          19h
kube-state-metrics-86bc74fd4c-t7j2b    4/4     Running     0          19h
node-exporter-2vbvn                    2/2     Running     0          19h
node-exporter-dnpmh                    2/2     Running     0          19h
node-exporter-lkpqs                    2/2     Running     0          19h
prometheus-operator-6c4b6cfc76-dq448   1/1     Running     0          19h
prometheus-prometheus-0                2/3     Running     10         19h
prometheus-prometheus-1                3/3     Running     2          19h
[vagrant@kube1 ~]$ kubectl -n gcs describe pods gluster-kube2-0
Name:               gluster-kube2-0
Namespace:          gcs
Priority:           0
PriorityClassName:
Node: kube2/192.168.121.18
Start Time: Mon, 21 Jan 2019 11:20:34 +0000
Labels: app.kubernetes.io/component=glusterfs
app.kubernetes.io/name=glusterd2
app.kubernetes.io/part-of=gcs
controller-revision-hash=gluster-kube2-64b5bf4cc4
statefulset.kubernetes.io/pod-name=gluster-kube2-0
Annotations:
Status: Running
IP: 10.233.65.5
Controlled By: StatefulSet/gluster-kube2
Containers:
glusterd2:
Container ID: docker://c4d9230e69653a47b230477b571c2d6e481ebdec64cede390daf8bed01c67418
Image: docker.io/gluster/glusterd2-nightly
Image ID: docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:0bfea4b75288dc269f34648397e2d837f2a7b5aec71ec3c190d5856de41d55a8
Port:
Host Port:
State: Running
Started: Tue, 22 Jan 2019 06:36:10 +0000
Ready: True
Restart Count: 199
Liveness: http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
Environment:
GD2_ETCDENDPOINTS: http://etcd-client.gcs:2379
GD2_CLUSTER_ID: ef45b7f7-9d59-47cc-b1cb-cef3c643cb97
GD2_CLIENTADDRESS: gluster-kube2-0.glusterd2.gcs:24007
GD2_ENDPOINTS: http://gluster-kube2-0.glusterd2.gcs:24007
GD2_PEERADDRESS: gluster-kube2-0.glusterd2.gcs:24008
GD2_RESTAUTH: false
Mounts:
/dev from gluster-dev (rw)
/run/lvm from gluster-lvm (rw)
/sys/fs/cgroup from gluster-cgroup (ro)
/usr/lib/modules from gluster-kmods (ro)
/var/lib/glusterd2 from glusterd2-statedir (rw)
/var/log/glusterd2 from glusterd2-logdir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from default-token-66gxg (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
  gluster-dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
  gluster-cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:
  gluster-lvm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/lvm
    HostPathType:
  gluster-kmods:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib/modules
    HostPathType:
  glusterd2-statedir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/glusterd2
    HostPathType:  DirectoryOrCreate
  glusterd2-logdir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/glusterd2
    HostPathType:  DirectoryOrCreate
  default-token-66gxg:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-66gxg
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age                  From            Message
  Warning  Unhealthy  2m3s (x589 over 9h)  kubelet, kube2  Liveness probe failed: Get http://10.233.65.5:24007/ping: dial tcp 10.233.65.5:24007: connect: connection refused
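Given the 199 restarts and the repeatedly failing liveness probe on gluster-kube2-0, the glusterd2 logs from before the last restart would help pin down why the process keeps dying. As a sketch (the exact log file name under /var/log/glusterd2 is an assumption):

# Logs from the previous (killed) glusterd2 container instance.
kubectl -n gcs logs gluster-kube2-0 --previous

# The pod also bind-mounts /var/log/glusterd2 from the host (glusterd2-logdir above),
# so the full log history survives restarts; the file name may differ.
kubectl -n gcs exec gluster-kube2-0 -- ls /var/log/glusterd2
kubectl -n gcs exec gluster-kube2-0 -- tail -n 100 /var/log/glusterd2/glusterd2.log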
glusterd2 version: v6.0-dev.114.gitd51f60b
Attached: gluster-provisioner and csi-provisioner logs: gluster-provisioner-logs.txt, csi-provisioner logs.txt