PrasadDesala opened this issue 5 years ago
Yes, that can certainly happen when running from master. Master intentionally uses "latest" (i.e., master) from all the associated repos. For releases, we lock down all container versions. As an example, here is the commit for the 0.5 release: https://github.com/gluster/gcs/commit/a55daa26d1f76125eecda1c74cd6bd56c166182c
For your master deployment, you have two options: you can lock to a particular hash like we do for releases, or you may be able to get away with imagePullPolicy: IfNotPresent. The latter assumes the pod always restarts on the same node, which is only true for gd2 and some of the CSI containers.
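As a sketch of where that policy sits, here is an illustrative fragment of a container spec (field names per the Kubernetes API; this is not the actual GCS manifest):

```yaml
# Illustrative glusterd2 container spec fragment. With IfNotPresent,
# a restart on the same node reuses the locally cached image instead
# of pulling a newer nightly build.
containers:
  - name: glusterd2
    image: docker.io/gluster/glusterd2-nightly
    imagePullPolicy: IfNotPresent
```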
I would recommend using the hash. To find the hash:
$ skopeo inspect docker://docker.io/gluster/glusterd2-nightly
{
"Name": "docker.io/gluster/glusterd2-nightly",
"Digest": "sha256:4b043cd167317d8c8afe46f049df993f61e191b41cbd6da5a07348328fd8a080",
"RepoTags": [
"20180814",
"20180821",
"20180907",
...
or
$ docker pull gluster/glusterd2-nightly
...
$ docker inspect gluster/glusterd2-nightly
...
"RepoDigests": [
"gluster/glusterd2-nightly@sha256:4b043cd167317d8c8afe46f049df993f61e191b41cbd6da5a07348328fd8a080"
],
...
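Once you have the digest, reference the image by it directly instead of by tag. A minimal sketch of building the pinned reference, using the digest from the inspect output above:

```shell
# Build a digest-pinned image reference from the values reported by
# `skopeo inspect` (or `docker inspect`) above.
REPO="docker.io/gluster/glusterd2-nightly"
DIGEST="sha256:4b043cd167317d8c8afe46f049df993f61e191b41cbd6da5a07348328fd8a080"
PINNED="${REPO}@${DIGEST}"
echo "${PINNED}"   # use this as the image: value in the pod spec
```

Unlike a tag, a digest reference can never silently move to a newer build, so every pod restart pulls exactly the same image.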
I kept the system idle for more than 2 days, and today when I logged in I saw that the gluster pods were going into Pending state and being recreated.
A container respin in this case might lead to a version mismatch (since we are pulling from nightly builds). Say I have 3 gluster pods g1, g2, and g3 running gd2 version 109, and I start a test that takes 3 days to complete. If on the second day a gluster pod gets recreated for some reason and a new gd2 build is available on master, the pod pulls the latest image and the container comes up with the latest gd2 version. That leaves a version mismatch between g1, g2 (old gd2 version) and g3 (latest gd2 version).
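One way to spot that situation is to compare the image digests the pods are actually running. A small sketch, assuming the digests have already been collected (e.g. from the Image ID field of kubectl describe); the helper name and sample values are hypothetical:

```shell
# Hypothetical helper: succeeds only when every supplied digest is
# identical, i.e. all gluster pods run the same nightly build.
digests_match() {
  [ "$(printf '%s\n' "$@" | sort -u | wc -l)" -eq 1 ]
}

# Sample values standing in for the digests reported for g1, g2, g3.
G1="sha256:4b043cd1"; G2="sha256:4b043cd1"; G3="sha256:cedb774a"
if digests_match "$G1" "$G2" "$G3"; then
  echo "all pods on the same build"
else
  echo "version mismatch between pods"
fi
```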
[vagrant@kube1 ~]$ kubectl -n gcs describe pods gluster-kube3-0
Name:               gluster-kube3-0
Namespace:          gcs
Priority:           0
PriorityClassName:  <none>
Node:               <none>
Labels:             app.kubernetes.io/component=glusterfs
                    app.kubernetes.io/name=glusterd2
                    app.kubernetes.io/part-of=gcs
                    controller-revision-hash=gluster-kube3-6875db4b7d
                    statefulset.kubernetes.io/pod-name=gluster-kube3-0
Annotations:        <none>
Status:             Pending
IP:
Controlled By:      StatefulSet/gluster-kube3
Containers:
  glusterd2:
    Image:        docker.io/gluster/glusterd2-nightly
    Port:         <none>
    Host Port:    <none>
    Liveness:     http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
    Environment:
      GD2_ETCDENDPOINTS:  http://etcd-client.gcs:2379
      GD2_CLUSTER_ID:     528b67d6-5a68-496b-9983-ed10037a5c5d
      GD2_CLIENTADDRESS:  gluster-kube3-0.glusterd2.gcs:24007
      GD2_ENDPOINTS:      http://gluster-kube3-0.glusterd2.gcs:24007
      GD2_PEERADDRESS:    gluster-kube3-0.glusterd2.gcs:24008
      GD2_RESTAUTH:       false
    Mounts:
      /dev from gluster-dev (rw)
      /run/lvm from gluster-lvm (rw)
      /sys/fs/cgroup from gluster-cgroup (ro)
      /usr/lib/modules from gluster-kmods (ro)
      /var/lib/glusterd2 from glusterd2-statedir (rw)
      /var/log/glusterd2 from glusterd2-logdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bzw4r (ro)
Conditions:
  Type           Status
  PodScheduled   False
Volumes:
  gluster-dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
  gluster-cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:
  gluster-lvm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/lvm
    HostPathType:
  gluster-kmods:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib/modules
    HostPathType:
  glusterd2-statedir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/glusterd2
    HostPathType:  DirectoryOrCreate
  glusterd2-logdir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/glusterd2
    HostPathType:  DirectoryOrCreate
  default-token-bzw4r:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bzw4r
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                  From               Message
  ----     ------            ----                 ----               -------
  Warning  FailedScheduling  5m8s (x2 over 5m8s)  default-scheduler  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match node selector.
After some time, the image is pulled and the pod is recreated:
[vagrant@kube1 ~]$ kubectl -n gcs describe pods gluster-kube3-0
Name:               gluster-kube3-0
Namespace:          gcs
Priority:           0
PriorityClassName:  <none>
Node:               kube3/192.168.121.23
Start Time:         Mon, 21 Jan 2019 07:06:59 +0000
Labels:             app.kubernetes.io/component=glusterfs
                    app.kubernetes.io/name=glusterd2
                    app.kubernetes.io/part-of=gcs
                    controller-revision-hash=gluster-kube3-6875db4b7d
                    statefulset.kubernetes.io/pod-name=gluster-kube3-0
Annotations:        <none>
Status:             Running
IP:                 10.233.66.49
Controlled By:      StatefulSet/gluster-kube3
Containers:
  glusterd2:
    Container ID:   docker://644eb58388afed6c22ae9ba77f144479ff99cf633d9aef4d48a2a6efa87e4217
    Image:          docker.io/gluster/glusterd2-nightly
    Image ID:       docker-pullable://docker.io/gluster/glusterd2-nightly@sha256:cedb774a540c80917366dd0e21eb2fc6eea6efe56697a90e5720fdd4004853bf
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Mon, 21 Jan 2019 07:07:30 +0000
    Ready:          True
    Restart Count:  0
    Liveness:       http-get http://:24007/ping delay=10s timeout=1s period=60s #success=1 #failure=3
    Environment:
      GD2_ETCDENDPOINTS:  http://etcd-client.gcs:2379
      GD2_CLUSTER_ID:     528b67d6-5a68-496b-9983-ed10037a5c5d
      GD2_CLIENTADDRESS:  gluster-kube3-0.glusterd2.gcs:24007
      GD2_ENDPOINTS:      http://gluster-kube3-0.glusterd2.gcs:24007
      GD2_PEERADDRESS:    gluster-kube3-0.glusterd2.gcs:24008
      GD2_RESTAUTH:       false
    Mounts:
      /dev from gluster-dev (rw)
      /run/lvm from gluster-lvm (rw)
      /sys/fs/cgroup from gluster-cgroup (ro)
      /usr/lib/modules from gluster-kmods (ro)
      /var/lib/glusterd2 from glusterd2-statedir (rw)
      /var/log/glusterd2 from glusterd2-logdir (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-bzw4r (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  gluster-dev:
    Type:          HostPath (bare host directory volume)
    Path:          /dev
    HostPathType:
  gluster-cgroup:
    Type:          HostPath (bare host directory volume)
    Path:          /sys/fs/cgroup
    HostPathType:
  gluster-lvm:
    Type:          HostPath (bare host directory volume)
    Path:          /run/lvm
    HostPathType:
  gluster-kmods:
    Type:          HostPath (bare host directory volume)
    Path:          /usr/lib/modules
    HostPathType:
  glusterd2-statedir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/glusterd2
    HostPathType:  DirectoryOrCreate
  glusterd2-logdir:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/glusterd2
    HostPathType:  DirectoryOrCreate
  default-token-bzw4r:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-bzw4r
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason            Age                   From               Message
  ----     ------            ----                  ----               -------
  Warning  FailedScheduling  41s (x19 over 6m21s)  default-scheduler  0/3 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 2 node(s) didn't match node selector.
  Normal   Scheduled         41s                   default-scheduler  Successfully assigned gcs/gluster-kube3-0 to kube3
  Normal   Pulling           30s                   kubelet, kube3     pulling image "docker.io/gluster/glusterd2-nightly"
  Normal   Pulled            10s                   kubelet, kube3     Successfully pulled image "docker.io/gluster/glusterd2-nightly"
  Normal   Created           10s                   kubelet, kube3     Created container
  Normal   Started           10s                   kubelet, kube3     Started container
[vagrant@kube1 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   27G   19G  8.3G  70% /
[vagrant@kube2 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   27G   22G  5.3G  81% /
[vagrant@kube3 ~]$ df -h
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/atomicos-root   27G   23G  4.2G  85% /
[vagrant@kube1 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         25G        1.4G         27M        4.3G        4.4G
Swap:            0B          0B          0B
[vagrant@kube2 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            31G         10G        3.7G         26M         16G         19G
Swap:            0B          0B          0B
[vagrant@kube3 ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:            31G        1.4G         21G         23M        8.6G         28G
Swap:            0B          0B          0B