jihoon-seo closed this issue 2 years ago
We should check whether this only happens in the execution environment above. Is there anything else we could look at for reference, such as the cb-ladybug logs or the pod status (kubectl get pod -A)?
@vlatte I looked into it. (This may have been a one-time occurrence.)
❯ kubectl get nodes
NAME STATUS ROLES AGE VERSION
cb-sjh1-sjh1-w-1-4k340 NotReady <none> 46h v1.18.9
ip-192-168-1-96 Ready master 46h v1.18.9
[CB-Ladybug log]
I0803 03:14:09.506093 18315 initconfiguration.go:200] loading configuration from "kubeadm-config.yaml"
I0803 03:14:09.507413 18315 initconfiguration.go:103] detected and using CRI socket: /var/run/dockershim.sock
I0803 03:14:09.507670 18315 interface.go:400] Looking for default routes with IPv4 addresses
I0803 03:14:09.507682 18315 interface.go:405] Default route transits interface "eth0"
I0803 03:14:09.507818 18315 interface.go:208] Interface eth0 is up
I0803 03:14:09.507873 18315 interface.go:256] Interface "eth0" has 3 addresses :[192.168.1.96/24 3.112.22.175/32 fe80::407:3eff:fe60:b0d3/64].
I0803 03:14:09.507889 18315 interface.go:223] Checking addr 192.168.1.96/24.
I0803 03:14:09.507897 18315 interface.go:230] IP found 192.168.1.96
I0803 03:14:09.507904 18315 interface.go:262] Found valid IPv4 address 192.168.1.96 for interface "eth0".
I0803 03:14:09.507909 18315 interface.go:411] Found active IP 192.168.1.96
I0803 03:14:09.507944 18315 version.go:183] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.txt
I0803 03:14:10.040089 18315 version.go:252] remote version is much newer: v1.21.3; falling back to: stable-1.18
I0803 03:14:10.040132 18315 version.go:183] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.18.txt
W0803 03:14:10.385524 18315 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
I0803 03:14:10.385774 18315 checks.go:577] validating Kubernetes and kubeadm version
I0803 03:14:10.385804 18315 checks.go:166] validating if the firewall is enabled and active
I0803 03:14:10.395183 18315 checks.go:201] validating availability of port 6443
I0803 03:14:10.395348 18315 checks.go:201] validating availability of port 10259
I0803 03:14:10.395386 18315 checks.go:201] validating availability of port 10257
I0803 03:14:10.395412 18315 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
I0803 03:14:10.395438 18315 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
I0803 03:14:10.395448 18315 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
I0803 03:14:10.395456 18315 checks.go:286] validating the existence of file /etc/kubernetes/manifests/etcd.yaml
I0803 03:14:10.395466 18315 checks.go:432] validating if the connectivity type is via proxy or direct
I0803 03:14:10.395488 18315 checks.go:471] validating http connectivity to first IP address in the CIDR
I0803 03:14:10.395505 18315 checks.go:471] validating http connectivity to first IP address in the CIDR
I0803 03:14:10.395525 18315 checks.go:102] validating the container runtime
I0803 03:14:10.481009 18315 checks.go:128] validating if the service is enabled and active
I0803 03:14:10.583612 18315 checks.go:335] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0803 03:14:10.583674 18315 checks.go:335] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0803 03:14:10.583696 18315 checks.go:649] validating whether swap is enabled or not
I0803 03:14:10.583723 18315 checks.go:376] validating the presence of executable conntrack
I0803 03:14:10.583755 18315 checks.go:376] validating the presence of executable ip
I0803 03:14:10.583774 18315 checks.go:376] validating the presence of executable iptables
I0803 03:14:10.583791 18315 checks.go:376] validating the presence of executable mount
I0803 03:14:10.583809 18315 checks.go:376] validating the presence of executable nsenter
I0803 03:14:10.583853 18315 checks.go:376] validating the presence of executable ebtables
I0803 03:14:10.583877 18315 checks.go:376] validating the presence of executable ethtool
I0803 03:14:10.583914 18315 checks.go:376] validating the presence of executable socat
I0803 03:14:10.583936 18315 checks.go:376] validating the presence of executable tc
I0803 03:14:10.583962 18315 checks.go:376] validating the presence of executable touch
I0803 03:14:10.583985 18315 checks.go:520] running all checks
I0803 03:14:10.679622 18315 checks.go:406] checking whether the given node name is reachable using net.LookupHost
I0803 03:14:10.680655 18315 checks.go:618] validating kubelet version
I0803 03:14:10.741010 18315 checks.go:128] validating if the service is enabled and active
I0803 03:14:10.752762 18315 checks.go:201] validating availability of port 10250
I0803 03:14:10.752842 18315 checks.go:201] validating availability of port 2379
I0803 03:14:10.752869 18315 checks.go:201] validating availability of port 2380
I0803 03:14:10.752896 18315 checks.go:249] validating the existence and emptiness of directory /var/lib/etcd
I0803 03:14:10.810118 18315 checks.go:844] pulling k8s.gcr.io/kube-apiserver:v1.18.20
I0803 03:14:14.627388 18315 checks.go:844] pulling k8s.gcr.io/kube-controller-manager:v1.18.20
I0803 03:14:17.470977 18315 checks.go:844] pulling k8s.gcr.io/kube-scheduler:v1.18.20
I0803 03:14:19.677735 18315 checks.go:844] pulling k8s.gcr.io/kube-proxy:v1.18.20
I0803 03:14:25.446222 18315 checks.go:844] pulling k8s.gcr.io/pause:3.2
I0803 03:14:26.872368 18315 checks.go:844] pulling k8s.gcr.io/etcd:3.4.3-0
I0803 03:14:32.785877 18315 checks.go:844] pulling k8s.gcr.io/coredns:1.6.7
I0803 03:14:34.927136 18315 kubelet.go:64] Stopping the kubelet
I0803 03:14:35.188472 18315 certs.go:103] creating a new certificate authority for ca
I0803 03:14:35.875890 18315 certs.go:103] creating a new certificate authority for front-proxy-ca
I0803 03:14:36.257238 18315 certs.go:103] creating a new certificate authority for etcd-ca
I0803 03:14:37.406200 18315 certs.go:69] creating new public/private key files for signing service account users
I0803 03:14:37.495956 18315 kubeconfig.go:79] creating kubeconfig file for admin.conf
I0803 03:14:37.573174 18315 kubeconfig.go:79] creating kubeconfig file for kubelet.conf
I0803 03:14:38.107589 18315 kubeconfig.go:79] creating kubeconfig file for controller-manager.conf
I0803 03:14:38.275982 18315 kubeconfig.go:79] creating kubeconfig file for scheduler.conf
I0803 03:14:38.441379 18315 manifests.go:91] [control-plane] getting StaticPodSpecs
W0803 03:14:38.441504 18315 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0803 03:14:38.442080 18315 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0803 03:14:38.442098 18315 manifests.go:104] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0803 03:14:38.442104 18315 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0803 03:14:38.442110 18315 manifests.go:104] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0803 03:14:38.442116 18315 manifests.go:104] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0803 03:14:38.449756 18315 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
I0803 03:14:38.449790 18315 manifests.go:91] [control-plane] getting StaticPodSpecs
W0803 03:14:38.449852 18315 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0803 03:14:38.450080 18315 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0803 03:14:38.450098 18315 manifests.go:104] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0803 03:14:38.450104 18315 manifests.go:104] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0803 03:14:38.450110 18315 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0803 03:14:38.450116 18315 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0803 03:14:38.450121 18315 manifests.go:104] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0803 03:14:38.450127 18315 manifests.go:104] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
I0803 03:14:38.450930 18315 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
I0803 03:14:38.450955 18315 manifests.go:91] [control-plane] getting StaticPodSpecs
W0803 03:14:38.451008 18315 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0803 03:14:38.451212 18315 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0803 03:14:38.451714 18315 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
I0803 03:14:38.452381 18315 local.go:72] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0803 03:14:38.452403 18315 waitcontrolplane.go:87] [wait-control-plane] Waiting for the API server to be healthy
I0803 03:14:38.454495 18315 request.go:907] Got a Retry-After 1s response for attempt 1 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:39.455323 18315 request.go:907] Got a Retry-After 1s response for attempt 2 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:40.456097 18315 request.go:907] Got a Retry-After 1s response for attempt 3 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:41.456915 18315 request.go:907] Got a Retry-After 1s response for attempt 4 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:42.457775 18315 request.go:907] Got a Retry-After 1s response for attempt 5 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:43.458570 18315 request.go:907] Got a Retry-After 1s response for attempt 6 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:44.459356 18315 request.go:907] Got a Retry-After 1s response for attempt 7 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:45.460174 18315 request.go:907] Got a Retry-After 1s response for attempt 8 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:46.460923 18315 request.go:907] Got a Retry-After 1s response for attempt 9 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:47.962491 18315 request.go:907] Got a Retry-After 1s response for attempt 1 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:48.963271 18315 request.go:907] Got a Retry-After 1s response for attempt 2 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:49.963983 18315 request.go:907] Got a Retry-After 1s response for attempt 3 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:50.964864 18315 request.go:907] Got a Retry-After 1s response for attempt 4 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:51.965679 18315 request.go:907] Got a Retry-After 1s response for attempt 5 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:52.966604 18315 request.go:907] Got a Retry-After 1s response for attempt 6 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:53.967495 18315 request.go:907] Got a Retry-After 1s response for attempt 7 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:56.964028 18315 uploadconfig.go:108] [upload-config] Uploading the kubeadm ClusterConfiguration to a ConfigMap
I0803 03:14:56.977918 18315 uploadconfig.go:122] [upload-config] Uploading the kubelet component config to a ConfigMap
I0803 03:14:56.987080 18315 uploadconfig.go:127] [upload-config] Preserving the CRISocket information for the control-plane node
I0803 03:14:56.987116 18315 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ip-192-168-1-96" as an annotation
I0803 03:14:58.042891 18315 clusterinfo.go:45] [bootstrap-token] loading admin kubeconfig
I0803 03:14:58.043370 18315 clusterinfo.go:53] [bootstrap-token] copying the cluster from admin.conf to the bootstrap kubeconfig
I0803 03:14:58.043640 18315 clusterinfo.go:65] [bootstrap-token] creating/updating ConfigMap in kube-public namespace
I0803 03:14:58.046163 18315 clusterinfo.go:79] creating the RBAC rules for exposing the cluster-info ConfigMap in the kube-public namespace
I0803 03:14:58.051133 18315 kubeletfinalize.go:88] [kubelet-finalize] Assuming that kubelet client certificate rotation is enabled: found "/var/lib/kubelet/pki/kubelet-client-current.pem"
I0803 03:14:58.051970 18315 kubeletfinalize.go:132] [kubelet-finalize] Restarting the kubelet to enable client certificate rotation
I0803 03:14:58.487468 18315 request.go:557] Throttling request took 184.695441ms, request: POST:https://3.112.22.175:9998/api/v1/namespaces/kube-system/serviceaccounts?timeout=10s
I0803 03:14:58.687551 18315 request.go:557] Throttling request took 167.608572ms, request: POST:https://3.112.22.175:9998/api/v1/namespaces/kube-system/services?timeout=10s
I0803 03:14:58.887549 18315 request.go:557] Throttling request took 190.130979ms, request: POST:https://3.112.22.175:9998/api/v1/namespaces/kube-system/serviceaccounts?timeout=10s
I0803 03:14:59.087544 18315 request.go:557] Throttling request took 194.755559ms, request: POST:https://3.112.22.175:9998/api/v1/namespaces/kube-system/configmaps?timeout=10s
time="2021-08-03T03:15:00Z" level=info msg="install networkCNI"
time="2021-08-03T03:15:02Z" level=info msg="end k8s init"
time="2021-08-03T03:15:02Z" level=info msg="start k8s join"
time="2021-08-03T03:15:02Z" level=info msg="worker join (vm=sjh1-w-1-4k340)"
W0803 03:15:03.904790 17758 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
time="2021-08-03T03:15:23Z" level=info msg="end k8s join"
time="2021-08-03T03:15:23Z" level=info msg="duration := 11m48.482540473s"
[end of log]
[Pod status (kubectl get pod -A)] When I ran CB yesterday, most of the pods were Running, but checking today, most of them have changed to Pending..
❯ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cloud-barista cb-dragonfly-58f48dc4-d66tx 0/1 Pending 0 37h
cloud-barista cb-dragonfly-influxdb-0 0/1 Pending 0 35h
cloud-barista cb-dragonfly-kafka-0 0/1 Pending 0 35h
cloud-barista cb-dragonfly-zookeeper-0 0/1 Pending 0 35h
cloud-barista cb-ladybug-94474d4bf-nkjnl 0/1 Pending 0 37h
cloud-barista cb-restapigw-575cb4d7cb-t7jll 0/1 Pending 0 37h
cloud-barista cb-restapigw-influxdb-6d49bccc4d-jfsxn 0/1 Pending 0 37h
cloud-barista cb-restapigw-jaeger-agent-m8wz4 1/1 Running 17 42h
cloud-barista cb-restapigw-jaeger-collector-5f4456448f-wz8vk 0/1 Pending 0 37h
cloud-barista cb-restapigw-jaeger-query-d7844c565-6dw2t 0/2 Pending 0 37h
cloud-barista cb-spider-6457455678-kmdpw 0/1 Pending 0 37h
cloud-barista cb-tumblebug-5dd7d5756b-vz4fl 0/1 Pending 0 37h
cloud-barista cb-webtool-676847cfb7-9djfk 0/1 Pending 0 37h
cloud-barista cloud-barista-cassandra-0 0/1 Pending 0 35h
cloud-barista cloud-barista-dragonfly-kapacitor-59758758fc-8xbrw 0/1 Pending 0 37h
cloud-barista cloud-barista-etcd-0 0/1 Pending 0 35h
cloud-barista cloud-barista-grafana-6954497b57-hm4tt 0/2 Pending 0 37h
cloud-barista cloud-barista-kube-state-metrics-78558787c4-qkmc4 0/1 Pending 0 37h
cloud-barista cloud-barista-prometheus-alertmanager-55487695d5-slz4x 0/2 Pending 0 37h
cloud-barista cloud-barista-prometheus-node-exporter-cpk64 1/1 Running 0 42h
cloud-barista cloud-barista-prometheus-pushgateway-7d965ccb7-59gzh 0/1 Pending 0 37h
cloud-barista cloud-barista-prometheus-server-597455b56b-wxpqp 0/2 Pending 0 37h
kube-system coredns-66bff467f8-mk7lx 1/1 Running 0 46h
kube-system coredns-66bff467f8-nmvnt 1/1 Running 0 43h
kube-system etcd-ip-192-168-1-96 1/1 Running 0 46h
kube-system kilo-nkgqh 1/1 Running 0 46h
kube-system kilo-trfgl 1/1 Running 0 46h
kube-system kube-apiserver-ip-192-168-1-96 1/1 Running 0 46h
kube-system kube-controller-manager-ip-192-168-1-96 1/1 Running 1 46h
kube-system kube-flannel-ds-4s5s8 1/1 Running 0 46h
kube-system kube-flannel-ds-vwn6h 1/1 Running 0 46h
kube-system kube-proxy-tnxpx 1/1 Running 0 46h
kube-system kube-proxy-vpfqr 1/1 Running 0 46h
kube-system kube-scheduler-ip-192-168-1-96 1/1 Running 1 46h
kube-system metrics-server-77775f68b8-fvtjf 0/1 Pending 0 37h
I will try things like restarting CB or recreating the MCKS K8s cluster. 😊
I looked into why the pods are Pending.
❯ kubectl logs cb-tumblebug-5dd7d5756b-vz4fl -n cloud-barista
[no output]
❯ kubectl describe pod cb-tumblebug-5dd7d5756b-vz4fl -n cloud-barista
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 36h default-scheduler 0/2 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate.
So the control-plane node has the master taint and the pod was not scheduled there, and the worker node has the unreachable taint so it was not scheduled there either.. 🤔
And yet, the cb-restapigw-jaeger-agent-m8wz4 and cloud-barista-prometheus-node-exporter-cpk64 pods are still Running on the worker node (cb-sjh1-sjh1-w-1-4k340).. 🤔 (Those two look like DaemonSet pods, which automatically get a toleration for the unreachable taint and therefore are not evicted from the node; see the check sketched below.)
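To double-check that explanation, here is a minimal sketch of the kubectl queries I would use (the fields are standard, but treat the exact objects and output as illustrative):

❯ kubectl describe node ip-192-168-1-96 | grep -i taints
❯ kubectl describe node cb-sjh1-sjh1-w-1-4k340 | grep -i taints
# Check whether the two still-Running pods are owned by DaemonSets
❯ kubectl get pod cb-restapigw-jaeger-agent-m8wz4 -n cloud-barista -o jsonpath='{.metadata.ownerReferences[0].kind}'
# DaemonSet pods carry automatic tolerations for node.kubernetes.io/unreachable, which is why they stay bound
❯ kubectl get pod cloud-barista-prometheus-node-exporter-cpk64 -n cloud-barista -o jsonpath='{.spec.tolerations}'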
I ran the ./operator remove command to uninstall the Cloud-Barista Helm release.
It feels like the MCKS K8s cluster somehow broke while it was in use. 🤔
I will delete the MCKS K8s cluster and create it again.
After deleting and recreating the MCKS K8s cluster, the nodes come up Ready.
❯ kubectl get nodes
NAME STATUS ROLES AGE VERSION
cb-sjh1-sjh1-w-1-rw7mc Ready <none> 42s v1.18.9
ip-192-168-1-149 Ready master 65s v1.18.9
❯ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cloud-barista cb-dragonfly-58f48dc4-6nzfx 1/1 Running 0 10m
cloud-barista cb-dragonfly-influxdb-0 1/1 Running 0 10m
cloud-barista cb-dragonfly-kafka-0 1/1 Running 1 10m
cloud-barista cb-dragonfly-zookeeper-0 1/1 Running 0 10m
cloud-barista cb-ladybug-94474d4bf-b4rqh 1/1 Running 0 10m
cloud-barista cb-restapigw-575cb4d7cb-wwc6v 1/1 Running 0 10m
cloud-barista cb-restapigw-influxdb-6d49bccc4d-dd7d6 1/1 Running 0 10m
cloud-barista cb-restapigw-jaeger-agent-c9ztt 1/1 Running 0 10m
cloud-barista cb-restapigw-jaeger-cassandra-schema-kk8tx 0/1 Completed 0 10m
cloud-barista cb-restapigw-jaeger-collector-5f4456448f-k6pv2 0/1 CrashLoopBackOff 4 10m
cloud-barista cb-restapigw-jaeger-query-d7844c565-vcwqg 2/2 Running 5 10m
cloud-barista cb-spider-6457455678-bg2dq 1/1 Running 0 10m
cloud-barista cb-tumblebug-5dd7d5756b-s959r 1/1 Running 0 10m
cloud-barista cb-webtool-676847cfb7-9k7kf 1/1 Running 0 10m
cloud-barista cloud-barista-cassandra-0 0/1 Running 3 10m
cloud-barista cloud-barista-cassandra-1 0/1 CrashLoopBackOff 3 7m40s
cloud-barista cloud-barista-cassandra-2 0/1 CrashLoopBackOff 3 6m3s
cloud-barista cloud-barista-dragonfly-kapacitor-59758758fc-6j8z5 1/1 Running 0 10m
cloud-barista cloud-barista-etcd-0 1/1 Running 0 10m
cloud-barista cloud-barista-grafana-6954497b57-rk927 2/2 Running 0 10m
cloud-barista cloud-barista-kube-state-metrics-78558787c4-hppdm 1/1 Running 0 10m
cloud-barista cloud-barista-prometheus-alertmanager-55487695d5-mlxrb 2/2 Running 0 10m
cloud-barista cloud-barista-prometheus-node-exporter-gznm6 1/1 Running 0 10m
cloud-barista cloud-barista-prometheus-pushgateway-7d965ccb7-4wwp9 1/1 Running 0 10m
cloud-barista cloud-barista-prometheus-server-597455b56b-7hzmb 2/2 Running 0 10m
kube-system coredns-66bff467f8-bzwdl 1/1 Running 0 17m
kube-system coredns-66bff467f8-vv599 1/1 Running 0 17m
kube-system etcd-ip-192-168-1-149 1/1 Running 0 17m
kube-system kilo-hbtk4 1/1 Running 0 17m
kube-system kilo-rjlms 1/1 Running 0 17m
kube-system kube-apiserver-ip-192-168-1-149 1/1 Running 0 17m
kube-system kube-controller-manager-ip-192-168-1-149 1/1 Running 0 17m
kube-system kube-flannel-ds-h26h7 1/1 Running 1 17m
kube-system kube-flannel-ds-tkf54 1/1 Running 0 17m
kube-system kube-proxy-67qrq 1/1 Running 0 17m
kube-system kube-proxy-wc2hl 1/1 Running 0 17m
kube-system kube-scheduler-ip-192-168-1-149 1/1 Running 0 17m
kube-system metrics-server-77775f68b8-dhqbg 1/1 Running 0 10m
I'll keep using it, and if it changes back to NotReady I will leave another comment~~
Checking just now, it has changed back to NotReady again..
❯ kubectl get nodes
NAME STATUS ROLES AGE VERSION
cb-sjh1-sjh1-w-1-rw7mc NotReady <none> 16h v1.18.9
ip-192-168-1-149 Ready master 16h v1.18.9
❯ kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cloud-barista cb-dragonfly-58f48dc4-6nzfx 1/1 Terminating 0 16h 10.244.1.5 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-dragonfly-58f48dc4-dv7z5 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-dragonfly-influxdb-0 1/1 Terminating 0 16h 10.244.1.17 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-dragonfly-kafka-0 1/1 Terminating 1 16h 10.244.1.16 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-dragonfly-zookeeper-0 1/1 Terminating 0 16h 10.244.1.15 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-ladybug-94474d4bf-b4rqh 1/1 Terminating 0 16h 10.244.1.6 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-ladybug-94474d4bf-nw8np 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-575cb4d7cb-rxpjq 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-575cb4d7cb-wwc6v 1/1 Terminating 0 16h 10.244.1.10 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-restapigw-influxdb-6d49bccc4d-dd7d6 1/1 Terminating 0 16h 10.244.1.11 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-restapigw-influxdb-6d49bccc4d-mbdnh 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-jaeger-agent-c9ztt 1/1 Running 14 16h 10.244.1.3 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-restapigw-jaeger-collector-5f4456448f-k6pv2 1/1 Terminating 57 16h 10.244.1.23 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-restapigw-jaeger-collector-5f4456448f-vjt8n 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-jaeger-query-d7844c565-nc8cj 0/2 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-jaeger-query-d7844c565-vcwqg 2/2 Terminating 76 16h 10.244.1.4 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-spider-6457455678-bg2dq 1/1 Terminating 0 16h 10.244.1.24 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-spider-6457455678-l2mnn 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-tumblebug-5dd7d5756b-89b4b 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-tumblebug-5dd7d5756b-s959r 1/1 Terminating 0 16h 10.244.1.22 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-webtool-676847cfb7-9k7kf 1/1 Terminating 0 16h 10.244.1.8 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-webtool-676847cfb7-jzc7b 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-cassandra-0 1/1 Terminating 65 16h 10.244.1.12 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-cassandra-1 0/1 Terminating 97 16h 10.244.1.25 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-cassandra-2 1/1 Terminating 94 16h 10.244.1.26 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-dragonfly-kapacitor-59758758fc-6j8z5 1/1 Terminating 0 16h 10.244.1.13 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-dragonfly-kapacitor-59758758fc-xrfhk 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-etcd-0 1/1 Terminating 0 16h 10.244.1.14 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-grafana-6954497b57-rk927 2/2 Terminating 0 16h 10.244.1.19 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-grafana-6954497b57-tlnz5 0/2 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-kube-state-metrics-78558787c4-5k76x 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-kube-state-metrics-78558787c4-hppdm 1/1 Terminating 0 16h 10.244.1.21 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-alertmanager-55487695d5-czzzt 0/2 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-prometheus-alertmanager-55487695d5-mlxrb 2/2 Terminating 0 16h 10.244.1.18 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-node-exporter-gznm6 1/1 Running 0 16h 34.64.147.45 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-pushgateway-7d965ccb7-4wwp9 1/1 Terminating 0 16h 10.244.1.9 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-pushgateway-7d965ccb7-vgwm8 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-prometheus-server-597455b56b-7hzmb 2/2 Terminating 0 16h 10.244.1.7 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-server-597455b56b-rqt78 0/2 Pending 0 8h <none> <none> <none> <none>
kube-system coredns-66bff467f8-bzwdl 1/1 Running 0 16h 10.244.0.2 ip-192-168-1-149 <none> <none>
kube-system coredns-66bff467f8-vv599 1/1 Running 0 16h 10.244.0.3 ip-192-168-1-149 <none> <none>
kube-system etcd-ip-192-168-1-149 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kilo-hbtk4 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kilo-rjlms 1/1 Running 0 16h 34.64.147.45 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
kube-system kube-apiserver-ip-192-168-1-149 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kube-controller-manager-ip-192-168-1-149 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kube-flannel-ds-h26h7 1/1 Running 1 16h 34.64.147.45 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
kube-system kube-flannel-ds-tkf54 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kube-proxy-67qrq 1/1 Running 0 16h 34.64.147.45 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
kube-system kube-proxy-wc2hl 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kube-scheduler-ip-192-168-1-149 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system metrics-server-77775f68b8-dhqbg 1/1 Terminating 0 16h 10.244.1.2 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
kube-system metrics-server-77775f68b8-f2549 0/1 Pending 0 8h <none> <none> <none> <none>
[Logs of a pod in the Terminating state] (This pod had previously been confirmed to be working.)
❯ kubectl logs cb-spider-6457455678-bg2dq -n cloud-barista
Error from server: Get https://34.64.147.45:10250/containerLogs/cloud-barista/cb-spider-6457455678-bg2dq/cb-spider: dial tcp 34.64.147.45:10250: i/o timeout
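That error suggests the API server cannot reach the kubelet on the worker at all (34.64.147.45 is the worker's address in the listing above). A quick connectivity sketch, assuming shell access to the control-plane node and SSH access to the worker:

# From the control-plane node: is the worker's kubelet port reachable?
❯ nc -vz -w 3 34.64.147.45 10250
# On the worker itself: is the kubelet still running, and what is it complaining about?
❯ ssh <worker-node> 'systemctl status kubelet; journalctl -u kubelet -n 50 --no-pager'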
[Logs of a pod in the Pending state]
❯ kubectl logs cb-spider-6457455678-l2mnn -n cloud-barista
[no output]
[describe of the pod in the Terminating state]
❯ kubectl describe pod cb-spider-6457455678-bg2dq -n cloud-barista
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 05 Aug 2021 17:57:11 +0900
Finished: Thu, 05 Aug 2021 17:57:11 +0900
Events: <none>
[describe of the pod in the Pending state]
❯ kubectl describe pod cb-spider-6457455678-l2mnn -n cloud-barista
Status: Pending
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m36s (x339 over 8h) default-scheduler 0/2 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate.
Why is this happening.. 🤔
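One side effect visible above: the pods on the unreachable worker stay in Terminating because the kubelet there can no longer confirm the deletion. If they need to be cleared while the node is down, the usual workaround is a force delete (sketch only; this removes just the API object and does not stop any container that may still be running on the node):

❯ kubectl delete pod cb-spider-6457455678-bg2dq -n cloud-barista --grace-period=0 --force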
The worker node's boot disk size defaults to 10 GB on GCP; after increasing it to 100 GB, the worker node keeps its Ready status even after the 8th.
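This points to the worker's disk filling up: image pulls and container logs on a 10 GB boot disk can trigger DiskPressure evictions and, once the disk is full, leave the kubelet (or Docker) unhealthy enough that the node flips to NotReady. A minimal sketch of how to confirm it (node name taken from the listing above; SSH access and the dockershim data directory are assumptions):

# DiskPressure condition and any eviction events
❯ kubectl describe node cb-sjh1-sjh1-w-1-rw7mc | grep -A 8 Conditions
❯ kubectl get events -A --field-selector reason=Evicted
# On the worker: how full is the root filesystem, and how much of it is images/containers?
❯ ssh <worker-node> 'df -h /; sudo du -sh /var/lib/docker'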
What happened: Worker nodes' status remains NotReady.