jihoon-seo closed this issue 2 years ago
We should check whether this only happens in the execution environment above. Is there anything else we could look at for reference, such as the cb-ladybug logs or the pod status (kubectl get pod -A)?
@vlatte I looked into it. (This may have been a one-time occurrence.)
❯ kubectl get nodes
NAME STATUS ROLES AGE VERSION
cb-sjh1-sjh1-w-1-4k340 NotReady <none> 46h v1.18.9
ip-192-168-1-96 Ready master 46h v1.18.9
[CB-Ladybug log]
I0803 03:14:09.506093 18315 initconfiguration.go:200] loading configuration from "kubeadm-config.yaml"
I0803 03:14:09.507413 18315 initconfiguration.go:103] detected and using CRI socket: /var/run/dockershim.sock
I0803 03:14:09.507670 18315 interface.go:400] Looking for default routes with IPv4 addresses
I0803 03:14:09.507682 18315 interface.go:405] Default route transits interface "eth0"
I0803 03:14:09.507818 18315 interface.go:208] Interface eth0 is up
I0803 03:14:09.507873 18315 interface.go:256] Interface "eth0" has 3 addresses :[192.168.1.96/24 3.112.22.175/32 fe80::407:3eff:fe60:b0d3/64].
I0803 03:14:09.507889 18315 interface.go:223] Checking addr 192.168.1.96/24.
I0803 03:14:09.507897 18315 interface.go:230] IP found 192.168.1.96
I0803 03:14:09.507904 18315 interface.go:262] Found valid IPv4 address 192.168.1.96 for interface "eth0".
I0803 03:14:09.507909 18315 interface.go:411] Found active IP 192.168.1.96
I0803 03:14:09.507944 18315 version.go:183] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.txt
I0803 03:14:10.040089 18315 version.go:252] remote version is much newer: v1.21.3; falling back to: stable-1.18
I0803 03:14:10.040132 18315 version.go:183] fetching Kubernetes version from URL: https://dl.k8s.io/release/stable-1.18.txt
W0803 03:14:10.385524 18315 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
I0803 03:14:10.385774 18315 checks.go:577] validating Kubernetes and kubeadm version
I0803 03:14:10.385804 18315 checks.go:166] validating if the firewall is enabled and active
I0803 03:14:10.395183 18315 checks.go:201] validating availability of port 6443
I0803 03:14:10.395348 18315 checks.go:201] validating availability of port 10259
I0803 03:14:10.395386 18315 checks.go:201] validating availability of port 10257
I0803 03:14:10.395412 18315 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
I0803 03:14:10.395438 18315 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
I0803 03:14:10.395448 18315 checks.go:286] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
I0803 03:14:10.395456 18315 checks.go:286] validating the existence of file /etc/kubernetes/manifests/etcd.yaml
I0803 03:14:10.395466 18315 checks.go:432] validating if the connectivity type is via proxy or direct
I0803 03:14:10.395488 18315 checks.go:471] validating http connectivity to first IP address in the CIDR
I0803 03:14:10.395505 18315 checks.go:471] validating http connectivity to first IP address in the CIDR
I0803 03:14:10.395525 18315 checks.go:102] validating the container runtime
I0803 03:14:10.481009 18315 checks.go:128] validating if the service is enabled and active
I0803 03:14:10.583612 18315 checks.go:335] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0803 03:14:10.583674 18315 checks.go:335] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0803 03:14:10.583696 18315 checks.go:649] validating whether swap is enabled or not
I0803 03:14:10.583723 18315 checks.go:376] validating the presence of executable conntrack
I0803 03:14:10.583755 18315 checks.go:376] validating the presence of executable ip
I0803 03:14:10.583774 18315 checks.go:376] validating the presence of executable iptables
I0803 03:14:10.583791 18315 checks.go:376] validating the presence of executable mount
I0803 03:14:10.583809 18315 checks.go:376] validating the presence of executable nsenter
I0803 03:14:10.583853 18315 checks.go:376] validating the presence of executable ebtables
I0803 03:14:10.583877 18315 checks.go:376] validating the presence of executable ethtool
I0803 03:14:10.583914 18315 checks.go:376] validating the presence of executable socat
I0803 03:14:10.583936 18315 checks.go:376] validating the presence of executable tc
I0803 03:14:10.583962 18315 checks.go:376] validating the presence of executable touch
I0803 03:14:10.583985 18315 checks.go:520] running all checks
I0803 03:14:10.679622 18315 checks.go:406] checking whether the given node name is reachable using net.LookupHost
I0803 03:14:10.680655 18315 checks.go:618] validating kubelet version
I0803 03:14:10.741010 18315 checks.go:128] validating if the service is enabled and active
I0803 03:14:10.752762 18315 checks.go:201] validating availability of port 10250
I0803 03:14:10.752842 18315 checks.go:201] validating availability of port 2379
I0803 03:14:10.752869 18315 checks.go:201] validating availability of port 2380
I0803 03:14:10.752896 18315 checks.go:249] validating the existence and emptiness of directory /var/lib/etcd
I0803 03:14:10.810118 18315 checks.go:844] pulling k8s.gcr.io/kube-apiserver:v1.18.20
I0803 03:14:14.627388 18315 checks.go:844] pulling k8s.gcr.io/kube-controller-manager:v1.18.20
I0803 03:14:17.470977 18315 checks.go:844] pulling k8s.gcr.io/kube-scheduler:v1.18.20
I0803 03:14:19.677735 18315 checks.go:844] pulling k8s.gcr.io/kube-proxy:v1.18.20
I0803 03:14:25.446222 18315 checks.go:844] pulling k8s.gcr.io/pause:3.2
I0803 03:14:26.872368 18315 checks.go:844] pulling k8s.gcr.io/etcd:3.4.3-0
I0803 03:14:32.785877 18315 checks.go:844] pulling k8s.gcr.io/coredns:1.6.7
I0803 03:14:34.927136 18315 kubelet.go:64] Stopping the kubelet
I0803 03:14:35.188472 18315 certs.go:103] creating a new certificate authority for ca
I0803 03:14:35.875890 18315 certs.go:103] creating a new certificate authority for front-proxy-ca
I0803 03:14:36.257238 18315 certs.go:103] creating a new certificate authority for etcd-ca
I0803 03:14:37.406200 18315 certs.go:69] creating new public/private key files for signing service account users
I0803 03:14:37.495956 18315 kubeconfig.go:79] creating kubeconfig file for admin.conf
I0803 03:14:37.573174 18315 kubeconfig.go:79] creating kubeconfig file for kubelet.conf
I0803 03:14:38.107589 18315 kubeconfig.go:79] creating kubeconfig file for controller-manager.conf
I0803 03:14:38.275982 18315 kubeconfig.go:79] creating kubeconfig file for scheduler.conf
I0803 03:14:38.441379 18315 manifests.go:91] [control-plane] getting StaticPodSpecs
W0803 03:14:38.441504 18315 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0803 03:14:38.442080 18315 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0803 03:14:38.442098 18315 manifests.go:104] [control-plane] adding volume "etc-ca-certificates" for component "kube-apiserver"
I0803 03:14:38.442104 18315 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0803 03:14:38.442110 18315 manifests.go:104] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-apiserver"
I0803 03:14:38.442116 18315 manifests.go:104] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-apiserver"
I0803 03:14:38.449756 18315 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
I0803 03:14:38.449790 18315 manifests.go:91] [control-plane] getting StaticPodSpecs
W0803 03:14:38.449852 18315 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0803 03:14:38.450080 18315 manifests.go:104] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0803 03:14:38.450098 18315 manifests.go:104] [control-plane] adding volume "etc-ca-certificates" for component "kube-controller-manager"
I0803 03:14:38.450104 18315 manifests.go:104] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0803 03:14:38.450110 18315 manifests.go:104] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0803 03:14:38.450116 18315 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0803 03:14:38.450121 18315 manifests.go:104] [control-plane] adding volume "usr-local-share-ca-certificates" for component "kube-controller-manager"
I0803 03:14:38.450127 18315 manifests.go:104] [control-plane] adding volume "usr-share-ca-certificates" for component "kube-controller-manager"
I0803 03:14:38.450930 18315 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
I0803 03:14:38.450955 18315 manifests.go:91] [control-plane] getting StaticPodSpecs
W0803 03:14:38.451008 18315 manifests.go:225] the default kube-apiserver authorization-mode is "Node,RBAC"; using "Node,RBAC"
I0803 03:14:38.451212 18315 manifests.go:104] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0803 03:14:38.451714 18315 manifests.go:121] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
I0803 03:14:38.452381 18315 local.go:72] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
I0803 03:14:38.452403 18315 waitcontrolplane.go:87] [wait-control-plane] Waiting for the API server to be healthy
I0803 03:14:38.454495 18315 request.go:907] Got a Retry-After 1s response for attempt 1 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:39.455323 18315 request.go:907] Got a Retry-After 1s response for attempt 2 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:40.456097 18315 request.go:907] Got a Retry-After 1s response for attempt 3 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:41.456915 18315 request.go:907] Got a Retry-After 1s response for attempt 4 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:42.457775 18315 request.go:907] Got a Retry-After 1s response for attempt 5 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:43.458570 18315 request.go:907] Got a Retry-After 1s response for attempt 6 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:44.459356 18315 request.go:907] Got a Retry-After 1s response for attempt 7 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:45.460174 18315 request.go:907] Got a Retry-After 1s response for attempt 8 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:46.460923 18315 request.go:907] Got a Retry-After 1s response for attempt 9 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:47.962491 18315 request.go:907] Got a Retry-After 1s response for attempt 1 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:48.963271 18315 request.go:907] Got a Retry-After 1s response for attempt 2 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:49.963983 18315 request.go:907] Got a Retry-After 1s response for attempt 3 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:50.964864 18315 request.go:907] Got a Retry-After 1s response for attempt 4 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:51.965679 18315 request.go:907] Got a Retry-After 1s response for attempt 5 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:52.966604 18315 request.go:907] Got a Retry-After 1s response for attempt 6 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:53.967495 18315 request.go:907] Got a Retry-After 1s response for attempt 7 to https://3.112.22.175:9998/healthz?timeout=10s
I0803 03:14:56.964028 18315 uploadconfig.go:108] [upload-config] Uploading the kubeadm ClusterConfiguration to a ConfigMap
I0803 03:14:56.977918 18315 uploadconfig.go:122] [upload-config] Uploading the kubelet component config to a ConfigMap
I0803 03:14:56.987080 18315 uploadconfig.go:127] [upload-config] Preserving the CRISocket information for the control-plane node
I0803 03:14:56.987116 18315 patchnode.go:30] [patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "ip-192-168-1-96" as an annotation
I0803 03:14:58.042891 18315 clusterinfo.go:45] [bootstrap-token] loading admin kubeconfig
I0803 03:14:58.043370 18315 clusterinfo.go:53] [bootstrap-token] copying the cluster from admin.conf to the bootstrap kubeconfig
I0803 03:14:58.043640 18315 clusterinfo.go:65] [bootstrap-token] creating/updating ConfigMap in kube-public namespace
I0803 03:14:58.046163 18315 clusterinfo.go:79] creating the RBAC rules for exposing the cluster-info ConfigMap in the kube-public namespace
I0803 03:14:58.051133 18315 kubeletfinalize.go:88] [kubelet-finalize] Assuming that kubelet client certificate rotation is enabled: found "/var/lib/kubelet/pki/kubelet-client-current.pem"
I0803 03:14:58.051970 18315 kubeletfinalize.go:132] [kubelet-finalize] Restarting the kubelet to enable client certificate rotation
I0803 03:14:58.487468 18315 request.go:557] Throttling request took 184.695441ms, request: POST:https://3.112.22.175:9998/api/v1/namespaces/kube-system/serviceaccounts?timeout=10s
I0803 03:14:58.687551 18315 request.go:557] Throttling request took 167.608572ms, request: POST:https://3.112.22.175:9998/api/v1/namespaces/kube-system/services?timeout=10s
I0803 03:14:58.887549 18315 request.go:557] Throttling request took 190.130979ms, request: POST:https://3.112.22.175:9998/api/v1/namespaces/kube-system/serviceaccounts?timeout=10s
I0803 03:14:59.087544 18315 request.go:557] Throttling request took 194.755559ms, request: POST:https://3.112.22.175:9998/api/v1/namespaces/kube-system/configmaps?timeout=10s
time="2021-08-03T03:15:00Z" level=info msg="install networkCNI"
time="2021-08-03T03:15:02Z" level=info msg="end k8s init"
time="2021-08-03T03:15:02Z" level=info msg="start k8s join"
time="2021-08-03T03:15:02Z" level=info msg="worker join (vm=sjh1-w-1-4k340)"
W0803 03:15:03.904790 17758 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
time="2021-08-03T03:15:23Z" level=info msg="end k8s join"
time="2021-08-03T03:15:23Z" level=info msg="duration := 11m48.482540473s"
[end of log]
[Pod status (kubectl get pod -A)] When I ran CB yesterday, most of the pods were Running, but checking today, most of them have changed to Pending..
❯ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cloud-barista cb-dragonfly-58f48dc4-d66tx 0/1 Pending 0 37h
cloud-barista cb-dragonfly-influxdb-0 0/1 Pending 0 35h
cloud-barista cb-dragonfly-kafka-0 0/1 Pending 0 35h
cloud-barista cb-dragonfly-zookeeper-0 0/1 Pending 0 35h
cloud-barista cb-ladybug-94474d4bf-nkjnl 0/1 Pending 0 37h
cloud-barista cb-restapigw-575cb4d7cb-t7jll 0/1 Pending 0 37h
cloud-barista cb-restapigw-influxdb-6d49bccc4d-jfsxn 0/1 Pending 0 37h
cloud-barista cb-restapigw-jaeger-agent-m8wz4 1/1 Running 17 42h
cloud-barista cb-restapigw-jaeger-collector-5f4456448f-wz8vk 0/1 Pending 0 37h
cloud-barista cb-restapigw-jaeger-query-d7844c565-6dw2t 0/2 Pending 0 37h
cloud-barista cb-spider-6457455678-kmdpw 0/1 Pending 0 37h
cloud-barista cb-tumblebug-5dd7d5756b-vz4fl 0/1 Pending 0 37h
cloud-barista cb-webtool-676847cfb7-9djfk 0/1 Pending 0 37h
cloud-barista cloud-barista-cassandra-0 0/1 Pending 0 35h
cloud-barista cloud-barista-dragonfly-kapacitor-59758758fc-8xbrw 0/1 Pending 0 37h
cloud-barista cloud-barista-etcd-0 0/1 Pending 0 35h
cloud-barista cloud-barista-grafana-6954497b57-hm4tt 0/2 Pending 0 37h
cloud-barista cloud-barista-kube-state-metrics-78558787c4-qkmc4 0/1 Pending 0 37h
cloud-barista cloud-barista-prometheus-alertmanager-55487695d5-slz4x 0/2 Pending 0 37h
cloud-barista cloud-barista-prometheus-node-exporter-cpk64 1/1 Running 0 42h
cloud-barista cloud-barista-prometheus-pushgateway-7d965ccb7-59gzh 0/1 Pending 0 37h
cloud-barista cloud-barista-prometheus-server-597455b56b-wxpqp 0/2 Pending 0 37h
kube-system coredns-66bff467f8-mk7lx 1/1 Running 0 46h
kube-system coredns-66bff467f8-nmvnt 1/1 Running 0 43h
kube-system etcd-ip-192-168-1-96 1/1 Running 0 46h
kube-system kilo-nkgqh 1/1 Running 0 46h
kube-system kilo-trfgl 1/1 Running 0 46h
kube-system kube-apiserver-ip-192-168-1-96 1/1 Running 0 46h
kube-system kube-controller-manager-ip-192-168-1-96 1/1 Running 1 46h
kube-system kube-flannel-ds-4s5s8 1/1 Running 0 46h
kube-system kube-flannel-ds-vwn6h 1/1 Running 0 46h
kube-system kube-proxy-tnxpx 1/1 Running 0 46h
kube-system kube-proxy-vpfqr 1/1 Running 0 46h
kube-system kube-scheduler-ip-192-168-1-96 1/1 Running 1 46h
kube-system metrics-server-77775f68b8-fvtjf 0/1 Pending 0 37h
I will try things like restarting CB or recreating the MCKS K8s cluster. 😊
I looked into why the pods are Pending.
❯ kubectl logs cb-tumblebug-5dd7d5756b-vz4fl -n cloud-barista
[no output]
❯ kubectl describe pod cb-tumblebug-5dd7d5756b-vz4fl -n cloud-barista
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 36h default-scheduler 0/2 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate.
So the control-plane node has the master taint and the pod was not scheduled there, and the worker node has the unreachable taint so it was not scheduled there either.. 🤔
And yet, the cb-restapigw-jaeger-agent-m8wz4 and cloud-barista-prometheus-node-exporter-cpk64 pods are still Running on the worker node (cb-sjh1-sjh1-w-1-4k340).. 🤔 (Those two look like DaemonSet pods, which automatically get a toleration for the unreachable taint and therefore are not evicted from the node; see the check sketched below.)
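To double-check that explanation, here is a minimal sketch of the kubectl queries I would use (the fields are standard, but treat the exact objects and output as illustrative):

❯ kubectl describe node ip-192-168-1-96 | grep -i taints
❯ kubectl describe node cb-sjh1-sjh1-w-1-4k340 | grep -i taints
# Check whether the two still-Running pods are owned by DaemonSets
❯ kubectl get pod cb-restapigw-jaeger-agent-m8wz4 -n cloud-barista -o jsonpath='{.metadata.ownerReferences[0].kind}'
# DaemonSet pods carry automatic tolerations for node.kubernetes.io/unreachable, which is why they stay bound
❯ kubectl get pod cloud-barista-prometheus-node-exporter-cpk64 -n cloud-barista -o jsonpath='{.spec.tolerations}'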
I ran the ./operator remove command to uninstall the Cloud-Barista Helm release.
It feels like the MCKS K8s cluster somehow broke while it was in use. 🤔
I will delete the MCKS K8s cluster and create it again.
After deleting and recreating the MCKS K8s cluster, the nodes come up Ready.
❯ kubectl get nodes
NAME STATUS ROLES AGE VERSION
cb-sjh1-sjh1-w-1-rw7mc Ready <none> 42s v1.18.9
ip-192-168-1-149 Ready master 65s v1.18.9
❯ kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
cloud-barista cb-dragonfly-58f48dc4-6nzfx 1/1 Running 0 10m
cloud-barista cb-dragonfly-influxdb-0 1/1 Running 0 10m
cloud-barista cb-dragonfly-kafka-0 1/1 Running 1 10m
cloud-barista cb-dragonfly-zookeeper-0 1/1 Running 0 10m
cloud-barista cb-ladybug-94474d4bf-b4rqh 1/1 Running 0 10m
cloud-barista cb-restapigw-575cb4d7cb-wwc6v 1/1 Running 0 10m
cloud-barista cb-restapigw-influxdb-6d49bccc4d-dd7d6 1/1 Running 0 10m
cloud-barista cb-restapigw-jaeger-agent-c9ztt 1/1 Running 0 10m
cloud-barista cb-restapigw-jaeger-cassandra-schema-kk8tx 0/1 Completed 0 10m
cloud-barista cb-restapigw-jaeger-collector-5f4456448f-k6pv2 0/1 CrashLoopBackOff 4 10m
cloud-barista cb-restapigw-jaeger-query-d7844c565-vcwqg 2/2 Running 5 10m
cloud-barista cb-spider-6457455678-bg2dq 1/1 Running 0 10m
cloud-barista cb-tumblebug-5dd7d5756b-s959r 1/1 Running 0 10m
cloud-barista cb-webtool-676847cfb7-9k7kf 1/1 Running 0 10m
cloud-barista cloud-barista-cassandra-0 0/1 Running 3 10m
cloud-barista cloud-barista-cassandra-1 0/1 CrashLoopBackOff 3 7m40s
cloud-barista cloud-barista-cassandra-2 0/1 CrashLoopBackOff 3 6m3s
cloud-barista cloud-barista-dragonfly-kapacitor-59758758fc-6j8z5 1/1 Running 0 10m
cloud-barista cloud-barista-etcd-0 1/1 Running 0 10m
cloud-barista cloud-barista-grafana-6954497b57-rk927 2/2 Running 0 10m
cloud-barista cloud-barista-kube-state-metrics-78558787c4-hppdm 1/1 Running 0 10m
cloud-barista cloud-barista-prometheus-alertmanager-55487695d5-mlxrb 2/2 Running 0 10m
cloud-barista cloud-barista-prometheus-node-exporter-gznm6 1/1 Running 0 10m
cloud-barista cloud-barista-prometheus-pushgateway-7d965ccb7-4wwp9 1/1 Running 0 10m
cloud-barista cloud-barista-prometheus-server-597455b56b-7hzmb 2/2 Running 0 10m
kube-system coredns-66bff467f8-bzwdl 1/1 Running 0 17m
kube-system coredns-66bff467f8-vv599 1/1 Running 0 17m
kube-system etcd-ip-192-168-1-149 1/1 Running 0 17m
kube-system kilo-hbtk4 1/1 Running 0 17m
kube-system kilo-rjlms 1/1 Running 0 17m
kube-system kube-apiserver-ip-192-168-1-149 1/1 Running 0 17m
kube-system kube-controller-manager-ip-192-168-1-149 1/1 Running 0 17m
kube-system kube-flannel-ds-h26h7 1/1 Running 1 17m
kube-system kube-flannel-ds-tkf54 1/1 Running 0 17m
kube-system kube-proxy-67qrq 1/1 Running 0 17m
kube-system kube-proxy-wc2hl 1/1 Running 0 17m
kube-system kube-scheduler-ip-192-168-1-149 1/1 Running 0 17m
kube-system metrics-server-77775f68b8-dhqbg 1/1 Running 0 10m
I'll keep using it, and if it changes back to NotReady I will leave another comment~~
Checking just now, it has changed back to NotReady again..
❯ kubectl get nodes
NAME STATUS ROLES AGE VERSION
cb-sjh1-sjh1-w-1-rw7mc NotReady <none> 16h v1.18.9
ip-192-168-1-149 Ready master 16h v1.18.9
❯ kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
cloud-barista cb-dragonfly-58f48dc4-6nzfx 1/1 Terminating 0 16h 10.244.1.5 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-dragonfly-58f48dc4-dv7z5 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-dragonfly-influxdb-0 1/1 Terminating 0 16h 10.244.1.17 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-dragonfly-kafka-0 1/1 Terminating 1 16h 10.244.1.16 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-dragonfly-zookeeper-0 1/1 Terminating 0 16h 10.244.1.15 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-ladybug-94474d4bf-b4rqh 1/1 Terminating 0 16h 10.244.1.6 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-ladybug-94474d4bf-nw8np 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-575cb4d7cb-rxpjq 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-575cb4d7cb-wwc6v 1/1 Terminating 0 16h 10.244.1.10 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-restapigw-influxdb-6d49bccc4d-dd7d6 1/1 Terminating 0 16h 10.244.1.11 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-restapigw-influxdb-6d49bccc4d-mbdnh 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-jaeger-agent-c9ztt 1/1 Running 14 16h 10.244.1.3 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-restapigw-jaeger-collector-5f4456448f-k6pv2 1/1 Terminating 57 16h 10.244.1.23 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-restapigw-jaeger-collector-5f4456448f-vjt8n 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-jaeger-query-d7844c565-nc8cj 0/2 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-restapigw-jaeger-query-d7844c565-vcwqg 2/2 Terminating 76 16h 10.244.1.4 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-spider-6457455678-bg2dq 1/1 Terminating 0 16h 10.244.1.24 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-spider-6457455678-l2mnn 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-tumblebug-5dd7d5756b-89b4b 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cb-tumblebug-5dd7d5756b-s959r 1/1 Terminating 0 16h 10.244.1.22 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-webtool-676847cfb7-9k7kf 1/1 Terminating 0 16h 10.244.1.8 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cb-webtool-676847cfb7-jzc7b 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-cassandra-0 1/1 Terminating 65 16h 10.244.1.12 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-cassandra-1 0/1 Terminating 97 16h 10.244.1.25 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-cassandra-2 1/1 Terminating 94 16h 10.244.1.26 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-dragonfly-kapacitor-59758758fc-6j8z5 1/1 Terminating 0 16h 10.244.1.13 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-dragonfly-kapacitor-59758758fc-xrfhk 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-etcd-0 1/1 Terminating 0 16h 10.244.1.14 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-grafana-6954497b57-rk927 2/2 Terminating 0 16h 10.244.1.19 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-grafana-6954497b57-tlnz5 0/2 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-kube-state-metrics-78558787c4-5k76x 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-kube-state-metrics-78558787c4-hppdm 1/1 Terminating 0 16h 10.244.1.21 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-alertmanager-55487695d5-czzzt 0/2 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-prometheus-alertmanager-55487695d5-mlxrb 2/2 Terminating 0 16h 10.244.1.18 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-node-exporter-gznm6 1/1 Running 0 16h 34.64.147.45 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-pushgateway-7d965ccb7-4wwp9 1/1 Terminating 0 16h 10.244.1.9 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-pushgateway-7d965ccb7-vgwm8 0/1 Pending 0 8h <none> <none> <none> <none>
cloud-barista cloud-barista-prometheus-server-597455b56b-7hzmb 2/2 Terminating 0 16h 10.244.1.7 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
cloud-barista cloud-barista-prometheus-server-597455b56b-rqt78 0/2 Pending 0 8h <none> <none> <none> <none>
kube-system coredns-66bff467f8-bzwdl 1/1 Running 0 16h 10.244.0.2 ip-192-168-1-149 <none> <none>
kube-system coredns-66bff467f8-vv599 1/1 Running 0 16h 10.244.0.3 ip-192-168-1-149 <none> <none>
kube-system etcd-ip-192-168-1-149 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kilo-hbtk4 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kilo-rjlms 1/1 Running 0 16h 34.64.147.45 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
kube-system kube-apiserver-ip-192-168-1-149 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kube-controller-manager-ip-192-168-1-149 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kube-flannel-ds-h26h7 1/1 Running 1 16h 34.64.147.45 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
kube-system kube-flannel-ds-tkf54 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kube-proxy-67qrq 1/1 Running 0 16h 34.64.147.45 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
kube-system kube-proxy-wc2hl 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system kube-scheduler-ip-192-168-1-149 1/1 Running 0 16h 54.168.70.136 ip-192-168-1-149 <none> <none>
kube-system metrics-server-77775f68b8-dhqbg 1/1 Terminating 0 16h 10.244.1.2 cb-sjh1-sjh1-w-1-rw7mc <none> <none>
kube-system metrics-server-77775f68b8-f2549 0/1 Pending 0 8h <none> <none> <none> <none>
[Logs of a pod in the Terminating state] (This pod had previously been confirmed to be working.)
❯ kubectl logs cb-spider-6457455678-bg2dq -n cloud-barista
Error from server: Get https://34.64.147.45:10250/containerLogs/cloud-barista/cb-spider-6457455678-bg2dq/cb-spider: dial tcp 34.64.147.45:10250: i/o timeout
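That error suggests the API server cannot reach the kubelet on the worker at all (34.64.147.45 is the worker's address in the listing above). A quick connectivity sketch, assuming shell access to the control-plane node and SSH access to the worker:

# From the control-plane node: is the worker's kubelet port reachable?
❯ nc -vz -w 3 34.64.147.45 10250
# On the worker itself: is the kubelet still running, and what is it complaining about?
❯ ssh <worker-node> 'systemctl status kubelet; journalctl -u kubelet -n 50 --no-pager'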
[Logs of a pod in the Pending state]
❯ kubectl logs cb-spider-6457455678-l2mnn -n cloud-barista
[no output]
[describe of the pod in the Terminating state]
❯ kubectl describe pod cb-spider-6457455678-bg2dq -n cloud-barista
State: Terminated
Reason: Completed
Exit Code: 0
Started: Thu, 05 Aug 2021 17:57:11 +0900
Finished: Thu, 05 Aug 2021 17:57:11 +0900
Events: <none>
[describe of the pod in the Pending state]
❯ kubectl describe pod cb-spider-6457455678-l2mnn -n cloud-barista
Status: Pending
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 4m36s (x339 over 8h) default-scheduler 0/2 nodes are available: 1 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate, 1 node(s) had taint {node.kubernetes.io/unreachable: }, that the pod didn't tolerate.
Why is this happening.. 🤔
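One side effect visible above: the pods on the unreachable worker stay in Terminating because the kubelet there can no longer confirm the deletion. If they need to be cleared while the node is down, the usual workaround is a force delete (sketch only; this removes just the API object and does not stop any container that may still be running on the node):

❯ kubectl delete pod cb-spider-6457455678-bg2dq -n cloud-barista --grace-period=0 --force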
The worker node's boot disk size defaults to 10 GB on GCP; after increasing it to 100 GB, the worker node keeps its Ready status even after the 8th.
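This points to the worker's disk filling up: image pulls and container logs on a 10 GB boot disk can trigger DiskPressure evictions and, once the disk is full, leave the kubelet (or Docker) unhealthy enough that the node flips to NotReady. A minimal sketch of how to confirm it (node name taken from the listing above; SSH access and the dockershim data directory are assumptions):

# DiskPressure condition and any eviction events
❯ kubectl describe node cb-sjh1-sjh1-w-1-rw7mc | grep -A 8 Conditions
❯ kubectl get events -A --field-selector reason=Evicted
# On the worker: how full is the root filesystem, and how much of it is images/containers?
❯ ssh <worker-node> 'df -h /; sudo du -sh /var/lib/docker'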
What happened: Worker nodes' status remains NotReady.