CentaurusInfra / arktos

Arktos for large-scale cloud platform
Apache License 2.0

[kube-up][scale-out] kubernetes-dashboard pod stuck in CrashLoopBackOff state #1302

Open h-w-chen opened 2 years ago

h-w-chen commented 2 years ago

What happened: After a 1x1 kube-up scale-out cluster starts, the kubernetes-dashboard pod is stuck in the CrashLoopBackOff state:

$ ./cluster/kubectl.sh --kubeconfig=cluster/kubeconfig.tp-1 get pod -n kube-system -l k8s-app=kubernetes-dashboard
NAME                                   HASHKEY               READY   STATUS             RESTARTS   AGE
kubernetes-dashboard-848965699-288nn   2434574577953097305   0/1     CrashLoopBackOff   8          25m

What you expected to happen: The pod is in the Running state.

How to reproduce it (as minimally and precisely as possible):

  1. Start a 1x1 kube-up scale-out cluster;
  2. On the TP master, run kubectl get pod -n kube-system -l k8s-app=kubernetes-dashboard

Anything else we need to know?:

Environment:

h-w-chen commented 2 years ago

Below is the pod log:

2022-01-24T22:52:33.527601719Z stderr F 2022/01/24 22:52:33 Starting overwatch
2022-01-24T22:52:33.527705698Z stdout F 2022/01/24 22:52:33 Using in-cluster config to connect to apiserver
2022-01-24T22:52:33.528021655Z stdout F 2022/01/24 22:52:33 Using service account token for csrf signing
2022-01-24T22:52:33.628043796Z stdout F 2022/01/24 22:52:33 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service account's configuration) or the --apiserver-host param points to a server that does not exist. Reason: the server has asked for the client to provide credentials
2022-01-24T22:52:33.628086328Z stdout F Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ

Sindica commented 2 years ago

Check whether this is service-related.

h-w-chen commented 2 years ago

The latest log from the crashing pod is:

{"log":"2022/02/25 00:52:38 Starting overwatch\n","stream":"stderr","time":"2022-02-25T00:52:38.173140985Z"}
{"log":"2022/02/25 00:52:38 Using in-cluster config to connect to apiserver\n","stream":"stdout","time":"2022-02-25T00:52:38.17335097Z"}
{"log":"2022/02/25 00:52:38 Using service account token for csrf signing\n","stream":"stdout","time":"2022-02-25T00:52:38.174494935Z"}
{"log":"2022/02/25 00:52:41 Successful initial request to the apiserver, version: v0.9.0\n","stream":"stdout","time":"2022-02-25T00:52:41.419075285Z"}
{"log":"2022/02/25 00:52:41 Generating JWE encryption key\n","stream":"stdout","time":"2022-02-25T00:52:41.419115339Z"}
{"log":"2022/02/25 00:52:41 New synchronizer has been registered: kubernetes-dashboard-key-holder-kube-system. Starting\n","stream":"stdout","time":"2022-02-25T00:52:41.419278836Z"}
{"log":"2022/02/25 00:52:41 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system\n","stream":"stdout","time":"2022-02-25T00:52:41.419292974Z"}
{"log":"2022/02/25 00:52:45 Initializing JWE encryption key from synchronized object\n","stream":"stdout","time":"2022-02-25T00:52:45.576608983Z"}
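The logs in this thread appear in two layouts: the earlier one looks like the CRI text format (timestamp, stream, tag, message) and this one looks like Docker's json-file format. To compare runs, it can help to reduce both to plain messages. The following is a minimal sketch under that assumption; `parse_log_line` and `failure_reason` are hypothetical helper names, not part of Arktos or the dashboard.

```python
import json
import re

# Assumed CRI layout: "<timestamp> <stdout|stderr> <F|P> <message>"
CRI_LINE = re.compile(r"^(\S+) (stdout|stderr) ([FP]) (.*)$")


def parse_log_line(line):
    """Return (stream, message) for one container log line, or None if unrecognized."""
    line = line.rstrip("\n")
    if line.startswith("{"):
        # Assumed Docker json-file layout: {"log": ..., "stream": ..., "time": ...}
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            return None
        if not isinstance(record, dict) or "log" not in record:
            return None
        return record.get("stream", ""), record["log"].rstrip("\n")
    match = CRI_LINE.match(line)
    if match:
        return match.group(2), match.group(4)
    return None


def failure_reason(message):
    """Return the text after 'Reason: ' in a dashboard error message, or None."""
    _, sep, reason = message.partition("Reason: ")
    return reason if sep else None
```

Feeding each line of a saved log through `parse_log_line` and then `failure_reason` leaves only the connection failure reasons, which makes it easier to see whether successive restarts fail for the same cause.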
Sindica commented 2 years ago

I got the following error:

{"log":"2022/02/25 04:26:46 Using in-cluster config to connect to apiserver\n","stream":"stdout","time":"2022-02-25T04:26:46.464932633Z"}
{"log":"2022/02/25 04:26:46 Using service account token for csrf signing\n","stream":"stdout","time":"2022-02-25T04:26:46.4663454Z"}
{"log":"2022/02/25 04:27:16 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service account's configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.0.0.1:443/version: dial tcp 10.0.0.1:443: i/o timeout\n","stream":"stdout","time":"2022-02-25T04:27:16.468615797Z"}
{"log":"Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ\n","stream":"stdout","time":"2022-02-25T04:27:16.468661528Z"}
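
The `i/o timeout` in the log above means the TCP dial to 10.0.0.1:443 did not complete in time, which is different from the earlier credentials error. One way to narrow this down is to test whether that address accepts TCP connections from the place where the pod runs. The following is a minimal sketch, not a definitive diagnostic; `can_connect` is a hypothetical helper, and a result from the node's host network may differ from what the pod sees.

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable networks, and timeouts.
        return False
```

For example, `can_connect("10.0.0.1", 443)` uses the address and port taken from the log. A `False` result only shows that the TCP handshake failed from that vantage point; it does not by itself say whether the service, the endpoints, or the proxy rules are at fault.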