I ran the following to renew the certs and get the metadig
account reconnected to k8s-dev:
root@docker-dev-ucsb-1:~# sudo kubeadm certs renew all
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration
certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed
Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
root@docker-dev-ucsb-1:~# sudo kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration
CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Aug 25, 2023 20:45 UTC   364d            ca                      no
apiserver                  Aug 25, 2023 20:45 UTC   364d            ca                      no
apiserver-etcd-client      Aug 25, 2023 20:45 UTC   364d            etcd-ca                 no
apiserver-kubelet-client   Aug 25, 2023 20:45 UTC   364d            ca                      no
controller-manager.conf    Aug 25, 2023 20:45 UTC   364d            ca                      no
etcd-healthcheck-client    Aug 25, 2023 20:45 UTC   364d            etcd-ca                 no
etcd-peer                  Aug 25, 2023 20:45 UTC   364d            etcd-ca                 no
etcd-server                Aug 25, 2023 20:45 UTC   364d            etcd-ca                 no
front-proxy-client         Aug 25, 2023 20:45 UTC   364d            front-proxy-ca          no
scheduler.conf             Aug 25, 2023 20:45 UTC   364d            ca                      no

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jan 28, 2030 19:14 UTC   7y              no
etcd-ca                 Jan 28, 2030 19:14 UTC   7y              no
front-proxy-ca          Jan 28, 2030 19:14 UTC   7y              no
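If you want to double-check a renewed certificate's dates without going through kubeadm, you can read the cert directly with openssl. A minimal sketch, assuming the default kubeadm PKI location:

sudo openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -dates   # prints the notBefore/notAfter dates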
cp /etc/kubernetes/admin.conf /home/metadig/.kube/config
chown metadig:metadig /home/metadig/.kube/config
metadig@docker-dev-ucsb-1:~$ kubectl get nodes
NAME                STATUS   ROLES                  AGE      VERSION
docker-dev-ucsb-1   Ready    control-plane,master   2y207d   v1.23.3
docker-dev-ucsb-2   Ready    <none>                 2y207d   v1.23.3
I restarted kube-apiserver, kube-controller-manager, kube-scheduler, and etcd, per the instructions given in the output of the cert renew command above (an alternative restart method is sketched after the pod listing below):
metadig@docker-dev-ucsb-1:~$ kubectl delete pod/kube-scheduler-docker-dev-ucsb-1 -n kube-system
pod "kube-scheduler-docker-dev-ucsb-1" deleted
metadig@docker-dev-ucsb-1:~$ kubectl delete pod/kube-apiserver-docker-dev-ucsb-1 -n kube-system
pod "kube-apiserver-docker-dev-ucsb-1" deleted
metadig@docker-dev-ucsb-1:~$ kubectl delete pod/kube-controller-manager-docker-dev-ucsb-1 -n kube-system
pod "kube-controller-manager-docker-dev-ucsb-1" deleted
metadig@docker-dev-ucsb-1:~$ kubectl delete pod/etcd-docker-dev-ucsb-1 -n kube-system
pod "etcd-docker-dev-ucsb-1" deleted
metadig@docker-dev-ucsb-1:~$ kubectl get pods -n kube-system
NAME                                        READY   STATUS    RESTARTS         AGE
calico-kube-controllers-6fd7b9848d-7wrr9    1/1     Running   1792 (17m ago)   165d
calico-node-78ttp                           1/1     Running   1 (98d ago)      165d
calico-node-hzznf                           1/1     Running   31 (96d ago)     165d
coredns-78fcd69978-qktrm                    1/1     Running   4 (96d ago)      373d
coredns-78fcd69978-xjqjz                    1/1     Running   4 (96d ago)      373d
etcd-docker-dev-ucsb-1                      1/1     Running   51 (96d ago)     114s
kube-apiserver-docker-dev-ucsb-1            1/1     Running   636              2m48s
kube-controller-manager-docker-dev-ucsb-1   1/1     Running   2205 (8d ago)    2m20s
kube-proxy-7hmdm                            1/1     Running   2 (98d ago)      373d
kube-proxy-x54mw                            1/1     Running   4 (96d ago)      373d
kube-scheduler-docker-dev-ucsb-1            1/1     Running   531 (8d ago)     4m31s
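For reference, these four components run as static pods, so deleting their mirror pods just prompts the kubelet to recreate them from the on-disk manifests. An alternative restart sketch, assuming the default kubeadm manifest directory:

# Move the manifest out briefly; the kubelet stops the pod, then recreates it when the file returns
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 20
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/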
Updating the existing *.config files in the metadig home directory with the new certificate-authority-data key from /etc/kubernetes/admin.conf appears to allow access, but only to the namespaces each service account is authorized for (a sketch of the update step follows the output below):
metadig@docker-dev-ucsb-1:~/.kube$ export KUBECONFIG=/home/metadig/.kube/polder.config
metadig@docker-dev-ucsb-1:~/.kube$ kubectl get pods --all-namespaces
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:polder:polder" cannot list resource "pods" in API group "" at the cluster scope
metadig@docker-dev-ucsb-1:~/.kube$ kubectl get pods -n polder
NAME                          READY   STATUS      RESTARTS         AGE
crawl-27658080--1-z4bq7       0/1     Completed   0                22d
crawl-27668160--1-dr7d6       0/1     Completed   0                15d
crawl-27678240--1-t7dnn       0/1     Completed   0                8d
dev-gleaner-8b6b6c4c9-ghhln   3/3     Running     0                31d
dev-polder-78bccfcd46-h9rnb   1/1     Running     46 (3d21h ago)   27d
setup-gleaner--1-9ffw8        0/1     Completed   0                27d
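A sketch of that update step, assuming the *.config files follow the standard kubeconfig layout:

# Print the new CA value from the renewed admin kubeconfig...
sudo grep certificate-authority-data /etc/kubernetes/admin.conf
# ...then paste it over the old certificate-authority-data: value in each ~/.kube/*.config file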
I GPG-encrypted and emailed the polder.config file to Melinda, and after copying the certificate-authority-data: line to another polder.config file (with user dev-polder instead of polder) she reported that she can connect to the k8s-dev cluster again.
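A minimal sketch of the GPG step (the recipient address here is a hypothetical placeholder):

gpg --encrypt --recipient melinda@example.org polder.config   # writes polder.config.gpg for emailing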
OK, so I looked at the credentials in config-dev and compared them to the ones in root@k8s-dev-ucsb-1:/etc/kubernetes/admin.conf, and the client-certificate-data and client-key-data for the kubernetes-admin user did not match. I updated config-dev with the new info from admin.conf, and now logging in to dev-k8s works fine (a quick way to check for this kind of mismatch is sketched after the output below):
$ kubectl config use-context dev-k8s
Switched to context "dev-k8s".
$ kubectl get nodes
NAME                STATUS   ROLES                  AGE      VERSION
docker-dev-ucsb-1   Ready    control-plane,master   2y220d   v1.23.3
docker-dev-ucsb-2   Ready    <none>                 2y220d   v1.23.3
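A quicker way to catch that kind of mismatch than eyeballing base64 blobs is to decode the embedded client cert and check its dates. A sketch, assuming config-dev follows the standard kubeconfig layout:

# Decode the embedded client certificate and show its subject and expiry
grep client-certificate-data ~/.kube/config-dev | awk '{print $2}' | base64 -d | openssl x509 -noout -subject -enddate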
So, I updated the config-dev file in the security repo. @nickatnceas @taojing2002, if you grab the new copy it should work for you now; let me know if not.
Matt reported issues with k8s-dev:
$ kubectl run -i -n jones --tty --rm debug --image=busybox --restart=Never -- sh
pod "debug" deleted
error: timed out waiting for the condition
And I had the same experience:
outin@halt-21280:~/.kube$ kubectl run -i -n nick --tty --rm debug --image=busybox --restart=Never -- sh
pod "debug" deleted
error: timed out waiting for the condition
After checking the logs I found recent errors related to the cert expiration in /var/log/containers/kube-apiserver-docker-dev-ucsb-1_kube-system_kube-apiserver-cabce005da75cb02ea886d5f351a79c9136c8c519097123d946165e2ef596d51.log:
{"log":"E0922 17:54:18.402829 1 authentication.go:63] \"Unable to authenticate the request\" err=\"[x509: certificate has expired or is not yet valid: current time 2022-09-22T17:54:18Z is after 2022-08-17T17:05:39Z, verifying certificate SN=932365683341995477, SKID=, AKID= failed: x509: certificate has expired or is not yet valid: current time 2022-09-22T17:54:18Z is after 2022-08-17T17:05:39Z]\"\n","stream":"stderr","time":"2022-09-22T17:54:18.403481156Z"}
I restarted kube-apiserver-docker-dev-ucsb-1 again (same method as above), which did not help. I then ran systemctl restart kubelet, which caused more issues, such as api.test.dataone.org going offline. I then rebooted k8s-dev-ctrl-1, and when it came back up api.test.dataone.org worked again, and I was able to run the test pod:
outin@halt-21280:~/.kube$ kubectl run -i -n nick --tty --rm debug --image=busybox --restart=Never -- sh
If you don't see a command prompt, try pressing enter.
/ #
It appears that more than just the four services specified above need to be restarted after renewing certs; rebooting the controller takes care of all the required pod restarts.
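Based on that, a hedged sequence for the next renewal (run on the control-plane node; the reboot stands in for individually restarting every affected component):

sudo kubeadm certs renew all
sudo kubeadm certs check-expiration   # confirm the new expiry dates
sudo reboot
# after the node is back up:
kubectl get pods -n kube-system       # confirm the control-plane pods restarted cleanly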
Reported by Melinda in the DataONE #dev-general Slack: it appears that the client certificates for the k8s-dev cluster have expired.
There appear to be several ways to renew them, according to https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-certs/
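For example, the options on that page range from renewing everything at once to renewing individual certificates (subcommands from the kubeadm docs):

sudo kubeadm certs check-expiration    # see what is close to expiring
sudo kubeadm certs renew all           # renew every certificate at once
sudo kubeadm certs renew apiserver     # or renew a single certificate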