ReSearchITEng / kubeadm-playbook

A fully fledged (HA) Kubernetes cluster using official kubeadm, Ansible and Helm. Tested on RHEL/CentOS/Ubuntu, with http_proxy support, dashboard, ingress controller and Heapster installed via official Helm charts.
https://researchiteng.github.io/kubeadm-playbook/

error: unable to load server certificate: open /etc/kubernetes/pki/apiserver.crt: permission denied #68

Closed venomwaqar closed 5 years ago

venomwaqar commented 5 years ago

Inventory File

[primary-master]
ec2-34-216-121-64.us-west-2.compute.amazonaws.com ansible_user=centos

[masters:children]
primary-master

[primary-etcd:children]
primary-master

[etcd:children]
primary-etcd

[nodes]
ec2-18-236-109-252.us-west-2.compute.amazonaws.com ansible_user=centos

[node:children]
nodes
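For reference, a quick way to confirm this inventory resolves as intended and that Ansible can reach both hosts before running the playbook (inventory file name and key path assumed to match the command further below) might look like this:

# Show how the groups and children resolve
ansible-inventory -i hosts.example --graph
# Verify SSH connectivity and Python on all hosts
ansible -i hosts.example all -m ping --private-key=Key.pem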

Last Task

TASK [master : Initialize cluster with {{kubeadm_init_args}} --config /etc/kubernetes/kubeadm-master.conf] *************************************************************************************************
fatal: [ec2-34-216-121-64.us-west-2.compute.amazonaws.com]: FAILED! => {"changed": true, "cmd": ["/usr/bin/kubeadm", "init", "--config", "/etc/kubernetes/kubeadm-master.conf"], "delta": "0:04:04.388213", "end": "2019-05-13 16:40:29.313286", "msg": "non-zero return code", "rc": 1, "start": "2019-05-13 16:36:24.925073", "stderr": "\t[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'\nerror execution phase wait-control-plane: couldn't initialize a Kubernetes cluster", "stderr_lines": ["\t[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'", "error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster"], "stdout": "[init] Using Kubernetes version: v1.13.4\n[preflight] Running pre-flight checks\n[preflight] Pulling images required for setting up a Kubernetes cluster\n[preflight] This might take a minute or two, depending on the speed of your internet connection\n[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'\n[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"\n[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"\n[kubelet-start] Activating the kubelet service\n[certs] Using certificateDir folder \"/etc/kubernetes/pki\"\n[certs] Generating \"ca\" certificate and key\n[certs] Generating \"apiserver\" certificate and key\n[certs] apiserver serving cert is signed for DNS names [ip-172-31-52-113.us-west-2.compute.internal kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local ec2-34-216-121-64.us-west-2.compute.amazonaws.com ec2-34-216-121-64 master-demok8s.corp.example.com] and IPs [10.96.0.1 172.31.52.113 172.31.52.113 127.0.0.1 10.1.2.3]\n[certs] Generating \"apiserver-kubelet-client\" certificate and key\n[certs] Generating \"front-proxy-ca\" certificate and key\n[certs] Generating \"front-proxy-client\" certificate and key\n[certs] Generating \"etcd/ca\" certificate and key\n[certs] Generating \"etcd/peer\" certificate and key\n[certs] etcd/peer serving cert is signed for DNS names [ip-172-31-52-113.us-west-2.compute.internal localhost] and IPs [172.31.52.113 127.0.0.1 ::1]\n[certs] Generating \"etcd/healthcheck-client\" certificate and key\n[certs] Generating \"etcd/server\" certificate and key\n[certs] etcd/server serving cert is signed for DNS names [ip-172-31-52-113.us-west-2.compute.internal localhost] and IPs [172.31.52.113 127.0.0.1 ::1]\n[certs] Generating \"apiserver-etcd-client\" certificate and key\n[certs] Generating \"sa\" key and public key\n[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"\n[kubeconfig] Writing \"admin.conf\" kubeconfig file\n[kubeconfig] Writing \"kubelet.conf\" kubeconfig file\n[kubeconfig] Writing \"controller-manager.conf\" kubeconfig file\n[kubeconfig] Writing \"scheduler.conf\" kubeconfig file\n[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"\n[control-plane] Creating static Pod manifest for \"kube-apiserver\"\n[control-plane] Creating static Pod manifest for \"kube-controller-manager\"\n[control-plane] Creating static Pod manifest for \"kube-scheduler\"\n[etcd] Creating static Pod manifest for local etcd in \"/etc/kubernetes/manifests\"\n[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". 
This can take up to 4m0s\n[kubelet-check] Initial timeout of 40s passed.\n\nUnfortunately, an error has occurred:\n\ttimed out waiting for the condition\n\nThis error is likely caused by:\n\t- The kubelet is not running\n\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)\n\nIf you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:\n\t- 'systemctl status kubelet'\n\t- 'journalctl -xeu kubelet'\n\nAdditionally, a control plane component may have crashed or exited when started by the container runtime.\nTo troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.\nHere is one example how you may list all Kubernetes containers running in docker:\n\t- 'docker ps -a | grep kube | grep -v pause'\n\tOnce you have found the failing container, you can inspect its logs with:\n\t- 'docker logs CONTAINERID'", "stdout_lines": ["[init] Using Kubernetes version: v1.13.4", "[preflight] Running pre-flight checks", "[preflight] Pulling images required for setting up a Kubernetes cluster", "[preflight] This might take a minute or two, depending on the speed of your internet connection", "[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'", "[kubelet-start] Writing kubelet environment file with flags to file \"/var/lib/kubelet/kubeadm-flags.env\"", "[kubelet-start] Writing kubelet configuration to file \"/var/lib/kubelet/config.yaml\"", "[kubelet-start] Activating the kubelet service", "[certs] Using certificateDir folder \"/etc/kubernetes/pki\"", "[certs] Generating \"ca\" certificate and key", "[certs] Generating \"apiserver\" certificate and key", "[certs] apiserver serving cert is signed for DNS names [ip-172-31-52-113.us-west-2.compute.internal kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local ec2-34-216-121-64.us-west-2.compute.amazonaws.com ec2-34-216-121-64 master-demok8s.corp.example.com] and IPs [10.96.0.1 172.31.52.113 172.31.52.113 127.0.0.1 10.1.2.3]", "[certs] Generating \"apiserver-kubelet-client\" certificate and key", "[certs] Generating \"front-proxy-ca\" certificate and key", "[certs] Generating \"front-proxy-client\" certificate and key", "[certs] Generating \"etcd/ca\" certificate and key", "[certs] Generating \"etcd/peer\" certificate and key", "[certs] etcd/peer serving cert is signed for DNS names [ip-172-31-52-113.us-west-2.compute.internal localhost] and IPs [172.31.52.113 127.0.0.1 ::1]", "[certs] Generating \"etcd/healthcheck-client\" certificate and key", "[certs] Generating \"etcd/server\" certificate and key", "[certs] etcd/server serving cert is signed for DNS names [ip-172-31-52-113.us-west-2.compute.internal localhost] and IPs [172.31.52.113 127.0.0.1 ::1]", "[certs] Generating \"apiserver-etcd-client\" certificate and key", "[certs] Generating \"sa\" key and public key", "[kubeconfig] Using kubeconfig folder \"/etc/kubernetes\"", "[kubeconfig] Writing \"admin.conf\" kubeconfig file", "[kubeconfig] Writing \"kubelet.conf\" kubeconfig file", "[kubeconfig] Writing \"controller-manager.conf\" kubeconfig file", "[kubeconfig] Writing \"scheduler.conf\" kubeconfig file", "[control-plane] Using manifest folder \"/etc/kubernetes/manifests\"", "[control-plane] Creating static Pod manifest for \"kube-apiserver\"", "[control-plane] Creating static Pod manifest for \"kube-controller-manager\"", "[control-plane] Creating static Pod manifest for \"kube-scheduler\"", 
"[etcd] Creating static Pod manifest for local etcd in \"/etc/kubernetes/manifests\"", "[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory \"/etc/kubernetes/manifests\". This can take up to 4m0s", "[kubelet-check] Initial timeout of 40s passed.", "", "Unfortunately, an error has occurred:", "\ttimed out waiting for the condition", "", "This error is likely caused by:", "\t- The kubelet is not running", "\t- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)", "", "If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:", "\t- 'systemctl status kubelet'", "\t- 'journalctl -xeu kubelet'", "", "Additionally, a control plane component may have crashed or exited when started by the container runtime.", "To troubleshoot, list all containers using your preferred container runtimes CLI, e.g. docker.", "Here is one example how you may list all Kubernetes containers running in docker:", "\t- 'docker ps -a | grep kube | grep -v pause'", "\tOnce you have found the failing container, you can inspect its logs with:", "\t- 'docker logs CONTAINERID'"]}

Command: ansible-playbook -i hosts.example site.yml --private-key=Key.pem
Kubernetes: v1.13.4
CentOS: 7.6
Repo Branch: v1.13

When I SSH into the master and try to get the logs of the kube-apiserver pod, I get the following error: error: unable to load server certificate: open /etc/kubernetes/pki/apiserver.crt: permission denied

I have even tried to run the API server manually after granting full permissions on the pki directory, just as a test, but the error stays the same.
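A minimal check sketch, assuming CentOS 7 with SELinux and the default kubeadm paths, to see whether the "permission denied" comes from file modes or from SELinux denials:

# File mode, owner and SELinux context of the serving certificate
ls -lZ /etc/kubernetes/pki/apiserver.crt
# Current SELinux mode (Enforcing / Permissive / Disabled)
getenforce
# Recent SELinux AVC denials, if the audit tools are installed
sudo ausearch -m avc -ts recent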

Please help me fix this issue.

ReSearchITEng commented 5 years ago

Maybe there are indeed some permission issues. Can you run the setup as root?

What do systemctl status kubelet and journalctl -xeu kubelet say? Also, please cat 10-kubeadm.conf.

Can you check whether the cgroup driver used by the kubelet and by Docker is the same?
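A rough sketch of how one might compare the two cgroup drivers and collect the requested output (paths assume a standard kubeadm RPM install):

# cgroup driver used by Docker
docker info 2>/dev/null | grep -i cgroup
# cgroup driver passed to the kubelet by kubeadm
cat /var/lib/kubelet/kubeadm-flags.env
# kubelet status, drop-in configuration (10-kubeadm.conf) and recent logs
systemctl status kubelet
systemctl cat kubelet
journalctl -xeu kubelet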

venomwaqar commented 5 years ago

I couldn't exactly pinpoint the issue, but I got it fixed by following one of the two options below:

  1. I needed to disable SELinux myself on CentOS 7+; I thought the playbook was handling it (a sketch of the manual change is shown after this list). OR
  2. The playbook was installing Docker v1.13; when I configured the playbook to install the latest Docker CE instead, the issue was fixed right away.
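For anyone hitting the same thing, a sketch of the manual SELinux change on CentOS 7, under the assumption that permissive mode is acceptable in your environment:

# Switch to permissive immediately (does not survive a reboot)
sudo setenforce 0
# Make the change persistent across reboots
sudo sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config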

Thanks for getting back 👍

ReSearchITEng commented 5 years ago
  1. There is a flag; the playbook does handle it when the flag is enabled. The best experience is with SELinux in permissive mode (not disabled). Disabling it entirely might cause issues.
  2. I guess the issue was a cgroup driver mismatch (see the sketch after this list); there should be no limitation on which Docker version is used.
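A minimal sketch of aligning Docker and the kubelet on the systemd cgroup driver, assuming Docker is the container runtime and otherwise default kubeadm behaviour (exact daemon.json contents may vary by Docker version):

# Tell Docker to use the systemd cgroup driver
cat <<'EOF' | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl restart docker
# kubeadm in this era detects Docker's cgroup driver during 'kubeadm init'
# and writes the matching kubelet flag to /var/lib/kubelet/kubeadm-flags.env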