cookeem / kubeadm-ha

Install a highly available Kubernetes cluster with kubeadm, using the docker/containerd container runtime; applies to v1.24.x and above.
MIT License

Testing k8s ha configuration by shutting down the first k8s master node #6

Closed · kcao3 closed this issue 7 years ago

kcao3 commented 7 years ago

@cookeem, I followed your instructions and was able to deploy an HA Kubernetes cluster (with 3 k8s master nodes and 2 k8s nodes) using Kubernetes version 1.8.1. Everything seemed to work just as you described.

Next, I focused on testing the high availability configuration. To do so, I attempted to shut down the first k8s master. Once the first k8s master was brought down, the keepalived service on that node stopped and the virtual IP address moved over to the second k8s master. However, things started falling apart :(

Specifically, on the second (or third) master, running 'kubectl get nodes' shows output like the following:

    NAME          STATUS     ROLES    ...
    k8s-master1   NotReady   master   ...
    k8s-master2   Ready      ...
    k8s-master3   Ready      ...
    k8s-node1     Ready      ...
    k8s-node2     Ready      ...

Also, on k8s-master2 or k8s-master3, when I ran 'kubectl logs' to check the controller-manager and scheduler, it appeared they did NOT re-elect a new leader. As a result, none of the Kubernetes services that had been exposed before were accessible anymore.

Do you have any idea why the re-election process did NOT occur for the controller-manager and scheduler on the remaining k8s master nodes?

cookeem commented 7 years ago

I haven't tested version 1.8.x yet. On master1, master2, and master3, edit the server address in kube-apiserver.yaml, kubelet.conf, admin.conf, controller-manager.conf, and scheduler.conf to point at the current host's IP address, then check whether it works.
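
A sketch of that edit on one of the remaining masters, assuming the default kubeadm file locations under /etc/kubernetes and the virtual IP 192.168.60.80:8443 from this thread; the host IP below is only an example, substitute your own:

    # Repoint each kubeconfig's `server:` entry from the VIP to this host's own IP.
    $ cd /etc/kubernetes
    $ sed -i 's#server: https://192.168.60.80:8443#server: https://192.168.60.72:6443#' \
        kubelet.conf admin.conf controller-manager.conf scheduler.conf
    # kube-apiserver.yaml lives under manifests/ and carries its address in the
    # apiserver flags rather than a `server:` field, so check it by hand.
    $ systemctl restart kubelet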

If it works, I think the problem is keepalived; check keepalived's logs.

If it does not work, check the kubelet's logs.

Then please show me the logs.

kcao3 commented 7 years ago

It turns out the re-election process for the controller-manager and scheduler on the k8s master nodes worked just fine, and keepalived was working fine as well.
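
For reference, one way to confirm which master holds the lock: on 1.8-era control planes the controller-manager and scheduler record their leader in an annotation on Endpoints objects in kube-system, so a quick check looks something like this (a sketch; the object names are the kubeadm defaults):

    $ kubectl -n kube-system get endpoints kube-controller-manager -o yaml | grep holderIdentity
    $ kubectl -n kube-system get endpoints kube-scheduler -o yaml | grep holderIdentity
    # the holderIdentity field in the leader-election annotation names the current leader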

The root cause of the problem was the 'cluster-info' configmap in the kube-public namespace. So, in addition to the 'kube-proxy' configmap in the kube-system namespace, I had to edit the 'cluster-info' configmap and replace the host IP address:6443 with the virtual IP address:8443. This is extremely important for any new worker node to bootstrap with the correct configuration when joining the cluster with kubeadm join. For my 2 existing k8s nodes, I just manually updated /etc/kubernetes/kubelet.conf, restarted the docker and kubelet services on those nodes, and everything worked as expected :)
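
A sketch of those two edits, reusing the virtual IP and port from this thread (192.168.60.80:8443); the host IP:6443 value is whatever your cluster-info configmap and kubelet.conf currently contain:

    # 1. Point the bootstrap kubeconfig embedded in cluster-info at the VIP,
    #    so new nodes joining via `kubeadm join` get the right apiserver address.
    $ kubectl edit -n kube-public configmap/cluster-info
            server: https://192.168.60.80:8443

    # 2. On each existing worker node, repoint the kubelet and restart services
    #    (replace the example host IP with the one currently in the file).
    $ sed -i 's#https://192.168.60.71:6443#https://192.168.60.80:8443#' /etc/kubernetes/kubelet.conf
    $ systemctl restart docker kubelet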

Thank you so much for your prompt response.

cookeem commented 7 years ago

In my instructions there is a "kube-proxy configuration" step. Is this the config you mean?

$ kubectl edit -n kube-system configmap/kube-proxy
        server: https://192.168.60.80:8443
discordianfish commented 6 years ago

Keep in mind that the next kubeadm init will overwrite the kube-proxy configmap.
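
If a later kubeadm init does revert it, re-applying the edit looks something like the following sketch (the VIP and port are the ones from this thread; k8s-app=kube-proxy is the label kubeadm puts on the kube-proxy pods):

    $ kubectl -n kube-system get configmap kube-proxy -o yaml \
        | sed 's#server: https://.*:6443#server: https://192.168.60.80:8443#' \
        | kubectl replace -f -
    # restart the kube-proxy pods so they pick up the updated configmap
    $ kubectl -n kube-system delete pod -l k8s-app=kube-proxy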