cloudfoundry-incubator / kubo-release

Kubernetes BOSH release
https://www.cloudfoundry.org/container-runtime/
Apache License 2.0

The etcd job fails while scaling up the master nodes #386

Open anuya9 opened 4 years ago

anuya9 commented 4 years ago

Hello Team,

We have deployed a Kubernetes cluster with 3 master and 3 worker nodes. We were trying to scale the master nodes up from 3 to 4 by updating the BOSH manifest and redeploying. The deployment failed with the error below:

Error: Action Failed get_task: Task 1ac12df5-dc01-47d3-46b5-42d4d749cad0 result: 1 of 4 post-start scripts failed. Failed Jobs: etcd. Successful Jobs: bosh-dns, kube-apiserver, kubernetes-roles.
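For context, the scale-up itself was only an instance-count change in the deployment manifest followed by a redeploy. A rough sketch of that step (the environment alias, deployment name, and manifest file name below are placeholders, not our exact values):

```sh
# Sketch of the scale-up step; "my-env", "cfcr" and "cfcr.yml" are placeholders
# for the environment alias, deployment name and manifest actually used.

# In the manifest, the master instance group count was raised:
#   instance_groups:
#   - name: master
#     instances: 3   ->   instances: 4

bosh -e my-env -d cfcr deploy cfcr.yml   # redeploy with the extra master
bosh -e my-env -d cfcr instances --ps    # check the jobs on the new master-3
```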

All the VMs were in a running state. We checked the health of the cluster by running ./etcdctl cluster-health and observed that the newly added master node [master-3] is unhealthy; the health check failed with the error below:

failed to check the health of member 56f3d8fb2ca0cde1 on https://master-3.xxxxx.xxxxx.internal:2379: Get https://master-3.xxxxx.xxxxx.internal:2379/health: net/http: TLS handshake timeout
member 56f3d8fb2ca0cde1 is unreachable: [https://master-3.xxxxx.xxxxx.internal:2379] are all unreachable
cluster is degraded
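For reference, this is roughly how we ran the check from a master VM; the certificate file names and paths below are placeholders and need to point at the etcd job's actual config directory:

```sh
# Rough sketch of the health check; cert file names/paths are placeholders.
CERTS=/var/vcap/jobs/etcd/config

./etcdctl \
  --ca-file "$CERTS/etcd-ca.crt" \
  --cert-file "$CERTS/etcd-client.crt" \
  --key-file "$CERTS/etcd-client.key" \
  cluster-health

# member list shows each member's ID (e.g. 56f3d8fb2ca0cde1), which is needed
# if a member later has to be removed and re-added.
./etcdctl \
  --ca-file "$CERTS/etcd-ca.crt" \
  --cert-file "$CERTS/etcd-client.crt" \
  --key-file "$CERTS/etcd-client.key" \
  member list
```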

We are deploying the cluster through a CFCR deployment over BOSH.

BOSH version: 268.8
Kubernetes version: 1.15.5
Ubuntu version: 16.04.6
Infrastructure: AWS EC2 instances

Below are the links to the releases we are using to deploy our Kubernetes cluster:
https://github.com/cloudfoundry-incubator/kubo-release
https://github.com/cloudfoundry-incubator/cfcr-etcd-release

Does anyone have an idea about this issue and how it can be fixed?

svrc commented 4 years ago

This can be mitigated by using the etcdctl CLI to remove the unhealthy member and re-add it. It has been resolved in CFCR etcd release 1.12.4 by using an older version of the etcdctl CLI.
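For anyone hitting this before upgrading, a rough sketch of that remove/re-add mitigation, using the same v2-style etcdctl syntax as in the report above; the certificate paths are placeholders, and the member ID, node name, and peer URL must come from your own member list output:

```sh
# Sketch of the remove/re-add mitigation; cert paths are placeholders, and the
# member ID, name and peer URL must match your own `member list` output.
CERTS=/var/vcap/jobs/etcd/config
TLS_FLAGS="--ca-file $CERTS/etcd-ca.crt --cert-file $CERTS/etcd-client.crt --key-file $CERTS/etcd-client.key"

# 1. From a healthy master, remove the unhealthy member by its ID.
./etcdctl $TLS_FLAGS member remove 56f3d8fb2ca0cde1

# 2. Re-register it under its peer URL (2380 is the usual etcd peer port).
./etcdctl $TLS_FLAGS member add master-3 https://master-3.xxxxx.xxxxx.internal:2380

# 3. On the affected master VM, restart the etcd job and re-check health.
sudo /var/vcap/bosh/bin/monit restart etcd
./etcdctl $TLS_FLAGS cluster-health
```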