aws / eks-anywhere

Run Amazon EKS on your own infrastructure 🚀
https://anywhere.eks.amazonaws.com
Apache License 2.0

EKSA hangs while creating a cluster #6837

Open makesunrise opened 9 months ago

makesunrise commented 9 months ago

I have an EKS-A cluster setup on the vSphere provider. I am already running a management cluster and two workload clusters. However, I am unable to create any more clusters; the cluster creation process hangs.

  1. No kubeconfig for the kind cluster gets created. I only see a YAML file under the generated folder.

  2. There is no activity in vCenter. I do not see any new VMs being created by cloning the template.

  3. The create cluster command times out after an hour with the following output (screenshot attached; the exact invocation is sketched after this list).

  4. When trying to recreate the cluster, eksctl complains that the cluster already exists. However, `kubectl get clusters` does not list it.

  5. Tried creating from multiple admin machines and have been running into the same issue.

  6. The Docker container which gets spun up by EKS Anywhere does not show any logs (screenshot attached). Here is the output from eksa-cli-logs (screenshot attached). It is not an issue with Docker memory or Docker being low on resources; I see the same issue when no other containers are running.
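For reference, the create command was invoked roughly as follows. This is a sketch: the spec filename is a placeholder, `--kubeconfig` points at the management cluster's kubeconfig, and `-v 9` just raises the CLI log verbosity so the hang point shows up in eksa-cli-logs.

```sh
# Workload cluster create against the existing management cluster,
# run with maximum verbosity to capture where it hangs
eksctl anywhere create cluster -f playground2-eks-a-cluster.yaml \
  --kubeconfig k8s-mgmt/k8s-mgmt-eks-a-cluster.kubeconfig -v 9
```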

pokearu commented 9 months ago

Hi @makesunrise, thanks for using EKS-A.

  1. No kubeconfig for the kind cluster gets created. I only see a YAML file under the generated folder.

If you are creating a new workload cluster from the management cluster, you would not see a KinD cluster. The existing management cluster does the bootstrapping, so no kind kubeconfig is expected.

  4. When trying to recreate the cluster, eksctl complains that the cluster already exists. However, `kubectl get clusters` does not list it.

That might be because there is a folder with the name of the cluster on your system. You can delete the folder and retry, or use a different cluster name. Also, which kubectl command are you checking with: `kubectl get clusters.anywhere.eks.amazonaws.com` or `kubectl get clusters.cluster.x-k8s.io`?
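For reference, the two queries hit different API groups; roughly the following, where the kubeconfig path is a placeholder for your management cluster's kubeconfig:

```sh
# EKS-A's own Cluster objects, the ones eksctl anywhere manages
kubectl get clusters.anywhere.eks.amazonaws.com -A \
  --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig

# The underlying Cluster API (CAPI) Cluster objects, kept in eksa-system
kubectl get clusters.cluster.x-k8s.io -n eksa-system \
  --kubeconfig mgmt/mgmt-eks-a-cluster.kubeconfig
```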

  5. Tried creating from multiple admin machines and have been running into the same issue.

Are you using the same management cluster to create the workload cluster for all the admin machine runs?

  6. The Docker container which gets spun up by EKS Anywhere does not show any logs.

That container would not have much in its logs. It's just a helper container used to perform validations and similar tasks.
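If you still want to look at it, the standard Docker CLI is enough; something like the sketch below, where `<container-id>` is whatever `docker ps` reports for the EKS-A helper:

```sh
# List running containers to spot the EKS-A helper, then dump its logs
docker ps
docker logs <container-id>
```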

Also, could you tell me the EKS-A version and the Kubernetes version?

makesunrise commented 8 months ago

Hello,

I am using the `kubectl get clusters.anywhere.eks.amazonaws.com` command.

However, the following command returns JSON that contains the names of the clusters that failed provisioning: `kubectl get clusters.cluster.x-k8s.io -o json --kubeconfig k8s-mgmt/k8s-mgmt-eks-a-cluster.kubeconfig --namespace eksa-system`

I see the following in the JSON:

"lastTransitionTime": "2023-10-09T16:31:34Z", "message": "Secret \"playground2-vsphere-credentials\" not found", "reason": "VCenterUnreachable", "severity": "Error", "status": "False", "type": "InfrastructureReady"

However, I do see the secret `playground2-vsphere-credentials` in the eksa-system namespace of the management cluster.

(screenshot: the secret listed in the eksa-system namespace)
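For completeness, that check can be reproduced with the following (kubeconfig path as above):

```sh
# Confirm the credentials secret named in the failing condition exists
kubectl get secret playground2-vsphere-credentials -n eksa-system \
  --kubeconfig k8s-mgmt/k8s-mgmt-eks-a-cluster.kubeconfig
```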

The full JSON for the failed cluster has the following status:

"apiVersion": "cluster.x-k8s.io/v1beta1", "kind": "Cluster", "metadata": { "annotations": { "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"cluster.x-k8s.io/v1beta1\",\"kind\":\"Cluster\",\"metadata\":{\"annotations\":{},\"labels\":{\"cluster.x-k8s.io/cluster-name\":\"playground2\"},\"name\":\"playground2\",\"namespace\":\"eksa-system\"},\"spec\":{\"clusterNetwork\":{\"pods\":{\"cidrBlocks\":[\"192.168.0.0/16\"]},\"services\":{\"cidrBlocks\":[\"10.96.0.0/12\"]}},\"controlPlaneRef\":{\"apiVersion\":\"controlplane.cluster.x-k8s.io/v1beta1\",\"kind\":\"KubeadmControlPlane\",\"name\":\"playground2\"},\"infrastructureRef\":{\"apiVersion\":\"infrastructure.cluster.x-k8s.io/v1beta1\",\"kind\":\"VSphereCluster\",\"name\":\"playground2\"}}}\n" }, "creationTimestamp": "2023-10-09T16:31:34Z", "finalizers": [ "cluster.cluster.x-k8s.io" ], "generation": 1, "labels": { "cluster.x-k8s.io/cluster-name": "playground2" }, "name": "playground2", "namespace": "eksa-system", "resourceVersion": "3050547", "uid": "ca13e8ce-5e5d-46e4-80bc-5c7f6f22977f" }, "spec": { "clusterNetwork": { "pods": { "cidrBlocks": [ "192.168.0.0/16" ] }, "services": { "cidrBlocks": [ "10.96.0.0/12" ] } }, "controlPlaneEndpoint": { "host": "", "port": 0 }, "controlPlaneRef": { "apiVersion": "controlplane.cluster.x-k8s.io/v1beta1", "kind": "KubeadmControlPlane", "name": "playground2", "namespace": "eksa-system" }, "infrastructureRef": { "apiVersion": "infrastructure.cluster.x-k8s.io/v1beta1", "kind": "VSphereCluster", "name": "playground2", "namespace": "eksa-system" } }, "status": { "conditions": [ { "lastTransitionTime": "2023-10-09T16:31:34Z", "message": "Secret \"playground2-vsphere-credentials\" not found", "reason": "VCenterUnreachable", "severity": "Error", "status": "False", "type": "Ready" }, { "lastTransitionTime": "2023-10-09T16:31:35Z", "message": "Waiting for control plane provider to indicate the control plane has been initialized", "reason": "WaitingForControlPlaneProviderInitialized", "severity": "Info", "status": "False", "type": "ControlPlaneInitialized" }, { "lastTransitionTime": "2023-10-09T16:31:35Z", "message": "Scaling up control plane to 3 replicas (actual 0)", "reason": "ScalingUp", "severity": "Warning", "status": "False", "type": "ControlPlaneReady" }, { "lastTransitionTime": "2023-10-09T16:31:34Z", "message": "Secret \"playground2-vsphere-credentials\" not found", "reason": "VCenterUnreachable", "severity": "Error", "status": "False", "type": "InfrastructureReady" } ], "observedGeneration": 1, "phase": "Provisioning" }

EKS-A version: v0.17.1
Kubernetes version: 1.27