cprivitere opened this issue 1 year ago
@cprivitere Hello. Thank you for creating the issue!
Could you please run the upgrade command again with -v9 and share the logs? I would like to check whether the scale-up operation failed: if the newly added hardware is not showing up in get hardware, then the scale-up itself most likely failed with a "not enough hardware to scale" error.
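For reference, a quick way to check whether the new hardware objects were created (assuming the default eksa-system namespace that EKS Anywhere uses for Tinkerbell hardware):

kubectl get hardware.tinkerbell.org -n eksa-system --show-labels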
Sure thing, here's an output with -v9.
root@eksa-2wgkhd-admin:~# eksctl anywhere upgrade cluster -f my-eksa-cluster.yaml --hardware-csv hardware2.csv -v 9
2023-03-23T19:19:20.512Z V4 Logger init completed {"vlevel": 9}
2023-03-23T19:19:20.514Z V0 Warning: The recommended number of control plane nodes is 3 or 5
2023-03-23T19:19:20.514Z V6 Executing command {"cmd": "/usr/bin/docker version --format {{.Client.Version}}"}
2023-03-23T19:19:20.608Z V6 Executing command {"cmd": "/usr/bin/docker info --format '{{json .MemTotal}}'"}
2023-03-23T19:19:20.754Z V0 Warning: The recommended number of control plane nodes is 3 or 5
2023-03-23T19:19:20.818Z V4 Reading bundles manifest {"url": "https://anywhere-assets.eks.amazonaws.com/releases/bundles/29/manifest.yaml"}
2023-03-23T19:19:20.900Z V0 Warning: The recommended number of control plane nodes is 3 or 5
2023-03-23T19:19:20.900Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:20.900Z V2 Pulling docker image {"image": "public.ecr.aws/eks-anywhere/cli-tools:v0.14.3-eks-a-29"}
2023-03-23T19:19:20.900Z V6 Executing command {"cmd": "/usr/bin/docker pull public.ecr.aws/eks-anywhere/cli-tools:v0.14.3-eks-a-29"}
2023-03-23T19:19:21.319Z V5 Retry execution successful {"retries": 1, "duration": "418.98818ms"}
2023-03-23T19:19:21.319Z V3 Initializing long running container {"name": "eksa_1679599160900797177", "image": "public.ecr.aws/eks-anywhere/cli-tools:v0.14.3-eks-a-29"}
2023-03-23T19:19:21.319Z V6 Executing command {"cmd": "/usr/bin/docker run -d --name eksa_1679599160900797177 --network host -w /root -v /var/run/docker.sock:/var/run/docker.sock -v /root:/root --entrypoint sleep public.ecr.aws/eks-anywhere/cli-tools:v0.14.3-eks-a-29 infinity"}
2023-03-23T19:19:21.565Z V4 Inferring local Tinkerbell Bootstrap IP from environment
2023-03-23T19:19:21.566Z V4 Tinkerbell IP {"tinkerbell-ip": "86.109.11.217"}
2023-03-23T19:19:21.566Z V4 Task start {"task_name": "setup-and-validate"}
2023-03-23T19:19:21.566Z V0 Performing setup and validations
2023-03-23T19:19:21.566Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get clusters.anywhere.eks.amazonaws.com -A -o jsonpath={.items[0]} --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --field-selector=metadata.name=my-eksa-cluster"}
2023-03-23T19:19:21.836Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get bundles.anywhere.eks.amazonaws.com bundles-29 -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace eksa-system"}
2023-03-23T19:19:22.167Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get --ignore-not-found -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig releases.distro.eks.amazonaws.com --namespace eksa-system kubernetes-1-25-eks-7"}
2023-03-23T19:19:22.443Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get hardware.tinkerbell.org -l !v1alpha1.tinkerbell.org/ownerName --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig -o json --namespace eksa-system"}
2023-03-23T19:19:22.699Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get hardware.tinkerbell.org -l v1alpha1.tinkerbell.org/ownerName --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig -o json --namespace eksa-system"}
2023-03-23T19:19:22.972Z	V0	✅ Tinkerbell provider validation
2023-03-23T19:19:22.972Z V6 Getting KubeadmControlPlane CRDs {"cluster": "my-eksa-cluster"}
2023-03-23T19:19:22.972Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get kubeadmcontrolplanes.controlplane.cluster.x-k8s.io my-eksa-cluster -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace eksa-system"}
2023-03-23T19:19:23.245Z V6 waiting for nodes {"cluster": "my-eksa-cluster"}
2023-03-23T19:19:23.245Z V6 counting ready machine deployment replicas {"cluster": "my-eksa-cluster"}
2023-03-23T19:19:23.245Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get machinedeployments.cluster.x-k8s.io -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace eksa-system --selector=cluster.x-k8s.io/cluster-name=my-eksa-cluster"}
2023-03-23T19:19:23.530Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get nodes -o go-template --template {{range .items}}{{.metadata.name}}\n{{end}} --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig"}
2023-03-23T19:19:23.801Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get node 147.75.88.51 -o go-template --template {{range .status.conditions}}{{if eq .type \"Ready\"}}{{.reason}}{{end}}{{end}} --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig"}
2023-03-23T19:19:24.045Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get node 147.75.88.52 -o go-template --template {{range .status.conditions}}{{if eq .type \"Ready\"}}{{.reason}}{{end}}{{end}} --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig"}
2023-03-23T19:19:24.288Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get crd clusters.cluster.x-k8s.io --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig"}
2023-03-23T19:19:24.557Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get clusters.cluster.x-k8s.io -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace eksa-system"}
2023-03-23T19:19:24.831Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl version -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig"}
2023-03-23T19:19:25.096Z V3 calculating version differences {"inputVersion": "1.25", "clusterVersion": "1.25.6-eks-232056e"}
2023-03-23T19:19:25.096Z V3 calculated version differences {"majorVersionDifference": 0, "minorVersionDifference": 0}
2023-03-23T19:19:25.096Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get clusters.anywhere.eks.amazonaws.com -A -o jsonpath={.items[0]} --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --field-selector=metadata.name=my-eksa-cluster"}
2023-03-23T19:19:25.376Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get clusters.anywhere.eks.amazonaws.com -A -o jsonpath={.items[0]} --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --field-selector=metadata.name=my-eksa-cluster"}
2023-03-23T19:19:25.644Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get tinkerbelldatacenterconfigs.anywhere.eks.amazonaws.com my-eksa-cluster -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:25.925Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get tinkerbellmachineconfigs.anywhere.eks.amazonaws.com my-eksa-cluster-cp -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace "}
2023-03-23T19:19:26.217Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get tinkerbellmachineconfigs.anywhere.eks.amazonaws.com my-eksa-cluster -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace "}
2023-03-23T19:19:26.487Z	V0	✅ Validate certificate for registry mirror
2023-03-23T19:19:26.487Z	V0	✅ Control plane ready
2023-03-23T19:19:26.487Z	V0	✅ Worker nodes ready
2023-03-23T19:19:26.487Z	V0	✅ Nodes ready
2023-03-23T19:19:26.487Z	V0	✅ Cluster CRDs ready
2023-03-23T19:19:26.487Z	V0	✅ Cluster object present on workload cluster
2023-03-23T19:19:26.487Z	V0	✅ Upgrade cluster kubernetes version increment
2023-03-23T19:19:26.487Z	V0	✅ Validate authentication for git provider
2023-03-23T19:19:26.487Z	V0	✅ Validate immutable fields
2023-03-23T19:19:26.487Z	V0	✅ Upgrade preflight validations pass
2023-03-23T19:19:26.487Z V4 Task finished {"task_name": "setup-and-validate", "duration": "4.921349981s"}
2023-03-23T19:19:26.487Z V4 ----------------------------------
2023-03-23T19:19:26.487Z V4 Task start {"task_name": "update-secrets"}
2023-03-23T19:19:26.487Z V4 Task finished {"task_name": "update-secrets", "duration": "1.357µs"}
2023-03-23T19:19:26.487Z V4 ----------------------------------
2023-03-23T19:19:26.487Z V4 Task start {"task_name": "ensure-etcd-capi-components-exist"}
2023-03-23T19:19:26.487Z V0 Ensuring etcd CAPI providers exist on management cluster before upgrade
2023-03-23T19:19:26.487Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get namespace --field-selector=metadata.name=etcdadm-bootstrap-provider-system --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig"}
2023-03-23T19:19:26.756Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get provider --namespace etcdadm-bootstrap-provider-system --field-selector=metadata.name=bootstrap-etcdadm-bootstrap --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig"}
2023-03-23T19:19:27.005Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get namespace --field-selector=metadata.name=etcdadm-controller-system --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig"}
2023-03-23T19:19:27.235Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get provider --namespace etcdadm-controller-system --field-selector=metadata.name=bootstrap-etcdadm-controller --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig"}
2023-03-23T19:19:27.485Z V4 Task finished {"task_name": "ensure-etcd-capi-components-exist", "duration": "997.360905ms"}
2023-03-23T19:19:27.485Z V4 ----------------------------------
2023-03-23T19:19:27.485Z V4 Task start {"task_name": "pause-controllers-reconcile"}
2023-03-23T19:19:27.485Z V0 Pausing EKS-A cluster controller reconcile
2023-03-23T19:19:27.485Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:27.485Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get --ignore-not-found -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig clusters.anywhere.eks.amazonaws.com --namespace default"}
2023-03-23T19:19:27.748Z V5 Retry execution successful {"retries": 1, "duration": "263.165421ms"}
2023-03-23T19:19:27.748Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:27.748Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate tinkerbelldatacenterconfigs.anywhere.eks.amazonaws.com my-eksa-cluster anywhere.eks.amazonaws.com/paused=true --overwrite --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:28.030Z V5 Retry execution successful {"retries": 1, "duration": "282.02966ms"}
2023-03-23T19:19:28.030Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:28.030Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate tinkerbellmachineconfigs.anywhere.eks.amazonaws.com my-eksa-cluster-cp anywhere.eks.amazonaws.com/paused=true --overwrite --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:28.325Z V5 Retry execution successful {"retries": 1, "duration": "294.720919ms"}
2023-03-23T19:19:28.325Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:28.325Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate tinkerbellmachineconfigs.anywhere.eks.amazonaws.com my-eksa-cluster anywhere.eks.amazonaws.com/paused=true --overwrite --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:28.611Z V5 Retry execution successful {"retries": 1, "duration": "286.541025ms"}
2023-03-23T19:19:28.611Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:28.612Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate clusters.anywhere.eks.amazonaws.com my-eksa-cluster anywhere.eks.amazonaws.com/paused=true --overwrite --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:28.911Z V5 Retry execution successful {"retries": 1, "duration": "299.526137ms"}
2023-03-23T19:19:28.911Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:28.911Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate clusters.anywhere.eks.amazonaws.com my-eksa-cluster anywhere.eks.amazonaws.com/managed-by-cli=true --overwrite --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:29.202Z V5 Retry execution successful {"retries": 1, "duration": "290.474344ms"}
2023-03-23T19:19:29.202Z V0 Pausing GitOps cluster resources reconcile
2023-03-23T19:19:29.202Z V4 GitOps field not specified, pause cluster resources reconcile skipped
2023-03-23T19:19:29.202Z V4 Task finished {"task_name": "pause-controllers-reconcile", "duration": "1.717128141s"}
2023-03-23T19:19:29.202Z V4 ----------------------------------
2023-03-23T19:19:29.202Z V4 Task start {"task_name": "upgrade-core-components"}
2023-03-23T19:19:29.202Z V0 Upgrading core components
2023-03-23T19:19:29.202Z V1 Nothing to upgrade for Cilium, skipping
2023-03-23T19:19:29.202Z V1 Checking for CAPI upgrades
2023-03-23T19:19:29.202Z V1 Nothing to upgrade for CAPI
2023-03-23T19:19:29.202Z V1 Checking for Flux upgrades
2023-03-23T19:19:29.202Z V1 Skipping Flux upgrades, GitOps not enabled
2023-03-23T19:19:29.202Z V1 Nothing to upgrade for Flux
2023-03-23T19:19:29.202Z V1 Checking for EKS-A components upgrade
2023-03-23T19:19:29.202Z V1 Nothing to upgrade for controller and CRDs
2023-03-23T19:19:29.202Z V1 Checking for EKS-D components upgrade
2023-03-23T19:19:29.202Z V1 Nothing to upgrade for EKS-D components
2023-03-23T19:19:29.202Z V4 Task finished {"task_name": "upgrade-core-components", "duration": "225.131µs"}
2023-03-23T19:19:29.202Z V4 ----------------------------------
2023-03-23T19:19:29.202Z V4 Task start {"task_name": "upgrade-needed"}
2023-03-23T19:19:29.202Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get clusters.anywhere.eks.amazonaws.com -A -o jsonpath={.items[0]} --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --field-selector=metadata.name=my-eksa-cluster"}
2023-03-23T19:19:29.488Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get bundles.anywhere.eks.amazonaws.com bundles-29 -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace eksa-system"}
2023-03-23T19:19:29.812Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get --ignore-not-found -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig releases.distro.eks.amazonaws.com --namespace eksa-system kubernetes-1-25-eks-7"}
2023-03-23T19:19:30.110Z V3 Clusters are the same
2023-03-23T19:19:30.110Z V0 No upgrades needed from cluster spec
2023-03-23T19:19:30.110Z V4 Task finished {"task_name": "upgrade-needed", "duration": "908.019697ms"}
2023-03-23T19:19:30.110Z V4 ----------------------------------
2023-03-23T19:19:30.110Z V4 Task start {"task_name": "resume-eksa-and-gitops-kustomization"}
2023-03-23T19:19:30.110Z V0 Resuming EKS-A controller reconcile
2023-03-23T19:19:30.110Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:30.110Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl get --ignore-not-found -o json --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig clusters.anywhere.eks.amazonaws.com"}
2023-03-23T19:19:30.386Z V5 Retry execution successful {"retries": 1, "duration": "275.815573ms"}
2023-03-23T19:19:30.386Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:30.386Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate tinkerbelldatacenterconfigs.anywhere.eks.amazonaws.com my-eksa-cluster anywhere.eks.amazonaws.com/paused- --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:30.657Z V5 Retry execution successful {"retries": 1, "duration": "271.122665ms"}
2023-03-23T19:19:30.657Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:30.657Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate tinkerbellmachineconfigs.anywhere.eks.amazonaws.com my-eksa-cluster-cp anywhere.eks.amazonaws.com/paused- --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:30.945Z V5 Retry execution successful {"retries": 1, "duration": "287.85848ms"}
2023-03-23T19:19:30.945Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:30.945Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate tinkerbellmachineconfigs.anywhere.eks.amazonaws.com my-eksa-cluster anywhere.eks.amazonaws.com/paused- --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:31.222Z V5 Retry execution successful {"retries": 1, "duration": "277.061899ms"}
2023-03-23T19:19:31.222Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:31.223Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate clusters.anywhere.eks.amazonaws.com my-eksa-cluster anywhere.eks.amazonaws.com/paused- --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:31.522Z V5 Retry execution successful {"retries": 1, "duration": "299.636052ms"}
2023-03-23T19:19:31.522Z V5 Retrier: {"timeout": "2562047h47m16.854775807s", "backoffFactor": null}
2023-03-23T19:19:31.522Z V6 Executing command {"cmd": "/usr/bin/docker exec -i eksa_1679599160900797177 kubectl annotate clusters.anywhere.eks.amazonaws.com my-eksa-cluster anywhere.eks.amazonaws.com/managed-by-cli- --kubeconfig my-eksa-cluster/my-eksa-cluster-eks-a-cluster.kubeconfig --namespace default"}
2023-03-23T19:19:31.804Z V5 Retry execution successful {"retries": 1, "duration": "281.555572ms"}
2023-03-23T19:19:31.804Z V0 Updating Git Repo with new EKS-A cluster spec
2023-03-23T19:19:31.804Z V0 GitOps field not specified, update git repo skipped
2023-03-23T19:19:31.804Z V0 Forcing reconcile Git repo with latest commit
2023-03-23T19:19:31.804Z V0 GitOps not configured, force reconcile flux git repo skipped
2023-03-23T19:19:31.804Z V0 Resuming GitOps cluster resources kustomization
2023-03-23T19:19:31.804Z V4 GitOps field not specified, resume cluster resources reconcile skipped
2023-03-23T19:19:31.804Z V4 Task finished {"task_name": "resume-eksa-and-gitops-kustomization", "duration": "1.693660479s"}
2023-03-23T19:19:31.804Z V4 ----------------------------------
2023-03-23T19:19:31.804Z V4 Tasks completed {"duration": "10.238254304s"}
2023-03-23T19:19:31.804Z V3 Cleaning up long running container {"name": "eksa_1679599160900797177"}
2023-03-23T19:19:31.804Z V6 Executing command {"cmd": "/usr/bin/docker rm -f -v eksa_1679599160900797177"}
root@eksa-2wgkhd-admin:~# kubectl get hardware -A
NAMESPACE NAME STATE
eksa-system eksa-2wgkhd-node-cp-001
eksa-system eksa-2wgkhd-node-worker-001
And here's the hardware2.csv file:
root@eksa-2wgkhd-admin:~# cat hardware2.csv
hostname,vendor,mac,ip_address,gateway,netmask,nameservers,disk,labels
eksa-2wgkhd-node-worker-002,Equinix,10:70:fd:86:ec:3e,147.75.88.60,147.75.88.49,255.255.255.240,8.8.8.8|8.8.4.4,/dev/sda,type=worker
@cprivitere Hey, just catching up on the issue.
Looking at the logs, it seems no changes were detected in the cluster config, so there was nothing to upgrade:
2023-03-23T19:19:30.110Z V3 Clusters are the same
2023-03-23T19:19:30.110Z V0 No upgrades needed from cluster spec
2023-03-23T19:19:30.110Z V4 Task finished {"task_name": "upgrade-needed", "duration": "908.019697ms"}
We don't apply the newly provided hardware to the cluster unless there is something to upgrade. The apply command for the new hardware is run on the local KinD cluster that gets created during the upgrade process for management clusters.
Could you confirm that there was a change in the cluster.yaml file? Specifically, to scale out a worker node group you need to increase the count under:
workerNodeGroupConfigurations:
- count: 1
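For example, a minimal sketch of the relevant piece of the cluster spec, bumping the worker count from 1 to 2 so the upgrade has something to scale (the node group name and machineGroupRef below are illustrative, adjust them to your actual config):

apiVersion: anywhere.eks.amazonaws.com/v1alpha1
kind: Cluster
metadata:
  name: my-eksa-cluster
spec:
  workerNodeGroupConfigurations:
  - count: 2                  # was 1; increase so the new hardware gets picked up
    machineGroupRef:
      kind: TinkerbellMachineConfig
      name: my-eksa-cluster   # illustrative; use your worker machine config name
    name: md-0                # illustrative node group name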
Yep, looks like that's the piece I'm missing. I either missed it in the documentation or it was added after I last tried, but I'll give this a go.
I was able to give this a try and it looks like it doesn't work. I created a new hardware.csv with the new machine's name, MAC address, and other relevant info, passed it into the eksctl upgrade command, and no new hardware objects were created. What's worse, the program just leaves me hanging, waiting forever for the upgrade. I had to do a separate troubleshooting session just to confirm from boots that there was no hardware entry for the MAC address, as well as confirming there was indeed no new hardware object.
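A quick way to double-check on the cluster side that no Hardware object exists for the new MAC (a simple sketch; any equivalent filter works):

kubectl get hardware.tinkerbell.org -n eksa-system -o yaml | grep -i "e8:eb:d3:10:0b:ae"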
Here are the artifacts I can give you. First, the output from eksctl:
root@eksa-ecqzeq-admin:~# eksctl anywhere upgrade cluster -f my-eksa-cluster.yaml --hardware-csv hardware.csv --kubeconfig $KUBECONFIG
Warning: The recommended number of control plane nodes is 3 or 5
Warning: The recommended number of control plane nodes is 3 or 5
Warning: The recommended number of control plane nodes is 3 or 5
Performing setup and validations
✅ Tinkerbell provider validation
✅ Validate certificate for registry mirror
✅ Control plane ready
✅ Worker nodes ready
✅ Nodes ready
✅ Cluster CRDs ready
✅ Cluster object present on workload cluster
✅ Upgrade cluster kubernetes version increment
✅ Validate authentication for git provider
✅ Validate immutable fields
✅ Upgrade preflight validations pass
Ensuring etcd CAPI providers exist on management cluster before upgrade
Pausing EKS-A cluster controller reconcile
Pausing GitOps cluster resources reconcile
Upgrading core components
Upgrading workload cluster
Then, the hardware.csv I used:
hostname,vendor,mac,ip_address,gateway,netmask,nameservers,disk,labels
eksa-ecqzeq-node-worker-002,Equinix,e8:eb:d3:10:0b:ae,147.75.88.230,147.75.88.225,255.255.255.240,8.8.8.8|8.8.4.4,/dev/sda,type=worker
Then the output from kubectl get hardware:
root@eksa-ecqzeq-admin:~# kubectl get hardware -n eksa-system --show-labels
NAME STATE LABELS
eksa-ecqzeq-node-cp-001 type=cp,v1alpha1.tinkerbell.org/ownerName=my-eksa-cluster-control-plane-template-1682360077903-6jz8j,v1alpha1.tinkerbell.org/ownerNamespace=eksa-system
eksa-ecqzeq-node-worker-001 type=worker,v1alpha1.tinkerbell.org/ownerName=my-eksa-cluster-md-0-1682360077905-hg82j,v1alpha1.tinkerbell.org/ownerNamespace=eksa-system
Then finally the error message in boots:
{"level":"error","ts":1682373820.0116758,"caller":"boots/dhcp.go:101","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER","mac":"e8:eb:d3:10:0b:ae","error":"discover from dhcp message: no hardware found","errorVerbose":"no hardware found\ngithub.com/tinkerbell/boots/client/kubernetes.(*Finder).ByMAC\n\tgithub.com/tinkerbell/boots/client/kubernetes/hardware_finder.go:100\ngithub.com/tinkerbell/boots/job.(*Creator).CreateFromDHCP\n\tgithub.com/tinkerbell/boots/job/job.go:112\nmain.dhcpHandler.serve\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:99\nmain.dhcpHandler.ServeDHCP.func1\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:60\ngithub.com/gammazero/workerpool.(*WorkerPool).dispatch.func1\n\tgithub.com/gammazero/workerpool@v0.0.0-20200311205957-7b00833861c6/workerpool.go:169\nruntime.goexit\n\truntime/asm_amd64.s:1571\ndiscover from dhcp message"}
@pokearu
What happened: Trying to scale up a cluster following the directions from https://anywhere.eks.amazonaws.com/docs/tasks/cluster/cluster-scale/baremetal-scale/
After I add new hardware to a cluster with
eksctl anywhere upgrade cluster -f cluster.yaml --hardware-csv <hardware.csv>
I cannot see it when I list it with:
kubectl get hardware -n eksa-system --show-labels
Attempting to boot it gets the following error in boots:
{"level":"error","ts":1678917443.4466734,"caller":"boots/dhcp.go:101","msg":"retrieved job is empty","service":"github.com/tinkerbell/boots","pkg":"main","type":"DHCPDISCOVER","mac":"e8:eb:d3:10:0b:5a","error":"discover from dhcp message: no hardware found","errorVerbose":"no hardware found\ngithub.com/tinkerbell/boots/client/kubernetes.(*Finder).ByMAC\n\tgithub.com/tinkerbell/boots/client/kubernetes/hardware_finder.go:100\ngithub.com/tinkerbell/boots/job.(*Creator).CreateFromDHCP\n\tgithub.com/tinkerbell/boots/job/job.go:112\nmain.dhcpHandler.serve\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:99\nmain.dhcpHandler.ServeDHCP.func1\n\tgithub.com/tinkerbell/boots/cmd/boots/dhcp.go:60\ngithub.com/gammazero/workerpool.(*WorkerPool).dispatch.func1\n\tgithub.com/gammazero/workerpool@v0.0.0-20200311205957-7b00833861c6/workerpool.go:169\nruntime.goexit\n\truntime/asm_amd64.s:1571\ndiscover from dhcp message"}
What you expected to happen: The hardware defined in the new hardware.csv file should be added as new hardware objects in the cluster.
How to reproduce it (as minimally and precisely as possible): Create a new node on Equinix Metal. Figure out its MAC address and add it to the hardware.csv. Run the documented command:
eksctl anywhere upgrade cluster -f cluster.yaml --hardware-csv <hardware.csv>
See that no new hardware was added.
Anything else we need to know?: Doing the hardware generation manually with
eksctl anywhere generate hardware
and then applying the generated yaml files works fine.
Environment: