What happened:
Scenario: I have a bare metal cluster with 1 control plane node and 3 worker nodes. One worker node fails: it becomes "NotReady", which triggers CAPI to attempt to scale the cluster up to replace the failed node. I do not have any spare hardware available.
Context: As a customer I accept the loss of the failed node and decide I just want to scale the cluster down. To do this, I update my cluster.yaml to scale the worker nodes from 3 -> 2 (the change is sketched after the output below). Then I run eksctl anywhere upgrade cluster. This command fails with the following:
Performing setup and validations
✅ Tinkerbell provider validation
✅ Validate OS is compatible with registry mirror configuration
✅ Validate certificate for registry mirror
✅ Control plane ready
❌ Validation failed {"validation": "worker nodes ready", "error": "machine deployment is in ScalingUp phase", "remediation": "ensure machine deployments for cluster bottlerocket are Ready"}
❌ Validation failed {"validation": "nodes ready", "error": "node bottlerocket-md-0-dc6c59864xk66p4-fj2v2 is not ready, currently in NodeStatusUnknown state", "remediation": "check the Status of the control plane and worker nodes in cluster bottlerocket and verify they are Ready"}
✅ Cluster CRDs ready
✅ Cluster object present on workload cluster
✅ Upgrade cluster kubernetes version increment
✅ Upgrade cluster worker node group kubernetes version increment
✅ Validate authentication for git provider
✅ Validate immutable fields
✅ Validate cluster's eksaVersion matches EKS-A Version
✅ Validate pod disruption budgets
✅ Validate eksaVersion skew is one minor version
Error: failed to upgrade cluster: validations failed
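For reference, the scale-down is only a change to the worker node group count in the cluster spec. A minimal sketch of the relevant part of my cluster.yaml follows; the group name md-0 and the machine config name are inferred from the MachineDeployment name shown under "Additional info", so treat the exact names and field layout as approximate:

    apiVersion: anywhere.eks.amazonaws.com/v1alpha1
    kind: Cluster
    metadata:
      name: bottlerocket
    spec:
      workerNodeGroupConfigurations:
        - name: md-0                        # assumed group name
          count: 2                          # scaled down from 3
          machineGroupRef:
            kind: TinkerbellMachineConfig
            name: bottlerocket-md-0         # assumed machine config name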
Additional info
user@machine$ kubectl get machinedeployment -A
NAMESPACE     NAME                CLUSTER        REPLICAS   READY   UPDATED   UNAVAILABLE   PHASE       AGE   VERSION
eksa-system   bottlerocket-md-0   bottlerocket   3          2       3         1             ScalingUp   5d    v1.27.4-eks-1-27-9
user@machine$ kubectl get nodes
NAME                                      STATUS     ROLES           AGE     VERSION
bottlerocket-md-0-dc6c59864xk66p4-fj2v2   NotReady   <none>          5d1h    v1.27.1-eks-75a5dcc
bottlerocket-md-0-dc6c59864xk66p4-lzprn   Ready      <none>          17h     v1.27.1-eks-75a5dcc
bottlerocket-md-0-dc6c59864xk66p4-w2vjp   Ready      <none>          5d      v1.27.1-eks-75a5dcc
bottlerocket-rxxx8                        Ready      control-plane   5d14h   v1.27.1-eks-75a5dcc
What you expected to happen:
In this scenario, if I modify the in-cluster config directly (kubectl edit cluster mycluster), I am able to scale the cluster down and bring it back to a normal, healthy, running state. I expect the CLI to behave similarly.
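To be explicit about that workaround: it is the same count change as above, made directly on the in-cluster EKS-A Cluster object rather than on the local file. Roughly (spec layout recalled from the live object, so approximate):

    # via: kubectl edit cluster bottlerocket
    spec:
      workerNodeGroupConfigurations:
        - name: md-0    # assumed group name, as above
          count: 2      # was 3; the MachineDeployment reconciles down and the cluster recovers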
How to reproduce it (as minimally and precisely as possible):
1. Create a bare metal (Tinkerbell) cluster with 1 control plane node and 3 worker nodes, with no spare hardware available.
2. Cause one worker node to fail so it goes NotReady; CAPI attempts to scale the MachineDeployment up but cannot, leaving it in the ScalingUp phase.
3. Edit cluster.yaml to reduce the worker node count from 3 to 2.
4. Run eksctl anywhere upgrade cluster; the preflight validations fail as shown above.
Anything else we need to know?:
Environment: