Open robbierafay opened 1 year ago
Any ideas to force machines.cluster.x-k8s.io to Phase Running ?
To unblock myself, I was able to find a hack to force the upgrade to complete. I manually edited hardware crd for the problem node and set states to trigger the following condition in code which marked the tinkerbellmachine as Ready and machine.c as Running
hw.Spec.Metadata.State == inUse && hw.Spec.Metadata.Instance.State == provisioned
Cluster upgrade completes after this change.
Change in hardware crds
......
.....
metadata:
facility:
facility_code: onprem
plan_slug: c2.medium.x86
instance:
allow_pxe: true
always_pxe: true
hostname: eksabm1-dp-n-1
id: 08:00:27:73:28:85
ips:
- address: 192.168.10.229
family: 4
gateway: 192.168.10.1
netmask: 255.255.255.0
public: true
operating_system: {}
state: provisioned <<<<<<<<----------------------
state: in_use <<<<<<<<----------------------
userData: |2-
What happened: K8s Upgrade failed on self managed 4 node cluster (1 cp, 3 workers) as one of workers machines.cluster.x-k8s.io stuck in phase Provisioning despite the node completing tinkerbell workflow successfully and joining the cluster
What you expected to happen: K8s upgrade to go through successfully
How to reproduce it (as minimally and precisely as possible):
Provision a 4 node cluster (1 cp, 3 workers) using K8s 1.23. eksctl anywhere version 0.16.1 Add two spare hardwares - 1 cp and 1 for worker
Upgrade to k8s 1.24 using
Nodes go through rolling upgrade. A existing worker node stuck in Provisioning Phase : eksabm1-dp-n-2 even though it joined cluster ok . eksctl anywhere upgrade cluster command never finishes. Cluster crds are stuck on bootstrap cluster and worker cluster is unmanageable until upgrade completes.
From bootstrap Kind cluster :
From workload cluster :
Anything else we need to know?:
cluster-upgrade.yaml.txt hardware.csv.txt hardware-upgrade.csv.txt cluster.yaml.txt
cli.log.txt
Environment: