Open chrisdoherty4 opened 9 months ago
I've noticed this on numerous occasions. If I were to speculate, the nodes may be instructed to netboot and then powered on in such quick succession that the netboot setting doesn't actually get applied.
CAPT deprovisions nodes by turning them off. That means it's possible they could boot again and rejoin the cluster.
If netboot isn't correctly selected and a recycled node is booted, it can rejoin an existing cluster and create provisioning issues.
The problem would be fixed by adjusting CAPI/CAPT to ensure recycled nodes cannot rejoin clusters.
Summary
When provisioning a bare metal cluster, nodes are meant to netboot. Nodes are turned off by EKS-A and configured to netboot by CAPT. Occasionally nodes boot from disk instead of netbooting, requiring intervention from operators.
This was observed on Dell R240 machines only.
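To illustrate the suspected race, here is a minimal sketch of an ordering that would avoid it: request netboot, read the setting back until the BMC reports it, and only then power on. Everything here is hypothetical: `FakeBMC`, its methods, the `"pxe"`/`"disk"` values, and `netboot_then_power_on` are stand-ins for illustration, not CAPT, EKS-A, or any real BMC API. The delayed application of the boot setting is an assumption that models the speculation above, not confirmed behavior of the R240.

```python
import time


class FakeBMC:
    """Hypothetical in-memory BMC stand-in (not a real CAPT or BMC API)."""

    def __init__(self, apply_delay_polls=2):
        # Assumption: the BMC needs a few polls before the override takes effect.
        self._pending_polls = apply_delay_polls
        self._boot_device = "disk"
        self._requested = None
        self.powered_on = False
        self.booted_from = None

    def set_next_boot(self, device):
        # Record the request; it is not applied immediately.
        self._requested = device

    def get_next_boot(self):
        # Apply the pending request once the simulated delay has elapsed.
        if self._requested is not None:
            if self._pending_polls > 0:
                self._pending_polls -= 1
            else:
                self._boot_device = self._requested
                self._requested = None
        return self._boot_device

    def power_on(self):
        # The node boots from whatever device is in effect at power-on time.
        self.powered_on = True
        self.booted_from = self._boot_device


def netboot_then_power_on(bmc, attempts=5, interval=0.0):
    """Power on only after the BMC reports that netboot is in effect."""
    bmc.set_next_boot("pxe")
    for _ in range(attempts):
        if bmc.get_next_boot() == "pxe":
            bmc.power_on()
            return True
        time.sleep(interval)
    # Leave the node powered off rather than risk booting from disk.
    return False
```

With this fake, calling `set_next_boot("pxe")` immediately followed by `power_on()` boots from disk, which matches the failure described above, while `netboot_then_power_on` either boots from PXE or leaves the node off. Leaving the node off on failure is a deliberate choice in the sketch: a powered-off node cannot rejoin an existing cluster.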