Open czomo opened 1 year ago
Hi @czomo, thanks for using EKS-A.
Tinkerbell (bare metal) with EKS-A uses stacked etcd, not external etcd. So if one of the 3 CP nodes goes down, etcd would be left with an even number of members and could run into issues.
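For reference, here is a minimal sketch of the quorum arithmetic behind that concern. It assumes the standard Raft majority rule that etcd follows; the function names are illustrative and not part of etcd or EKS-A.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting members."""
    return members // 2 + 1


def failure_tolerance(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum(members)


# Compare the original 3-member cluster with what is left after a removal.
for n in (3, 2, 1):
    print(f"{n} members: quorum={quorum(n)}, tolerated failures={failure_tolerance(n)}")
```

With 3 members the cluster tolerates one failure. Once a member is removed and only 2 remain, both must be healthy for etcd to keep accepting writes.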
Did you notice when you rebooted the CP node, did it try to join back into the cluster? We run cloud-init and that would join the node back.
Hi @pokearu
> Did you notice when you rebooted the CP node
I observed it both when rebooting the CP node and during networking issues.
> did it try to join back into the cluster?
Yes, multiple times. However, each time the etcd member is removed from the cluster after some time.
> We run cloud-init and that would join the node back.
The problem is that this does not seem to be working as expected. As I mentioned above, the rebooted CP node comes up with missing annotations and labels (it's not marked as CP anymore). Any idea what could have caused cloud-init to be modified? Could you point me to where cloud-init is located?
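One way to confirm the missing-label symptom is to inspect the node object directly. The sketch below is only an illustration: it assumes the label in question is the upstream `node-role.kubernetes.io/control-plane` role label (the thread does not say which labels disappear), and the node name `cp-1` and the sample JSON are hypothetical stand-ins for the output of `kubectl get node <name> -o json`.

```python
import json

# Assumption: this upstream Kubernetes role label is the one that goes missing.
CONTROL_PLANE_LABEL = "node-role.kubernetes.io/control-plane"


def is_marked_control_plane(node: dict) -> bool:
    """Return True if the parsed node object carries the control-plane role label."""
    labels = node.get("metadata", {}).get("labels") or {}
    return CONTROL_PLANE_LABEL in labels


# Hypothetical, trimmed node object for a rebooted node that lost its role label.
sample = json.loads(
    '{"metadata": {"name": "cp-1", "labels": {"kubernetes.io/hostname": "cp-1"}}}'
)
print(is_marked_control_plane(sample))
```

Running the same check against a healthy CP node and the rebooted one would show whether the role label is what differs between them.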
What happened: I am using EKS-A on three Intel NUCs as control-plane nodes, deployed with the Tinkerbell provider. The cluster works fine for some time after provisioning. After rebooting one of the nodes for maintenance, etcd starts to malfunction. The member is immediately removed from the etcd cluster, and the node becomes unready.
What you expected to happen: After a reboot, the cluster should return to its previous configuration without any disruption to workloads. The etcd member shouldn't leave the cluster.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
ETCD logs
Environment: