Closed elankath closed 9 months ago
Tested the fix on GCP (for this provider, the node label is the same as the machine name). Removed the node label and initiated machine deletion. The node label is now set again prior to drain and deletion. Node deletion succeeds even when the node label is missing.
I1226 10:08:31.120914 86315 machine.go:128] reconcileClusterMachine: Start for "shoot--i034796--g1-w1-z1-788d9-hlgnx" with phase:"Terminating", description:"Set machine status to termination. Now, getting VM Status"
I1226 10:08:33.742750 86315 machine_util.go:1685] Updating "node" label on machine "shoot--i034796--g1-w1-z1-788d9-hlgnx" to "shoot--i034796--g1-w1-z1-788d9-hlgnx"
I1226 10:08:33.926901 86315 machine_util.go:1696] Updated "node" label on machine "shoot--i034796--g1-w1-z1-788d9-hlgnx" to "shoot--i034796--g1-w1-z1-788d9-hlgnx
I1226 10:08:41.699711 86315 machine_util.go:1104] Normal delete/drain has been triggerred for machine "shoot--i034796--g1-w1-z1-788d9-hlgnx"
...
I1226 10:11:05.334091 86315 machine_controller.go:131] VM "gce:///OMITTED/shoot--i034796--g1-w1-z1-788d9-hlgnx" for Machine "shoot--i034796--g1-w1-z1-788d9-hlgnx" was terminated succesfully
I1226 10:11:10.681394 86315 machine_util.go:1357] Deleting node "shoot--i034796--g1-w1-z1-788d9-hlgnx" associated with machine "shoot--i034796--g1-w1-z1-788d9-hlgnx"
I1226 10:11:16.055535 86315 machine.go:648] Machine "shoot--i034796--g1-w1-z1-788d9-hlgnx" with providerID "gce:///OMITTED/shoot--i034796--g1-w1-z1-788d9-hlgnx" and nodeName "shoot--i034796--g1-w1-z1-788d9-hlgnx" deleted successfully
Tested the fix on AWS (for this provider, the node label differs from the machine name). Removed the node label and initiated machine deletion. The node label is now set again prior to drain and deletion. Node deletion succeeds even when the node label is missing.
I1226 10:37:06.233292 90959 machine_util.go:1685] Updating "node" label on machine "shoot--i034796--aw3-a-z1-9cc57-q6qbl" to "ip-10-180-29-167.eu-west-1.compute.internal"
I1226 10:37:06.405762 90959 machine_util.go:1696] Updated "node" label on machine "shoot--i034796--aw3-a-z1-9cc57-q6qbl" to "ip-10-180-29-167.eu-west-1.compute.internal"
I1226 10:37:55.994989 90959 core.go:285] Machine deletion request has been recieved for "shoot--i034796--aw3-a-z1-9cc57-q6qbl"
I1226 10:37:56.382539 90959 core.go:311] VM "aws:///eu-west-1/i-078071299a1bce4ca" for Machine "shoot--i034796--aw3-a-z1-9cc57-q6qbl" was terminated successfully
I1226 10:38:01.724111 90959 machine_util.go:1357] Deleting node "ip-10-180-29-167.eu-west-1.compute.internal" associated with machine "shoot--i034796--aw3-a-z1-9cc57-q6qbl"
I1226 10:38:01.724126 90959 machine_util.go:1365] Deletion of Node Object "ip-10-180-29-167.eu-west-1.compute.internal" is successful. Initiate machine object finalizer removal
I1226 10:38:07.079213 90959 machine.go:648] Machine "shoot--i034796--aw3-a-z1-9cc57-q6qbl" with providerID "aws:///eu-west-1/i-078071299a1bce4ca" and nodeName "ip-10-180-29-167.eu-west-1.compute.internal" deleted successfully
What this PR does / why we need it:
When a Node is never associated with its Machine, i.e. the machine object never has `machine.Labels[v1alpha1.NodeLabelKey]` set after machine creation, the Node object is not deleted during the deletion flow. (The label update can be missed if the machine object update fails transiently.) After some time, the dangling Node object then gets the `NotManagedByMCM` annotation. With this change, the label is set again before drain and deletion, so the Node is deleted along with its Machine.
Which issue(s) this PR fixes: Fixes #875
Special notes for your reviewer:
Release note: