Closed elankath closed 10 months ago
@elankath You have mentioned internal references in the public. Please check.
Tested the fix on GCP (for this provider, the node label is the same as the machine name). Removed the node label and initiated machine deletion. The node label is now set again prior to drain and deletion, so node deletion succeeds even when the node label is missing.
I1226 10:08:31.120914 86315 machine.go:128] reconcileClusterMachine: Start for "shoot--i034796--g1-w1-z1-788d9-hlgnx" with phase:"Terminating", description:"Set machine status to termination. Now, getting VM Status"
I1226 10:08:33.742750 86315 machine_util.go:1685] Updating "node" label on machine "shoot--i034796--g1-w1-z1-788d9-hlgnx" to "shoot--i034796--g1-w1-z1-788d9-hlgnx"
I1226 10:08:33.926901 86315 machine_util.go:1696] Updated "node" label on machine "shoot--i034796--g1-w1-z1-788d9-hlgnx" to "shoot--i034796--g1-w1-z1-788d9-hlgnx
I1226 10:08:41.699711 86315 machine_util.go:1104] Normal delete/drain has been triggerred for machine "shoot--i034796--g1-w1-z1-788d9-hlgnx"
...
I1226 10:11:05.334091 86315 machine_controller.go:131] VM "gce:///OMITTED/shoot--i034796--g1-w1-z1-788d9-hlgnx" for Machine "shoot--i034796--g1-w1-z1-788d9-hlgnx" was terminated succesfully
I1226 10:11:10.681394 86315 machine_util.go:1357] Deleting node "shoot--i034796--g1-w1-z1-788d9-hlgnx" associated with machine "shoot--i034796--g1-w1-z1-788d9-hlgnx"
I1226 10:11:16.055535 86315 machine.go:648] Machine "shoot--i034796--g1-w1-z1-788d9-hlgnx" with providerID "gce:///OMITTED/shoot--i034796--g1-w1-z1-788d9-hlgnx" and nodeName "shoot--i034796--g1-w1-z1-788d9-hlgnx" deleted successfully
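The label repair visible in the log lines above can be sketched roughly as follows. This is a minimal illustration and not the actual MCM code: the `Machine` struct and the `ensureNodeLabel` helper are hypothetical stand-ins, the literal `"node"` for the label key is taken from the log output, and how the node name is obtained (the log suggests it happens after the VM status is fetched) is left to the caller.

```go
package main

import "fmt"

// NodeLabelKey stands in for v1alpha1.NodeLabelKey. The value "node" is an
// assumption based on the label name printed in the log lines.
const NodeLabelKey = "node"

// Machine is a simplified, hypothetical stand-in for the MCM Machine object.
type Machine struct {
	Name   string
	Labels map[string]string
}

// ensureNodeLabel (hypothetical helper) sets the node label on the machine
// when it is missing or stale. It returns true if the label was changed, so
// the caller knows the machine object must be persisted before drain and
// deletion continue.
func ensureNodeLabel(m *Machine, nodeName string) bool {
	if nodeName == "" {
		// Nothing is known about the node, so there is nothing to repair.
		return false
	}
	if m.Labels == nil {
		m.Labels = map[string]string{}
	}
	if m.Labels[NodeLabelKey] == nodeName {
		return false
	}
	m.Labels[NodeLabelKey] = nodeName
	return true
}

func main() {
	// A machine whose node label was never set (or was removed by hand).
	m := &Machine{Name: "machine-a"}

	// On GCP the node name equals the machine name, as noted in the comment above.
	changed := ensureNodeLabel(m, m.Name)
	fmt.Println("label changed:", changed, "node label:", m.Labels[NodeLabelKey])
}
```

Calling the helper a second time with the same node name returns `false`, so repeated reconciliations of a terminating machine would not trigger further updates.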
How to categorize this issue?
/area robustness /kind bug /priority 2
What happened:
When a `Node` is never associated with its `Machine`, i.e. the `machine` object never has `machine.Labels[v1alpha1.NodeLabelKey]` set after machine creation, the `Node` object is not deleted during the deletion flow. (The label update can be missed if the machine object update transiently fails.) After some time, the dangling `Node` object gets the `NotManagedByMCM` annotation.
What you expected to happen:
The `Node` object should always be deleted prior to the instance VM termination and `Machine` object deletion, even if the association was missed during instance creation.
How to reproduce it (as minimally and precisely as possible):
Create a `Machine` and then remove its `node` label. Delete the `Machine`; the `Node` is still present.
Anything else we need to know?:
Environment:
Kubernetes version (use `kubectl version`): any