Closed carillan81 closed 1 month ago
This looks like the old bug we had before adding the Helm keep
annotation to the CRs, where the KubeadmControlPlane
is deleted too early, before the CAPA deletion logic kicks in. As a result, the kubeconfig
secret is deleted, which blocks the rest of the CAPA cleanup.
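For context, a minimal sketch of what that annotation looks like on a CR, assuming the "keep helm annotation" refers to Helm's standard helm.sh/resource-policy: keep. The resource name and namespace are hypothetical, and the spec is omitted:

```yaml
# Sketch only: name and namespace are made-up placeholders.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: mycluster            # hypothetical
  namespace: org-example     # hypothetical
  annotations:
    # Tells Helm not to delete this resource on uninstall/upgrade,
    # leaving its removal to the CAPI/CAPA controllers.
    helm.sh/resource-policy: keep
```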
Command used to deploy the kustomization:
k apply -f management-clusters/golem/golem.yaml
Sops secrets need to be created first.
After some tests, these are the findings:
- Tested with prune: true (automatic deletion) and with prune: false and manually deleting the cluster app.
- With prune: false and then manually deleting the cluster app, the cluster is properly deleted.
- With prune: true, Flux does an aggressive deletion (probably deleting other components, including the organization, at the same time as the cluster). In this case the deletion fails, and some elements get stuck and need to be removed by editing their finalizers.

👋 This seems like an issue for @giantswarm/team-honeybadger: deleting the kustomization together with the organization seems to break deletion of the WC clusters within that organization, because the kubeconfig secret is deleted out of order, causing issues.
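For readers unfamiliar with the setting, prune is a field on the Flux Kustomization spec. A rough sketch of the prune: false variant from the findings is below; the metadata, path, and source names are hypothetical and not taken from the actual golem.yaml:

```yaml
# Sketch only: names, path, and interval are assumptions for illustration.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: golem-clusters               # hypothetical
  namespace: default                 # hypothetical
spec:
  interval: 10m
  # With prune: false, Flux does not garbage-collect the resources it applied,
  # so the cluster app has to be deleted explicitly.
  prune: false
  path: ./management-clusters/golem  # hypothetical
  sourceRef:
    kind: GitRepository
    name: presales-demo-gitops       # hypothetical
```

With this variant, the cluster app is removed by hand first, which is the scenario reported above as deleting the cluster properly.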
I'm hitting this issue as well.
I do need a way to reliably delete clusters. @giantswarm/team-honeybadger
The question of a reliable way of deleting a cluster has already been more or less answered by you, I think. Basically, to delete a cluster, the deletion operation should target the cluster itself, so please follow either the 1st or the 2nd scenario listed by @carillan81 here. When you do a bulk resource deletion that includes the namespace these resources reside in, you are not experiencing any special Flux behaviour, such as a more aggressive cleanup, but rather the standard Kubernetes routine. You tell Kubernetes to delete a namespace, so it deletes the resources inside it, including the kubeconfig Secret that the CAPI controllers rely on for their operations. You would get exactly the same result with any tool, including kubectl, when performing a bulk deletion. When you do not want Kubernetes to remove something immediately, you do it with finalizers; hence, if there is any problem at all here, it is a missing finalizer on the kubeconfig Secret.
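To illustrate the finalizer point, here is a sketch of what a finalizer on the kubeconfig Secret could look like. The finalizer name is purely hypothetical (nothing in this thread says such a finalizer exists), the cluster name and namespace are placeholders, and some controller would still have to remove the finalizer once the cluster is gone:

```yaml
# Sketch only: the finalizer below is an invented example, not an existing one.
apiVersion: v1
kind: Secret
metadata:
  name: mycluster-kubeconfig      # CAPI names this Secret <cluster-name>-kubeconfig
  namespace: org-example          # hypothetical
  finalizers:
    # While any finalizer is present, Kubernetes only sets deletionTimestamp
    # on the object and keeps it until the finalizer is removed.
    - example.giantswarm.io/cluster-cleanup   # hypothetical
```

With something like this in place, a namespace deletion would mark the Secret for deletion but leave it readable until the cleanup that depends on it has finished.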
To reproduce the issue:
- With prune: true, deleting the kustomization should delete the org, the apps and the WC.

It looks like the kustomization is triggering deletion in a specific order that makes some of the elements fail to delete. From the kubeadm-control-plane-controller-manager log:

Example being deployed in golem from: https://github.com/giantswarm/presales-demo-gitops/tree/main