giantswarm / roadmap

Giant Swarm Product Roadmap
https://github.com/orgs/giantswarm/projects/273
Apache License 2.0

Some elements are not deleted after gitops prune deletes cluster app #3603

Closed (carillan81 closed 1 month ago)

carillan81 commented 3 months ago

To reproduce the issue:

I0729 09:29:33.606915       1 controller.go:513] "Reconcile KubeadmControlPlane deletion" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" KubeadmControlPlane="org-presales-demo/presales-demo-gitops" namespace="org-presales-demo" name="presales-demo-gitops" reconcileID="baa03b9d-ae79-4c22-ae38-afe20c787159" Cluster="org-presales-demo/presales-demo-gitops"
I0729 09:29:33.606986       1 controller.go:524] "failed to reconcile conditions" controller="kubeadmcontrolplane" controllerGroup="controlplane.cluster.x-k8s.io" controllerKind="KubeadmControlPlane" KubeadmControlPlane="org-presales-demo/presales-demo-gitops" namespace="org-presales-demo" name="presales-demo-gitops" reconcileID="baa03b9d-ae79-4c22-ae38-afe20c787159" Cluster="org-presales-demo/presales-demo-gitops" error="cannot get remote client to workload cluster: org-presales-demo/presales-demo-gitops: failed to create cluster accessor: error fetching REST client config for remote cluster \"org-presales-demo/presales-demo-gitops\": failed to retrieve kubeconfig secret for Cluster org-presales-demo/presales-demo-gitops: Secret \"presales-demo-gitops-kubeconfig\" not found" 

The example is being deployed in golem from: https://github.com/giantswarm/presales-demo-gitops/tree/main

fiunchinho commented 3 months ago

It looks like the old bug we had before adding the keep helm annotation to the CRs, where the KubeadmControlPlane is deleted too early, before the CAPA deletion logic kicks in. As a result, the kubeconfig secret is deleted, which blocks the rest of the CAPA cleanup.

carillan81 commented 3 months ago

Command used to deploy the kustomization: k apply -f management-clusters/golem/golem.yaml. The SOPS secrets need to be created first.

carillan81 commented 3 months ago

After some tests these are the findings:

T-Kukawka commented 3 months ago

👋 This seems like an issue for @giantswarm/team-honeybadger. It seems that deleting the kustomization together with the organization breaks deletion of the WC clusters within that organization: the kubeconfig secret is deleted out of order, causing issues.

LutzLange commented 1 month ago

I'm hitting this issue as well.

I do need a way to reliably delete clusters. @giantswarm/team-honeybadger

ljakimczuk commented 1 month ago

The question of a reliable way of deleting a cluster has already been sort of answered by you, I think. Basically, to delete a cluster, the deletion operation should target the cluster itself, so please follow either the 1st or the 2nd scenario listed by @carillan81 here. When you do a bulk deletion of resources, including the namespace these resources reside in, you are not experiencing any special behaviour of Flux, such as a more aggressive cleanup, but rather a standard Kubernetes routine. You tell Kubernetes to delete a namespace, so it deletes the resources inside, including the kubeconfig Secret that the CAPI controllers rely upon for their operations. You would get exactly the same result with any tool, including kubectl, when performing a bulk deletion. When you do not want Kubernetes to remove something immediately, you do it with finalizers; hence, if there is any problem at all here, it is a missing finalizer on the kubeconfig Secret.
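
To illustrate the finalizer argument above, here is a minimal toy model in Python. It is a sketch under stated assumptions, not the Kubernetes API or any CAPI/CAPA controller code: the classes `Obj` and `Namespace`, the function `simulate`, and the finalizer names under `example.test/` are all hypothetical. The only behaviour it models is the standard rule that an object carrying finalizers is marked for deletion and stays readable until its finalizers are removed, while an object without finalizers disappears as soon as deletion is requested.

```python
from dataclasses import dataclass, field

SECRET = "presales-demo-gitops-kubeconfig"  # Secret name taken from the log above
CLUSTER = "presales-demo-gitops"


@dataclass
class Obj:
    """Toy stand-in for a Kubernetes object (not the real API)."""
    name: str
    finalizers: set = field(default_factory=set)
    deleting: bool = False  # models metadata.deletionTimestamp being set


class Namespace:
    """Toy namespace holding objects by name."""

    def __init__(self):
        self.objects = {}

    def add(self, obj):
        self.objects[obj.name] = obj

    def delete(self, name):
        """Request deletion: remove at once unless finalizers are present."""
        obj = self.objects.get(name)
        if obj is None:
            return
        if obj.finalizers:
            obj.deleting = True  # marked for deletion, still readable
        else:
            del self.objects[name]

    def remove_finalizer(self, name, finalizer):
        """A controller finished its cleanup, so its finalizer is dropped."""
        obj = self.objects[name]
        obj.finalizers.discard(finalizer)
        if obj.deleting and not obj.finalizers:
            del self.objects[name]

    def delete_all(self):
        """Bulk deletion, as happens when the namespace itself is deleted."""
        for name in list(self.objects):
            self.delete(name)


def simulate(secret_has_finalizer):
    """Bulk-delete a namespace and report whether cluster cleanup could run."""
    ns = Namespace()
    secret_finalizers = {"example.test/kubeconfig"} if secret_has_finalizer else set()
    ns.add(Obj(SECRET, secret_finalizers))
    ns.add(Obj(CLUSTER, {"example.test/infra-cleanup"}))

    ns.delete_all()

    # Assumption of this sketch: cleanup of the cluster needs the kubeconfig Secret.
    can_cleanup = SECRET in ns.objects
    if can_cleanup:
        ns.remove_finalizer(CLUSTER, "example.test/infra-cleanup")
        ns.remove_finalizer(SECRET, "example.test/kubeconfig")
    return can_cleanup, sorted(ns.objects)


if __name__ == "__main__":
    print(simulate(secret_has_finalizer=False))
    print(simulate(secret_has_finalizer=True))
```

In this model, the run without a finalizer on the Secret leaves the cluster object stuck in the deleting state, because the Secret it depends on is already gone. The run with a finalizer lets the cleanup finish, after which both objects are removed.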