@stefanprodan this feels like it could be an upstream Flux bug.
`flux uninstall` can't remove workloads from a cluster; this is impossible.
@stefanprodan that wasn't the ask. The ask here is that the TF module not follow the `flux uninstall` method. The point of `destroy` in TF is to remove things, so I can't see a case where you'd use this module and want to retain deployments. It's normally coupled in a module with other resources (repositories, keys, k8s clusters) that are being destroyed, so you run into issues destroying a cluster that still has resources running on it. For example, when an EKS cluster is built and is sending logs to a CloudWatch log group and you are in the process of destroying everything, there are cases where the log group gets destroyed and then, because "workloads" are still running on the cluster, gets recreated with the wrong permissions or retention; that blocks you when you try to create a new stack with the same name, since the resource now already exists. So the ask here was to have an opt-in on this module so that upon destroy it runs some hook to delete the CRs created from the CRDs, instead of just deleting the CRDs.
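A rough sketch of the kind of destroy-time hook I mean, using a `null_resource` with a destroy provisioner (`flux_bootstrap_git.this` and `var.kubeconfig_path` are placeholder names from my setup, and the whole thing is best-effort, not a tested implementation):

```hcl
resource "null_resource" "flux_cr_cleanup" {
  # Capture the kubeconfig path so it is available at destroy time
  # (destroy provisioners can only reference self.*).
  triggers = {
    kubeconfig = var.kubeconfig_path
  }

  provisioner "local-exec" {
    when = destroy
    # kubectl delete waits for deletion by default, so the CRs (and the
    # workloads their controllers garbage-collect) should be gone before
    # Terraform moves on to the bootstrap resource and the cluster.
    command = <<-EOT
      kubectl delete kustomizations.kustomize.toolkit.fluxcd.io --all -A --ignore-not-found
      kubectl delete helmreleases.helm.toolkit.fluxcd.io --all -A --ignore-not-found
      kubectl delete gitrepositories.source.toolkit.fluxcd.io --all -A --ignore-not-found
      kubectl delete helmrepositories.source.toolkit.fluxcd.io --all -A --ignore-not-found
    EOT
    environment = {
      KUBECONFIG = self.triggers.kubeconfig
    }
  }

  # Because destroy order is the reverse of dependency order, this
  # resource is destroyed (and the cleanup runs) before flux_bootstrap_git.
  depends_on = [flux_bootstrap_git.this]
}
```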
We can't delete CRs, as there is no wait mechanism in Flux for that. Also, lots of things can't be deleted; for example, the Helm SDK leaves PVCs and CRDs behind. Given this, `flux uninstall` and this provider will never remove workloads, just Flux's own workloads.
Sorry, I was just thinking that you delete the CR, e.g. `flux delete hr foo`; wouldn't that delete the workload? Does `flux delete ks` delete the HR under it? It can all be worked around.
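Roughly the manual workaround I have in mind (the names are placeholders, and this assumes the Kustomization has prune enabled, so deleting it garbage-collects the objects it applied):

```sh
# Delete the top-level Kustomization first; with prune enabled the
# kustomize-controller removes the HelmReleases it applied.
flux delete kustomization apps --silent

# Any HelmRelease left over can be deleted directly; the
# helm-controller then uninstalls the corresponding Helm release.
flux delete helmrelease cert-manager -n cert-manager --silent

# Only then remove Flux itself.
flux uninstall --silent
```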
Currently the behavior of `destroy` is to perform a similar function to `flux uninstall`. While retaining workloads makes sense when doing a `flux uninstall`, in the case of a `terraform destroy` you aren't intending to retain workloads, so you end up with dangling resources. For example, say you bootstrap a cluster and Flux then provisions things like cert-manager, kube-prometheus-stack, and prometheus-adapter. After destroy of `flux_bootstrap_git`, the dangling resources from `HelmRelease` and such, by not being uninstalled, cause problems with the rest of the destruction, like the `kubernetes_namespace` destroy.

Therefore the behavior of `flux_bootstrap_git` shouldn't follow the `uninstall` logic of removing its finalizers from its resources, but should actually remove all the resources Flux was managing (`HelmRepository`, `HelmRelease`, `GitRepository`, `Kustomization`, etc.). I understand this is a change, but I can't see how the current `destroy` behavior is useful in a production capacity when tearing down a cluster that had workloads managed by Flux which are now more or less "abandoned".

If changing the behavior from the `flux uninstall` logic seems too severe, then maybe there should be a switch/flag that makes this resource removal an opt-in rather than changing the default behavior (a sketch of what that could look like is below). I guess there could be scenarios where you are destroying just `flux_bootstrap_git` and it's not part of a larger cluster teardown, so you "might" want to retain those workloads. It just seems that in most cases you are bootstrapping a cluster at build time, so when you're destroying the cluster and the `flux_bootstrap_git` together, there is no logical reason to retain workloads like you normally would with a `flux uninstall` (where maybe you decided not to continue using Flux).

Discussion here for context: https://cloud-native.slack.com/archives/CLAJ40HV3/p1698355353067829