fluxcd / flux2

Open and extensible continuous delivery solution for Kubernetes. Powered by GitOps Toolkit.
https://fluxcd.io
Apache License 2.0
6.17k stars 574 forks source link

flux doesn't uninstall a HelmRelease if the containing namespace is deleted #4793

Open gprossliner opened 2 months ago

gprossliner commented 2 months ago

Describe the bug

A HelmRelease object is created in a namespace. This causes flux to reconcile the object, which will install the configured HelmChart. If you delete the containing namespace, flux doesn't seem to proper uninstall the HelmRelease. All namespaced objects are gone, because the namespace is gone. But global resources (a ValidatingWebhookConfiguration in my case) are left behind. So I assume the uninstall never happend.

Steps to reproduce

If requested I can make a detailed repo.

Expected behavior

The uninstall should be performed, allowing helm to delete all resources and run other uninstallation jobs.

Screenshots and recordings

No response

OS / Distro

23.5.0 Darwin Kernel Version 23.5.0

Flux version

flux: v2.2.3

Flux check

► checking prerequisites ✗ flux 2.2.3 <2.3.0 (new CLI version is available, please upgrade) ✗ Kubernetes version v1.24.17 does not match >=1.26.0-0 ► checking version in cluster ✔ distribution: flux-v2.2.2 ✔ bootstrapped: false ► checking controllers ✔ helm-controller: deployment ready ► ghcr.io/fluxcd/helm-controller:v0.37.2 ✔ kustomize-controller: deployment ready ► ghcr.io/fluxcd/kustomize-controller:v1.2.1 ✔ notification-controller: deployment ready ► ghcr.io/fluxcd/notification-controller:v1.2.3 ✔ source-controller: deployment ready ► ghcr.io/fluxcd/source-controller:v1.2.3 ► checking crds ✔ alerts.notification.toolkit.fluxcd.io/v1beta3 ✔ buckets.source.toolkit.fluxcd.io/v1beta2 ✔ gitrepositories.source.toolkit.fluxcd.io/v1 ✔ helmcharts.source.toolkit.fluxcd.io/v1beta2 ✔ helmreleases.helm.toolkit.fluxcd.io/v2beta2 ✔ helmrepositories.source.toolkit.fluxcd.io/v1beta2 ✔ kustomizations.kustomize.toolkit.fluxcd.io/v1 ✔ ocirepositories.source.toolkit.fluxcd.io/v1beta2 ✔ providers.notification.toolkit.fluxcd.io/v1beta3 ✔ receivers.notification.toolkit.fluxcd.io/v1 ✗ check failed

Git provider

No response

Container Registry provider

No response

Additional context

No response

Code of Conduct

gprossliner commented 2 months ago

My personal thoughts on this:

gprossliner commented 2 months ago

A potential PR could be implemented like:

Note: Whenever there is HelmRelease mentioned in this comment, it is not a flux HelmRelease object, but a HelmRelease object from helm.sh/helm/v3/pkg/release. So there will be multiple HelmReleases for a single flux HelmRelease, whenever an Upgrade action is performed.

Questions for the maintainers:

souleb commented 2 months ago

If we add a finalizer to the release secret, people won't be able to use the helm cli anymore to do such operation, and we won't do that. Flux should be able to progress from any action done with helm cli and vice versa.

There are also many reasons why one would want to delete a secret without triggering an uninstall action. Which will no longer be possible with your proposal.

The proper way to uninstall a helm release installed by Flux, is through the Flux HelmRelease which has a finalizer, not by manually deleting underlying k8s objects.

Why don't handle the namespace lifecycle with Flux? Within the HelmRelease or the Kustomization that installs it?

gprossliner commented 2 months ago

Thank you for you response @souleb!

If we add a finalizer to the release secret, people won't be able to use the helm cli anymore to do such operation, and we won't do that. Flux should be able to progress from any action done with helm cli and vice versa.

Understood. What actions are to be expected to be done with the helm cli? Wouldn't flux basically undo those operations on the next reconciliation?

For example if I uninstall a release with the helm cli, flux would find the release not installed and install if afterwards? If I change values for a release, basically the same should happen because flux has authority over the values used?

One was to mitigate this problem could be to watch for the HelmRelease secrets for deletion, and remove the finalizer if the namespace / HelmRelease is not in finalization. So any manual operation should be unaffected.

There are also many reasons why one would want to delete a secret without triggering an uninstall action. Which will no longer be possible with your proposal.

I didn't meant to say that. Flux should only, ever, uninstall the Release on the finalizer of the flux HelmRelease object. The finalizer on the Secret only ensures that it is still there when the uninstall happended because of that. I personally also needed to delete those Secrets to recover from errors / compatibility problems.

The proper way to uninstall a helm release installed by Flux, is through the Flux HelmRelease which has a finalizer, not by manually deleting underlying k8s objects.

Yes, this would not change.

Why don't handle the namespace lifecycle with Flux? Within the HelmRelease or the Kustomization that installs it?

For sure this would be a way to go. But not all clusters are 100% managed with flux. And there is a real possible problem here. If things are "left behind", like ClusterRoles / Bindings that may only be a problem if a new installation will take place.

But we had a real operational issue, rendering this cluster in a degraded state:

souleb commented 2 months ago

Deleting a namespace without cleaning in it first is generally a bad idea:

Also once a deletion start, there is no going back. So by adding the finalizer, it means that the controller has somehow to perform an action if the secret is deleted. In most cases this will be a no-op because next reconciliation will just upgrade the release and recreate the secret.

This seems complicated to me for the benefits.

A better solution is being discussed in https://github.com/kubernetes/enhancements/pull/2840/files.