question: avoid removing chart on timeout

daniel-garcia commented 1 year ago

Scenario: I have a CRD in my chart that provisions a Cloud Database instance. It can take 10+ minutes before the cloud provider's database instance is ready, and much longer when there are many concurrent provisioning requests. When a Helm release times out, Helm removes the resources installed by the chart. This removes the CRD that triggered the creation of the database instance.

Question: Is there a way to prevent the controller from removing existing resources between reconciliation attempts?

kingdonb commented 1 year ago

Yes, there is!

Please see the section on configuring failure remediation here in the docs:

https://fluxcd.io/flux/components/helm/helmreleases/#configuring-failure-remediation

Whether or not you have configured retries, you should also consider setting the remediation strategy details as explained in the doc.

See remediateLastFailure: false, which prevents the last failed attempt from being rolled back or uninstalled once retries are exhausted. The expectation is then that whoever monitors the cluster for failing Flux resources will be alerted through the monitoring solution and will remediate the failure manually, either by correcting whatever failed or by reverting to the last successful chart version if the problem is in the chart.
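
As a rough sketch of what that could look like (not a drop-in config): the release, chart, and repository names below are hypothetical, the retry count and the 30m timeout are assumptions chosen for the slow-provisioning scenario, and older Flux versions use a v2beta1 or v2beta2 apiVersion instead of v2. My understanding of the docs is that failed attempts before the last one are still remediated between retries, so leave retries at 0 if nothing should ever be removed automatically.

```yaml
apiVersion: helm.toolkit.fluxcd.io/v2   # assumption: older Flux versions use v2beta1 or v2beta2
kind: HelmRelease
metadata:
  name: cloud-database                  # hypothetical release name
  namespace: default
spec:
  interval: 10m
  timeout: 30m                          # assumption: long enough for slow provisioning (default is 5m)
  chart:
    spec:
      chart: cloud-database             # hypothetical chart name
      sourceRef:
        kind: HelmRepository
        name: my-charts                 # hypothetical repository name
  install:
    remediation:
      retries: 3                        # assumption: earlier failed attempts are still remediated
      remediateLastFailure: false       # keep the last failed install in place
  upgrade:
    remediation:
      retries: 3
      remediateLastFailure: false       # keep the last failed upgrade instead of rolling back
```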

Hope this helps!

gaalw commented 11 months ago

Hi!

The only robust way I have found to install CRDs is to put them into a separate Helm chart (as Linkerd did) or into a separate CRD Kustomization that is installed first. Otherwise you will repeatedly run into cyclic dependency issues, as I did. I can find better examples on GitHub, but not right now.

An example of this approach (CRDs installed in a separate Kustomization):

https://github.com/artazar/flux2-general/blob/main/infra/crds/kustomization.yaml
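
For illustration, a minimal sketch of that ordering with two Flux Kustomizations, where the second waits for the first via dependsOn. This is not taken from the linked repository: the names, paths, and the GitRepository reference are all hypothetical.

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: crds                  # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./infra/crds          # hypothetical path holding only the CRD manifests
  prune: false                # assumption: never garbage-collect CRDs automatically
  sourceRef:
    kind: GitRepository
    name: flux-system         # hypothetical source name
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: apps                  # hypothetical name
  namespace: flux-system
spec:
  interval: 10m
  path: ./apps                # hypothetical path with the HelmReleases that use the CRDs
  prune: true
  dependsOn:
    - name: crds              # applied only after the crds Kustomization is ready
  sourceRef:
    kind: GitRepository
    name: flux-system
```

Keeping prune disabled on the CRD Kustomization is a design choice on my part: deleting a CRD also deletes every custom resource of that kind, which is exactly the kind of removal the original question wants to avoid.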

fluxcd / helm-controller

question: avoid removing chart on timeout #592