fluxcd / flux2-kustomize-helm-example

A GitOps workflow example for multi-env deployments with Flux, Kustomize and Helm.
https://fluxcd.io
Apache License 2.0

Question about HelmRelease Automation #31

Closed robincher closed 3 years ago

robincher commented 3 years ago

Hi,

I was just thinking about how I can better "sell" Flux v2 to my team of developers, who mainly use Helm to install their workloads in the cluster.

Note that with version: ">=1.0.0-alpha" we configure Flux to automatically upgrade the HelmRelease to the latest chart version including alpha, beta and pre-releases.

I am curious how the above actually works. Is it an automated pull/merge request that is created when a container image is updated? Or does Flux automatically pull the latest tags/versions based on the configured interval?

Thanks!

kingdonb commented 3 years ago

Thanks for submitting a question. With HelmRelease automation, no commits are made. There are a number of reasons for this, but I think the strongest is that a Helm release upgrade can fail, with configurable retry behavior, and there is no reasonable way to commit such a change to Git that would be consistent with how Flux's image update automation commits back to the Git repository.

Helm update automation doesn't simply bump the image version; it watches for new chart releases and upgrades the chart. Semantic versioning is best applied here, since some chart changes may require manual intervention to upgrade, and the major version number in a SemVer chart version is meant to signal when that is the case.
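To make the SemVer point concrete, here is a rough sketch of a HelmRelease that stays on one major version while letting Flux pick up newer minor and patch chart releases. It is not copied from this repository: the release name, namespaces, and HelmRepository name are placeholders, and the apiVersion is an assumption that depends on the Flux version installed.

```yaml
# Sketch only: names and namespaces are placeholders.
apiVersion: helm.toolkit.fluxcd.io/v2beta1  # assumption: use the version your cluster serves
kind: HelmRelease
metadata:
  name: podinfo
  namespace: podinfo
spec:
  interval: 5m
  chart:
    spec:
      chart: podinfo
      # Stay on major version 1; newer 1.x.y charts are picked up automatically.
      # A range such as ">=1.0.0-alpha" would also admit pre-releases and has
      # no upper bound on the major version.
      version: ">=1.0.0 <2.0.0"
      sourceRef:
        kind: HelmRepository
        name: podinfo
        namespace: flux-system
      # How often to check the source for a new chart version in the range.
      interval: 1m
```

No commit is involved here: the range lives in Git, and the controller resolves it against the chart repository on each interval.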

This works altogether differently from Image Update Automation, even though both can be driven by SemVer. When Flux's image automation controller updates a deployment's podspec with a new image, it does this as the first step toward applying the change to the cluster. The deployment update can fail later for any number of reasons (an image that has been deleted, an image that fails to start, or one that crashes before becoming ready), but importantly Flux can apply the change to the cluster and it will not be reverted out from under the Flux kustomize machinery by any other process. We say that Git is the single source of truth.
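For contrast, image update automation works on a manifest stored in Git. The sketch below shows roughly what that looks like: an ImagePolicy that selects a tag by SemVer range, and a Deployment whose image field carries a marker naming that policy. The resource names, image, and tag are placeholders, the apiVersion is an assumption, and the ImageRepository and ImageUpdateAutomation objects that complete the setup are omitted.

```yaml
# Sketch only: names, image, and tag are placeholders.
apiVersion: image.toolkit.fluxcd.io/v1beta1  # assumption: use the version your cluster serves
kind: ImagePolicy
metadata:
  name: podinfo
  namespace: flux-system
spec:
  imageRepositoryRef:
    name: podinfo  # an ImageRepository defined elsewhere
  policy:
    semver:
      range: 5.0.x
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
  namespace: default
spec:
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo
    spec:
      containers:
        - name: podinfod
          # The marker names the policy that governs this field; the new tag
          # is written on this line and committed to Git.
          image: ghcr.io/stefanprodan/podinfo:5.0.0 # {"$imagepolicy": "flux-system:podinfo"}
```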

That deployment update will keep retrying the configuration you provided, usually leaving the old pods in place until it succeeds. In production, an alert might be raised for a stuck deployment so that someone can notice and handle it as an incident. To undo the change, they can disable the update automation and revert the commit, then investigate what went wrong in the latest image and produce a new one before re-enabling the update automation. This process or a similar one is described in incident management.

Helm releases are a bit more complicated. Helm doesn't just apply the templates to the cluster, wash its hands, and walk away. Helm Controller tracks every resource in the release and, by default, uses the behavior of Helm's --wait flag to determine whether everything in the release finally came out OK (load balancers provisioned, deployments ready, jobs succeeded, etc.).

If the Helm-created resources fail to become ready, then depending on the configuration of the HelmRelease, the release may be reverted or uninstalled, or retried a configurable number of times before "remediating" with a revert. Importantly, with most configurations a failed release will eventually give up and be reverted, and Helm Controller knows not to re-attempt it. (To wrap back around to your specific question: what would this process look like if it were driven from Git? Would Flux have to revert the commit that signaled an upgrade, to signal the failure and remediation? How would Flux know not to retry an upgrade?)
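The retry and remediation behavior described above is configured on the HelmRelease itself. A minimal sketch follows, with placeholder names, retry counts chosen only for illustration, and an apiVersion that is an assumption depending on the Flux version installed:

```yaml
# Sketch only: names and retry counts are illustrative.
apiVersion: helm.toolkit.fluxcd.io/v2beta1  # assumption: use the version your cluster serves
kind: HelmRelease
metadata:
  name: podinfo
  namespace: podinfo
spec:
  interval: 5m
  chart:
    spec:
      chart: podinfo
      version: ">=1.0.0 <2.0.0"
      sourceRef:
        kind: HelmRepository
        name: podinfo
        namespace: flux-system
  install:
    remediation:
      # Retry a failed install this many times before giving up.
      retries: 3
  upgrade:
    remediation:
      # Retry a failed upgrade this many times before giving up.
      retries: 3
      # Also remediate the final failure once no retries remain.
      remediateLastFailure: true
      # Remediate by rolling back; "uninstall" is the other strategy.
      strategy: rollback
```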

This history is reflected in the output of helm history, which reads the Secrets in which Helm stores its release information. So even though we are not completely reflecting these change details in Git, it is a process based on declarative artifacts, and the capability to roll back is maintained another way. If a release lands in a failed state, it can be automatically reverted and an alert can be raised so someone will take a look. The upgrade will not be re-attempted until someone intervenes, or until another upgrade is pushed. Helm Controller uses Helm under the hood, so it does all of the same accounting in the same way when making Helm releases from HelmRelease resources.
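Raising an alert on a failed release could look roughly like the following. This is a sketch under assumptions: the Provider named "slack" is hypothetical and would have to be defined separately, the namespaces are placeholders, and the apiVersion depends on the Flux version installed.

```yaml
# Sketch only: the Provider "slack" is hypothetical and defined elsewhere.
apiVersion: notification.toolkit.fluxcd.io/v1beta1  # assumption: use the version your cluster serves
kind: Alert
metadata:
  name: helm-release-failures
  namespace: flux-system
spec:
  providerRef:
    name: slack
  # Only forward error events, such as a failed upgrade or rollback.
  eventSeverity: error
  eventSources:
    # Watch every HelmRelease in the (placeholder) podinfo namespace.
    - kind: HelmRelease
      name: '*'
      namespace: podinfo
```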

Hope this explanation helps. In short, Image Update Automation and HelmRelease upgrade automation are totally separate features and they have very different behavior. Here's a pointer to some docs about it: https://fluxcd.io/docs/use-cases/helm/#automatic-release-upgrades

robincher commented 3 years ago

Hi @kingdonb,

I really like the concept that a "Release should use a fixed MAJOR chart version, but want the latest MINOR or PATCH versions as they become available", which gives us the flexibility to automate minor and bug-fix releases while creating another guard rail for when we attempt major changes that could break things.

I guess I will proceed to deploy Flux v2 to better observe the behaviour. I will post questions on specific topics if more clarification is needed.

I appreciate your clear and detailed explanation. It really gave someone like me, who is just getting started with GitOps, something to ponder and work on. Many thanks for that!