fluxcd / helm-controller

The GitOps Toolkit Helm reconciler, for declarative Helming
https://fluxcd.io
Apache License 2.0

An update to a new chart version should reconcile the HelmRepository before trying to update the HelmRelease #543

Open onedr0p opened 2 years ago

onedr0p commented 2 years ago

There is a problem when a HelmRelease is updated to a new chart version before the HelmRepository has picked up that version. Eventually the HelmRepository reconciles and the new chart version is applied to the HelmRelease, but it would be nice if this was taken care of for us.

In other words, if you create a HelmRepository with an interval set to 12h and you want to update your HelmRelease to a new version released within that time frame, it will not be able to until the HelmRepository gets updated. It would be nice if Flux/Helm Controller could automatically try to reconcile the HelmRepository any time a new version of the chart is detected in the HelmRelease.
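To make the scenario concrete, here is a minimal sketch of the two resources involved. The names, namespaces, repository URL and chart version are illustrative assumptions, and the apiVersion values are the ones that were current around the time of this thread; newer Flux releases serve later API versions.

```yaml
# Hypothetical example: the HelmRepository index is only refreshed every 12h.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: jetstack            # illustrative name
  namespace: flux-system
spec:
  interval: 12h             # long interval: new chart versions are seen late
  url: https://charts.jetstack.io
---
# Hypothetical example: the HelmRelease is bumped to a version published
# after the last index refresh, so the chart cannot be resolved yet.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: cert-manager
  namespace: cert-manager
spec:
  interval: 30m
  chart:
    spec:
      chart: cert-manager
      version: "1.10.0"     # illustrative version, assumed newer than the cached index
      sourceRef:
        kind: HelmRepository
        name: jetstack
        namespace: flux-system
```

Until the HelmRepository reconciles again (on its interval, or when triggered by hand), the HelmRelease stays in an error state because the requested version is not in the cached index.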

kingdonb commented 2 years ago

If I understand correctly, you are updating the spec.chart.version in your HelmRelease by hand, but HelmChart hasn't been detected yet, so you wind up waiting, or need to reconcile the HelmRepository by hand.

I think there is little chance of this change, based on my limited understanding of how subscriptions work in Kubebuilder and Flux. I'll explain how it works, as I understand it, to clarify. Examples of resources that subscribe to other resources in the GitOps Toolkit: Kustomization and HelmRelease (they subscribe through SourceRef to their sources), and possibly some others. This is the main action of Flux: "applier" types subscribe to "source" types, and they reconcile themselves when upstreams change.

Pretty much any resource which has a ref to another resource on which it depends, can subscribe to get updates about that resource when it changes.

A Kustomization will reconcile immediately when its source is updated and ready, because of the subscription. These are one-way associations. I believe the one-way-ness is a precautionary approach when building notifier patterns, one that avoids producing an infinite loop or notification storm. Anyway, it should be clear from that idea that when a HelmRelease is subscribed to the HelmRepository, the HelmRepository cannot also be subscribed to the HelmRelease, because that would be a loop. If someone understands this better than I do, they are welcome to chime in and correct me if I'm subtly or totally wrong.

But as for what to do now, there are basically two options. The first is to set a lower interval on the source: whenever the source is reconciled and updated to a new generation that becomes ready (when the chart is ready), the HelmRelease follows immediately, because it already subscribes to its upstream source.

Or, set up a Notification Webhook Receiver so the HelmRepository can be reconciled immediately when the event arrives. This solves the problem without setting a shorter interval anywhere.

https://fluxcd.io/flux/guides/webhook-receivers/
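As a rough sketch of the webhook approach, a Receiver can point at the HelmRepository so that an incoming request triggers its reconciliation. The resource names and the Secret name below are hypothetical, and the apiVersion is the one from around the time of this thread; check the guide linked above for the version your Flux release serves.

```yaml
# Hypothetical Receiver: a request to its generated webhook URL asks
# Flux to reconcile the referenced HelmRepository right away.
apiVersion: notification.toolkit.fluxcd.io/v1beta1
kind: Receiver
metadata:
  name: helm-repo-receiver   # illustrative name
  namespace: flux-system
spec:
  type: generic              # plain HTTP POST, no provider-specific payload
  secretRef:
    name: webhook-token      # assumed Secret holding a "token" key
  resources:
    - kind: HelmRepository
      name: jetstack         # assumed name of the HelmRepository to refresh
```

The notification controller generates the webhook path from the token and reports it in the Receiver's status. Whatever publishes or detects the new chart version (a CI job, for example) would call that URL.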

Classic HelmRepository resources are slow and bulky to operate compared to the newer OCI type HelmRepository, which doesn't depend on an index.yaml. If you do wind up setting a tight interval instead of using the webhook approach, the OCI type may perform better, if you can use it.
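For comparison, an OCI type HelmRepository might look like the sketch below. The name and registry URL are examples only (the URL is a public sample registry path, not one mentioned in this thread), and the apiVersion depends on the installed Flux release.

```yaml
# Illustrative OCI HelmRepository: no index.yaml is downloaded or parsed.
apiVersion: source.toolkit.fluxcd.io/v1beta2
kind: HelmRepository
metadata:
  name: podinfo              # illustrative name
  namespace: flux-system
spec:
  type: oci                  # marks the repository as an OCI registry
  interval: 5m
  url: oci://ghcr.io/stefanprodan/charts
```

This only helps when the chart publisher actually pushes the chart to an OCI registry.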

kingdonb commented 2 years ago

We are talking about this today in bug scrub, and the subtext here is that, through optimization, HelmRepository could be made about 90% more efficient, to the point where we would be comfortable recommending a shorter interval for HelmRepository.

But right now, unless you are using HelmRepository with type: oci, the operation is very inefficient and can be a major performance bottleneck for Flux, due to the limitations of index.yaml in the Helm repository design. Can you tell us more about how your pipeline works if none of these answers work for you? How are you triggering the update to HelmRelease?

onedr0p commented 2 years ago

My workflow consists of using Renovate to open PRs for Helm charts I do not write or maintain, e.g. cert-manager. I have the interval on the HelmRepository set to an hour for these. Usually it's not an issue, as I'm not merging these PRs right away, but sometimes this does happen.

I'm guessing there will be a time when more people are packaging their charts as OCI artifacts but who knows how long that will be or if it's even done at all.

kingdonb commented 2 years ago

That helps to clarify, the use case makes sense now, and why you're running into this issue.

So when Renovate detects a change in HelmRepository, it opens you a PR with the upgrade (I was guessing this was some CI automation, Renovate makes sense here), and if you merge the PR before Flux gets around to reconciling the HelmRepository, you get a temporary error state, and it resolves itself... eventually...

We are talking about how to kick-start adoption of OCI, and there are two things we need to see first:

If there are other common chart publishing tools in the wild, we'd better add them to the list, but once helm/chart-releaser has added the support, and common issues are addressed, we should hopefully start to see this pick up.

Meanwhile, this issue shall remain open until we can find a better answer for this. Triggering HelmRepository to reconcile early seems like a no-brainer, but the performance impact should not be understated: an error in the YAML could result in a continuous reconcile loop that puts Helm Controller or Source Controller at the top of the Prometheus charts, soaking up memory until somebody notices the error and fixes it in the source.

onedr0p commented 2 years ago

> So when Renovate detects a change in HelmRepository, it opens you a PR with the upgrade (I was guessing this was some CI automation, Renovate makes sense here), and if you merge the PR before Flux gets around to reconciling the HelmRepository, you get a temporary error state, and it resolves itself... eventually...

Correct, ideally I would like the interval to be 24hr+ because there's no reason to pull so often. It just consumes bandwidth on both ends and, as you mentioned, seems to affect Flux performance.

> We are talking about how to kick-start adoption of OCI, and there are two things we need to see first:

I am 100% on the OCI train for helm charts I would write or maintain but it will be a long time before others seem to jump on board.

Thanks for keeping the issue open, I am looking forward to seeing what can be done to improve pulling helm repo updates.

onedr0p commented 2 years ago

I discovered OCI helm charts do not work with Renovate for Flux, for which I have opened this issue over there.

https://github.com/renovatebot/renovate/issues/18509

onedr0p commented 1 year ago

In the meantime, in case this never comes to fruition, I set up self-hosted GitHub runners and wrote this workflow to run a flux reconcile when a label is added to a PR. The label is added automatically from this workflow.
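The linked workflows are not reproduced here. As a rough, hypothetical reconstruction of the idea (not the author's actual workflow), a GitHub Actions workflow along these lines could run the reconcile when a label is added. The label name, HelmRepository name and namespace are assumptions, and the self-hosted runner is assumed to have the flux CLI installed and access to the cluster.

```yaml
# Hypothetical sketch: reconcile a HelmRepository when a PR receives a label.
name: reconcile-helm-repository
on:
  pull_request:
    types: [labeled]
jobs:
  reconcile:
    # Only react to the (assumed) label that marks Helm chart updates.
    if: github.event.label.name == 'flux/reconcile'
    runs-on: self-hosted     # needs the flux CLI and cluster credentials
    steps:
      - name: Reconcile the HelmRepository
        run: flux reconcile source helm jetstack --namespace flux-system
```

With something like this in place, the HelmRepository index is refreshed before the PR is merged, so the HelmRelease can resolve the new chart version without waiting for the next interval.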