Closed simingweng closed 11 months ago
Here is the comparison between the two generations of the same resource from Kubernetes API server point of https://www.diffchecker.com/krOGkCeF, it seems the unnecessary reconciliation is caused by the interval
field.
The interval
field is of type metav1.Duration which embeds time.Duration. When an interval is specified as 5m
in the YAML, it's unmarshaled then marshaled as 5m0s
. Therefore, when the same YAML is applied again, it causes a spec change from 5m0s
to 5m
, resulting a new generation of the resource and a reconciliation.
Interestingly, once the same YAML is applied for the second time, the desired 5m
string representation is preserved, therefore any further re-applying no longer produces new generation of resources. I guess it's due to the difference between Create
and Update
operation of the resource, but I'm not sure.
Having found the above, the workaround to avoid such an unnecessary helm upgrade
action is to always include the seconds
field in interval
as well as any field of type metav1.Duration
. It is important in our use case because we need to reliably know whether an upgrade has happened or not.
@simingweng I don't believe that 5m0s to 5m is the cause of a trouble here.
Not directly related to your issue, but this is hot and it's coming out in the next release of Flux, have you seen this yet:
$ flux diff ks 75-deis --path ./apps/hephy/
► HelmRelease/deis/hephy drifted
metadata.generation
± value change
- 7
+ 8
spec.interval
± value change
- 15m0s
+ 15m
This was just merged and so I thought to use it to test what you described, as it should be in the next Flux release; you can see that 15m0s and 15m do get applied as different values in the cluster with Server Side Apply. So one can infer, I think, that setting a new spec.interval
also has the effect of reconciling the HelmRelease for an upgrade.
I was able to replicate something like the behavior that you described, although I'm not sure I can provide much in the way of analysis of your findings, I can tell you that the Kustomize controller behavior is about to see some big changes in Flux 0.26 and you very well might see a positive effect difference in the behavior there when it is released, likely soon. (Watch fluxcd/flux2#2308 for more details.)
Helm Controller has some quirks around timeout and helm --wait
behavior due to the unpredictable behavior of clusters at runtime that results occasionally in a helmrelease that has failed and/or is stuck. In order to get the HelmRelease un-stuck, we generally prescribe making an update to the HelmRelease in the source and it triggers another reconcile/upgrade action.
I always thought that users had to make a change to the inputs that Helm considers in order to trigger the upgrade, so: values, chart (template, defaults), or secrets or configmaps listed in valuesFrom
. Turns out Helm Controller isn't quite that picky, and any update to the HelmRelease itself will do. (That's helpful information, so thanks very much for pointing it out!) I had been telling folks that they can add random values to spec.values
as a nonce, but that's a little bit untoward, I'd much rather edit something which can be clearly shown to have no effect on the outcome, like spec.interval
.
So you understand, Helm Controller is not continuously upgrading the Helm Release in pure "continuous reconciling" fashion as one might anticipate according to the definition of GitOps, due to the difficult combination of behavior of Helm's lifecycle hooks, secret generators, and 3-way merging patch-update.
Anyway this is a long story, but YSK that users cannot generally trigger HelmRelease to reconcile an upgrade when there are no changes to apply by simply running flux reconcile helmrelease <theHelmRel>
because Helm Controller will only do an upgrade when there is an input that has changed. Helm Controller cannot detect drift due to another long story (#186) so at each spec.interval
it really only does checking for changes in inputs that should trigger an upgrade.
So, it may actually be desired behavior that changing interval reconciles an upgrade even though it clearly does not need to; although I can't say for sure that things will change one way or another, look for flux diff
and other advancements in 0.26 to improve the overall visibility in and around the areas that this question targets. 👍
@kingdonb Thanks for looking into it, but I'm not sure I'm 100% following you when you said it would actually be "desired behavior". I completely understood and agreed that, if there is indeed a change to the interval
field, it should trigger a reconciliation because the "spec" has changed. However, in my case, I never made any change to my "HelmRelease" YAML file, the "interval" field in the file is always "5m".
The problem here is that when I kubectl apply
such a YAML for the first time, the persisted value of the "interval" field of this "HelmRelease" resource in the API server has been mutated to "5m0s" due to the way "time.Duration" Go structure is marshaled. Now the server side has an "interval" of string "05m0s", and my YAML still has "interval" of string "5m".
Now if I apply my YAML one more time, I would assume nothing would have happened, since nothing has changed in my YAML at all. However, because of the above, the server thinks I'm changing "5m0s" to "5m", which is never intended from client point of view.
This is not helm-controller's fault by any means, and actually I don't think there's anything helm-controller can do here to mitigate. (or maybe sanitize the "interval" value to carry seconds always, in a mutating webhook?)
I just want to raise my findings here so that it no longer surprises other developer when they see the unnecessary and silent helm upgrade
action when they apply the same "HelmRelease" resource twice in a row.
@simingweng I'm not sure I understood all of the details of your report, but you might be describing one of the kustomize-controller issues with strange behavior from the new Server-Side Apply, issues that are in process to (🤞) fix for Flux 0.26.
Your reply makes total sense to me; the behavior we described of helm-controller is desired, the Kustomize controller is not. Flux should reconcile a change only once, and it should be able to tell if two documents resolve to the same cluster state so that a notification is not raised in a loop, and the cluster is not "configured" repeatedly when there is no change. (We are agreed on all that.)
I would check if you see improvements from the next release of Flux, but it might also be worth opening a separate issue in Kustomize Controller. I'm just not sure if it would be a duplicate report over there, I haven't seen one similar, but that doesn't mean it's not the same issue as others we have observed. (Thanks again for your report!)
Any update on this?
@devopstales Are you reproducing this issue on a current version of Flux? I don't think it's desired behavior, but is this causing you some issue? (Can you say a little bit more? This is an 18 month old issue, I doubt anyone is following it up now, but if you'll let us know about your workflow and how this is problematic, what version of Flux you are currently seeing an issue with, we can tell if it's something for Flux devs to pursue or not...)
Here are the steps to reproduce the my observation.
kind create cluster
flux install
kubectl apply -f https://github.com/fluxcd/helm-controller/raw/main/config/samples/source_v1beta1_helmrepository.yaml
https://github.com/fluxcd/helm-controller/raw/main/config/samples/helm_v2beta1_helmrelease_helmrepository.yaml
ready
, the status showslastReleaseRevision: 1
kubectl apply -f https://github.com/fluxcd/helm-controller/raw/main/config/samples/helm_v2beta1_helmrelease_helmrepository.yaml
helm upgrade
is performed, and results inlastReleaseRevision: 2
Is this behavior by design? Although the
helm upgrade
action doesn't modify any meaningful state in the cluster, does it make sense to advance the helm revision in such a scenario?