jdgeisler opened 4 months ago
hey, this is probably because of https://github.com/fluxcd/flagger/pull/1638. The upgrade to the k8s 1.30 libs means that sidecars are now represented via the .initContainers[].restartPolicy field. Thus, when Flagger restarts, it reads this field and produces a different checksum. It's an unavoidable side effect that isn't triggered by changes in Flagger itself, but by the upstream k8s libs. The release notes for the next version will include a warning about this.
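For anyone unfamiliar with the field, here is a minimal sketch of where it sits in a pod template (workload and image names are hypothetical, not taken from this issue); this is the part of the spec that now enters the checksum once the 1.30 libs are in use:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo                   # hypothetical workload
spec:
  template:
    spec:
      initContainers:
        - name: proxy-sidecar     # hypothetical sidecar container
          image: example/proxy:1.0
          restartPolicy: Always   # native sidecar marker (k8s 1.28+); visible to the checksum
      containers:
        - name: app
          image: example/app:1.0
```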
cc: @stefanprodan
Hey @aryan9600, thank you for the follow-up.
We have hundreds of workloads using Flagger, so when we upgrade, this would trigger all of the canaries at once, which is not ideal. Is there any way this can be avoided in Flagger's hash calculation, so that a dependency upgrade doesn't trigger a false rollout?
I can't think of any clean way to avoid this.
You could pause all canaries with .spec.suspend and enable them one by one when there is a change to the actual deployment. If you use Flux and everything is in Git, the commit that bumps the app version could also remove the suspend field. A sketch of what that looks like follows below.
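Suspending a canary is a one-field change on the Canary resource; a minimal sketch, with hypothetical resource names:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo            # hypothetical canary
  namespace: apps
spec:
  suspend: true            # pause analysis until this is removed or set to false
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo
  # ...rest of the canary spec unchanged
```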
Hi @stefanprodan @aryan9600
Is there really nothing that can be done about this issue?
I've got the exact same problem with hundreds of canaries that can't all be started at once, as they would burst a rate limit for the external metrics provider used for the analysis.
I've tried setting suspend: true on a canary, but we're using Argo CD, which doesn't see the suspend field as something it needs to remove when resyncing.
The only option we have at the moment is to release the canaries in batches after each upgrade, which would be a very time-consuming process to repeat regularly, as we tend to adopt the latest release as soon as it becomes available.
I'm not sure if there are other obvious causes, but we've experienced this through plenty of other upgrades in the past. Maybe they also contained updates similar to the 1.30 one, but I wanted to provide that data point in case it is helpful.
Is there anything that could be done on the controller side, similar to https://github.com/fluxcd/flagger/issues/1673#issuecomment-2232900287?
Describe the bug
I've been working on an MR for issue https://github.com/fluxcd/flagger/issues/1646 and ran into the following bug when testing Flagger in my personal Kubernetes cluster. I've also reached out in the Slack channel here.
When building my own Docker image from main (commit id #133fdec), I am seeing canary rollouts triggered even though there were no changes to my canary deployment spec. As soon as Flagger was upgraded to this image, the canaries detected a new revision and began the analysis.
To confirm, I also compared the deployment spec 1:1 and nothing changed. This should mean the calculated hash is the same, but for some reason the lastAppliedSpec hash in the canary was different. As a sanity check, I also built a custom image from the last tag, v1.37, and confirmed that the canary analysis is not triggered after upgrading to it, and that the hash remains the same.
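For anyone reproducing this, the hash I compared is the one recorded in the Canary's status; a sketch of where it lives, with made-up values and a hypothetical canary name:

```yaml
apiVersion: flagger.app/v1beta1
kind: Canary
metadata:
  name: podinfo                  # hypothetical canary
status:
  phase: Succeeded
  lastAppliedSpec: "7b9d5f6c4"   # spec checksum tracked by Flagger (example value)
```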
To Reproduce
Expected behavior
It is expected that upgrading Flagger does not cause canary rollouts to be triggered if nothing changes in the canary deployments.
Additional context