FredrikAugust opened 5 months ago
I just saw this issue over at argo-rollouts
https://github.com/argoproj/argo-rollouts/issues/3281
Reading:

> In some scenarios rollouts-controller receives ownership of the spec.template.spec.containers field and blocks other components from updating the rollout with server-side apply
We're using Argo Rollouts, and it appears that this might be the case. I see this block under `managedFields`:
```yaml
- apiVersion: argoproj.io/v1alpha1
  fieldsType: FieldsV1
  fieldsV1:
    f:metadata:
      f:annotations:
        .: {}
        f:rollout.argoproj.io/revision: {}
    f:spec:
      f:template:
        f:spec:
          f:containers: {}
  manager: rollouts-controller
  operation: Update
  time: "2024-01-10T09:26:56Z"
```
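To confirm which manager actually owns the disputed field, it can help to walk the `managedFields` list (e.g. from `kubectl get rollout <name> -o yaml --show-managed-fields`) and report every manager whose `FieldsV1` entry covers a given path. The helper below is a hypothetical illustration, not part of Argo CD or Argo Rollouts; it only assumes the `f:`-prefixed FieldsV1 encoding shown above.

```python
# Sketch: find which field managers claim ownership of a field path
# in a Kubernetes object's metadata.managedFields (FieldsV1 format).
# Hypothetical helper, not part of Argo CD or Argo Rollouts.

def owners_of(managed_fields, path):
    """Return the managers whose FieldsV1 entry covers `path`.

    `path` is a list of field names, e.g.
    ["spec", "template", "spec", "containers"].
    """
    owners = []
    for entry in managed_fields:
        node = entry.get("fieldsV1", {})
        for part in path:
            node = node.get(f"f:{part}")
            if node is None:
                break
        else:
            owners.append(entry["manager"])
    return owners


# Example data mirroring the managedFields entry above.
managed_fields = [
    {
        "manager": "rollouts-controller",
        "operation": "Update",
        "fieldsV1": {
            "f:spec": {"f:template": {"f:spec": {"f:containers": {}}}},
        },
    },
    {
        "manager": "argocd-controller",  # hypothetical second entry
        "operation": "Apply",
        "fieldsV1": {"f:spec": {"f:replicas": {}}},
    },
]

print(owners_of(managed_fields, ["spec", "template", "spec", "containers"]))
# -> ['rollouts-controller']
```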
I've uploaded the entire rollout to help debug: https://gist.github.com/FredrikAugust/fe1bafb02e78f0cd8d309c460679fe85
I just set up an entirely new cluster from scratch using:

- Argo CD: v2.9.5+f943664
- Argo Rollouts: v1.6.4+a312af9

and the issue still persists, so it shouldn't have anything to do with stale state or the like.
Is there anyone who could assist me in debugging this, or point me to where the problem might be? It's still happening on all our clusters.
We're seeing this exact same issue on new clusters as well....
By chance, who here is using Argo CD in a remote model vs. a local in-cluster deploy?

In-cluster here.
@FredrikAugust Interesting... we're seeing the problem primarily on remote-cluster setups... we have been migrating to a remote-cluster model over the last few weeks, and that's when we saw this creep up.
After syncing twice, the live tab of the Rollout shows the correct values. In case that is of help.
The only "fix" we have right now is to turn on selfHeal, which at least resyncs as soon as Rollouts wipes out the field... but I really hope we can get some traction on this issue at some point. :/
> The only "fix" we have right now is to turn on selfHeal, which at least resyncs as soon as Rollouts wipes out the field.
Do you know if this is a problem caused by Argo Rollouts?
We're hitting this pretty consistently now. Remote Argo CD, Kustomize image changes. Argo CD Helm chart version 7.1.1, Argo Workflows Helm chart version 2.35.3.
Actually, this is a bit nastier than just being out of sync. I just pushed a manifest adding a `requests` block to the `resources` of my container, but the diff was stuck in this state.
The real diff is:

```diff
@@ -99,8 +98,10 @@
             timeoutSeconds: 1
           resources:
             limits:
-              cpu: "2"
+              cpu: 2
               memory: 1G
+            requests:
+              cpu: 200m
         enableServiceLinks: false
       nodeSelector:
         kubernetes.io/arch: arm64
```
but that's hidden behind this bad sync. Why is the diff ignoring the `containers` block? Is it from the managed-fields thing? I do have this on my Application so that Argo CD and Rollouts don't fight during the canary release:
```yaml
ignoreDifferences:
  - group: '*'
    kind: '*'
    managedFieldsManagers:
      - rollouts-controller
```
Is that too broad?
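If the wildcard scope is the concern, an `ignoreDifferences` entry can be narrowed to a specific group and kind so that only the Rollout's fields owned by rollouts-controller are ignored. A sketch based on the documented Application spec (adjust to your setup):

```yaml
ignoreDifferences:
  - group: argoproj.io
    kind: Rollout
    managedFieldsManagers:
      - rollouts-controller
```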
Checklist:

- `argocd version`

Describe the bug
We update a `values.yaml` file in a GitHub Action, which is then provided to a Helm chart to set the `image` field of a `Rollout` from Argo Rollouts. We then run a sync operation on the `Application` (which stems from an `ApplicationSet`), which updates the running `ReplicaSet` to deploy the new version.

After this sync is done, the application is still reported as `OutOfSync`, and by clicking the diff we can see that Argo CD reports that `spec.containers` should be set to `null`. It reports the live manifest having `containers: null` and the desired manifest not having the key `containers` at all. However, by clicking the actual Rollout in Argo CD and looking at the Live and Desired manifests, both are correct, so frankly I can't tell where it got the values from.

By running sync once more, it goes away. It will come back after updating `image` once more, however.

In addition to this, I'm experiencing that Argo CD is having trouble marking Applications from the same ApplicationSet as OutOfSync when they are, to a human eye, clearly out of sync.
I've manually updated the values file to have image tag `y`, whereas `x` is currently deployed, i.e. `live=x` and `desired=y`. I can verify this (again) by looking at the Desired and Live tabs on the Rollout. The diff tab, however, shows no differences. Hard refresh and normal refresh don't work. I've tried `flushdb`.
I'm reporting these seemingly separate bugs as one, as it appears they share a common problem of a "misaligned state", for lack of a better term.
Clicking "sync" again syncs the application correctly to the actual desired version, but puts us back in the `containers: null` state.

Sometimes it also appears to get stuck on "Refreshing", but clicking refresh manually puts it back to in-sync. This goes for seemingly all applications.
To Reproduce
I'm unsure, as it only recently started occurring, but we're using a Git generator ApplicationSet to generate a set of pretty common applications. We're using no ignoreDifferences nor RespectIgnoreDifferences (although we used to have these and ApplyOutOfSyncOnly enabled).
Expected behavior
It should correctly show applications as in or out of sync, and the diff and desired/live manifests should be consistent.
This is currently breaking our CD pipeline so if there's anything I can do to assist let me know.
Version
Logs