Open ZF-fredericvanlinthoudt opened 8 months ago
We also experienced this, and since we have Argo CD installed via Helm, we had fun trying to roll back 😅
On our production ArgoCD, with 1000+ applications, after updating to v2.10.0, the sync and refresh buttons completely freeze the UI. We noticed that the application controller used twice as much memory and CPU, but we didn't find any relevant logs. We had to roll back to v2.9.5.
Sharding does not work in 2.10.0 the way it did in previous versions. If you remove the env variable ARGOCD_CONTROLLER_REPLICAS and restart the controller, you will see that sync and refresh start working again.
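For readers unfamiliar with where that variable lives, here is a minimal sketch of the relevant part of the application controller StatefulSet. The resource and container names assume a default install, and the replica count of 2 is only an illustration; neither is stated in the comment above.

```yaml
# Fragment of the application controller StatefulSet (names assumed from a default install).
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
spec:
  replicas: 2                      # illustrative value
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            # The workaround described above removes this entry and restarts the controller.
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "2"           # illustrative value, normally kept equal to spec.replicas
```

With a Helm-managed install this fragment is generated from chart values, so the change would have to be made there rather than on the live object.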
We experience the same sync loop issue with version 2.10.5.
Anyone found a solution for this? Is it an option to add the 2 finalizers to the Application in git? Or would that break an initial deploy?
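To make the question concrete, this is roughly what declaring the two finalizers in Git would look like. It is an untested sketch: the application name, repository URL, chart path, and namespaces are hypothetical, and nothing in this thread confirms whether it stops the loop or affects an initial deploy.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app            # hypothetical name
  namespace: argocd            # assumes the default install namespace
  finalizers:
    # The two finalizers that show up in the diff, declared explicitly in Git.
    - post-delete-finalizer.argocd.argoproj.io
    - post-delete-finalizer.argocd.argoproj.io/cleanup
spec:
  project: default
  source:
    repoURL: https://example.com/charts.git   # hypothetical repository
    path: example-chart                        # hypothetical chart directory
    targetRevision: HEAD
  destination:
    server: https://kubernetes.default.svc
    namespace: example                         # hypothetical target namespace
```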
Fixed by #18003 ?
Hello, we also started seeing several Applications on ArgoCD constantly out of sync, with those 2 finalizers as the diff. This started after upgrading from v2.9.6 to v2.11.0. After reverting to v2.9.6 everything went back to normal. After the upgrade to v2.11.0 we saw every metric go up (memory usage, CPU usage, and also the queue times, which had been zero). The upgrade occurred around 9 AM today, May 20th.
After installing v2.9.6 everything went back to normal again. Please ignore the gap between ~17:35 and ~18:00; we had an issue with the metrics collection.
There is a clear spike in every metric of the application controller (CPU, RAM, Kubernetes executions) after the upgrade and a drop after reverting to v2.9.6. The queue time increased immediately after the upgrade and has stayed at zero since we reverted.
At the moment we only have these metrics for v2.9.6 and v2.11.0. For some reason our metrics agent is unable to gather any information with other versions. We will check what can be done and test other versions to see whether this issue with the finalizers persists.
Thanks!
UPDATE: Hello, just to add more information regarding the issue. v2.9.15 seems to work like v2.9.6, while trying out v2.10.10 caused the issues mentioned above, so it must be something introduced in v2.10.x. As soon as this version is installed, we see the queue increasing and the apps entering a sync loop. Thanks for the support.
I'm on version v2.11.2+25f7504 and experience the same problems. I'm stuck in infinite loops if selfHeal is on.
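For context, selfHeal is part of an Application's automated sync policy. A minimal fragment, showing only that setting:

```yaml
# Fragment of an Application spec; only the sync policy is shown.
spec:
  syncPolicy:
    automated:
      selfHeal: true   # the setting referred to in the comment above
```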
I've installed the version below and I am facing the same issue:
{
  "Version": "v2.11.3+3f344d5",
  "BuildDate": "2024-06-06T08:42:00Z",
  "GitCommit": "3f344d54a4e0bbbb4313e1c19cfe1e544b162598",
  "GitTreeState": "clean",
  "GoVersion": "go1.21.9",
  "Compiler": "gc",
  "Platform": "linux/amd64",
  "KustomizeVersion": "v5.2.1 2023-10-19T20:13:51Z",
  "HelmVersion": "v3.14.4+g81c902a",
  "KubectlVersion": "v0.26.11",
  "JsonnetVersion": "v0.20.0"
}
Got the same with the NVIDIA GPU Operator, and disabling self-heal doesn't change anything.
The same is still happening in the latest version v2.12.4:
We are also experiencing this, is there a workaround for that?
We're experiencing the same issue with the Falcon sensor, as mentioned in the previous comment. Could you please advise?
Also got the same issue. Any tips on how to circumvent it?
Hey, I found a possible mitigation in Issue-17433. That ticket is probably a duplicate of this one.
TL;DR: Just add the following to the argocd-cm ConfigMap to ignore the differences in Argo CD Applications (source comment):
resource.customizations.ignoreDifferences.argoproj.io_Application: |
  jqPathExpressions:
    - .metadata.finalizers[]? | select(. == "post-delete-finalizer.argocd.argoproj.io" or . == "post-delete-finalizer.argocd.argoproj.io/cleanup")
    - if (.metadata.finalizers | length) == 0 then .metadata.finalizers else empty end
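In context, the setting sits under data in the argocd-cm ConfigMap. A sketch of the full object, assuming the default install namespace argocd (the namespace and label are assumptions; the key and expressions are taken from the comment above):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd                        # assumes the default install namespace
  labels:
    app.kubernetes.io/part-of: argocd      # label used by default installs
data:
  # Ignore the post-delete finalizers when diffing Application resources.
  resource.customizations.ignoreDifferences.argoproj.io_Application: |
    jqPathExpressions:
      - .metadata.finalizers[]? | select(. == "post-delete-finalizer.argocd.argoproj.io" or . == "post-delete-finalizer.argocd.argoproj.io/cleanup")
      - if (.metadata.finalizers | length) == 0 then .metadata.finalizers else empty end
```

The indentation matters here: the value after `|` is itself parsed as YAML, so jqPathExpressions and its list items must stay nested under the key.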
Hello, indeed the snippet above stops the post-delete finalizers from being reported as a diff.
After enabling this setting the resource usage of the controller is not as high as mentioned before.
But the queue still increases:
We are using the ArgoCD Datadog integration, so these metrics are reported directly by the ArgoCD pods. The metrics that increased a lot and might be related are these:
They seem to be related to the Repository server now. Could this be also related to the queue increasing?
This might also be a separate issue, unrelated to the post-delete hook, that simply started after upgrading to a release > 2.10.x.
In this release the server-side diff feature was added, but as far as I know it is disabled by default and has to be enabled in the ConfigMap with controller.diff.server.side (see the documentation).
I'll post here if I can find anything else new
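For reference, a sketch of how the server-side diff setting mentioned above is switched on globally. This assumes, as I understand the Argo CD documentation, that the key belongs in the argocd-cmd-params-cm ConfigMap and that the install uses the default argocd namespace; the comment itself only names the key.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm   # assumption: command-parameters ConfigMap of a default install
  namespace: argocd            # assumes the default install namespace
data:
  # Left unset, server-side diff stays disabled, as the comment above notes.
  controller.diff.server.side: "true"
```

The application controller reads these parameters at startup, so a restart is needed for a change to take effect.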
Describe the bug
Since updating to ArgoCD v2.10.0, we are facing a constant refresh/sync issue with Applications that have a Helm template as source and use "post-delete" hooks in Helm. This is probably related to the new feature that added support for post-delete hooks. The application diff (see screenshot below) shows that it wants to remove two post-delete-finalizer.argocd.argoproj.io finalizers from the Application. This change gets synced, but almost instantaneously the Application goes out of sync again with the same diff, and the process repeats over and over. On our production ArgoCD instance, with more than 1200 applications, this causes ArgoCD to freeze and stop syncing any other applications (those other applications' syncs are just stuck in "waiting to start").
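As an illustration of the kind of resource involved, a Helm chart template with a post-delete hook might look like the sketch below. The Job name, image, and command are hypothetical and not taken from the affected charts; only the hook annotations are standard Helm.

```yaml
# Hypothetical chart template: a cleanup Job that Helm runs after the release is deleted.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-cleanup                          # hypothetical name
  annotations:
    "helm.sh/hook": post-delete                  # marks the Job as a post-delete hook
    "helm.sh/hook-delete-policy": hook-succeeded # remove the Job once it has succeeded
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: cleanup
          image: busybox:1.36                    # hypothetical image
          command: ["sh", "-c", "echo cleaning up"]
```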
To Reproduce
https://REDACTED.git is a placeholder for a Git repository that contains directories with Applications.
Expected behavior
Applications that use post-delete Helm hooks should be synced successfully in one go and should not constantly be synced over and over again when auto-sync is enabled.
Screenshots
Version
Logs
No relevant logs found.