Open woehrl01 opened 1 year ago
Do you actually use any CMPs, or is all that truly completely wasted CPU?
@crenshaw-dev I use a CMP, but not in that repository.
Gotcha. We could cache the discovery result on a per-commit basis, but my guess is that you're hitting the high CPU use with new commits.
An alternative would be to explicitly set helm, kustomize, or directory in the spec.source field. That should force Argo to bypass the CMP detection phase.
No, actually it's the same commit, but I have a monorepo, so it does the resolving 8,000 times, once for each root folder of the apps.
Great, I'll check the directory part
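For reference, a minimal sketch of what the "directory part" could look like for a plain-YAML app. The app name, repo URL, path, and destination namespace are hypothetical placeholders; the point is only that spec.source.directory is set explicitly, so Argo CD already knows the source type and should not need to ask any CMP.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-app                  # hypothetical app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/monorepo.git   # hypothetical repo
    targetRevision: HEAD
    path: apps/example-app           # hypothetical root folder of this app
    # Explicit source type: plain directory of manifests.
    # Setting this is what should let Argo CD skip plugin discovery.
    directory:
      recurse: false
  destination:
    server: https://kubernetes.default.svc
    namespace: example               # hypothetical target namespace
```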
We encountered the same issue on v2.7.11. After migrating plugins from argocd-cm to sidecars, CPU and memory usage skyrocketed. Consequently, argocd-repo-server pods started to get throttled, ArgoCD slowed down and eventually got stuck. Bumping argocd-repo-server resource requests and limits did not help. Therefore, we had to revert the changes.
As a result, we cannot use ArgoCD sidecar plugins and are blocked from updating ArgoCD to v2.8.
Resource usage increase after plugins migration to sidecars:
@woehrl01 this might also help mitigate the issue if your monorepo is large due to non-yaml resources: https://argo-cd.readthedocs.io/en/latest/operator-manual/config-management-plugins/#plugin-tar-stream-exclusions
Gotcha. We could cache the discovery result on a per-commit basis, but my guess is that you're hitting the high CPU use with new commits.
An alternative would be to explicitly set helm, kustomize, or directory in the spec.source field. That should force Argo to bypass the CMP detection phase.
Could you please elaborate on this solution?
Thanks @crenshaw-dev the repo only consists of yaml files, but I still use it to exclude the .git folder.
I also found that I have to lower the parallel repo actions from 50 to 5; otherwise I end up in a strange deadlock situation. That could be caused by the plugin detection, too.
@JuozasVainauskas Argo CD only does plugin "discovery" if you haven't explicitly specified in your App manifest that you want something besides a plugin. For example:
kind: Application
spec:
  source:
    kustomize:
      images: [a=b]
For this app, Argo CD would skip plugin discovery because it automatically knows it'll be using Kustomize instead.
Understood, thank you. Unfortunately, this will not help us since we use plugins by name instead of discovery.
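To illustrate the "by name" case: a rough sketch of an Application that selects a plugin explicitly through spec.source.plugin.name. The plugin name, repo URL, and path are hypothetical. With a named plugin there is no discovery step to skip, which is why the explicit source-type workaround does not apply here.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: example-plugin-app           # hypothetical app name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://example.com/org/monorepo.git   # hypothetical repo
    targetRevision: HEAD
    path: apps/example-plugin-app    # hypothetical path
    # The plugin is chosen by name rather than by discovery.
    plugin:
      name: my-plugin                # hypothetical plugin name
  destination:
    server: https://kubernetes.default.svc
    namespace: example               # hypothetical target namespace
```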
@crenshaw-dev I just deployed the fix with the directory across all our clusters. The CPU usage of the repo-server has not changed (but isn't an issue yet), I'll monitor and keep you updated.
Comments from @crenshaw-dev - When you add a CMP, every app now has to query that CMP to see whether the CMP can handle it. This is by design, to keep potential issues out of the repo server. However, it creates a performance penalty if you add a single CMP for a single app, because all apps still have to check against that CMP. Will review to see if we should architect this differently.
Related proposal: https://github.com/argoproj/argo-cd/issues/15006
Another suggested stop-gap: Support a feature flag to disable discovery.
@alexmt suggests keeping it disabled by default.
We managed to keep CPU usage under control by setting the --parallelismlimit flag. However, after migrating argocd-cm plugins to sidecars, CPU usage still increased significantly and ArgoCD got slower. As a result, we cannot migrate our argocd-cm plugins to sidecars and upgrade ArgoCD instances to 2.8.x.
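For anyone who manages repo-server settings through the argocd-cmd-params-cm ConfigMap instead of container args, the same limit can, as far as I know, be set with the reposerver.parallelism.limit key. A sketch, with an illustrative value that is not taken from this thread:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  # Caps concurrent manifest generation requests in the repo server.
  # "10" is only an illustrative value; tune it for your own cluster.
  reposerver.parallelism.limit: "10"
```

The repo-server pods usually need a restart to pick up changes to this ConfigMap.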
Update: we have successfully solved the performance issue by setting the --plugin-tar-exclude value to .git/* and migrated argocd-cm plugins to sidecars.
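A sketch of the same exclusion expressed through argocd-cmd-params-cm, assuming the reposerver.plugin.tar.exclusions key described in the plugin tar stream exclusions docs linked earlier in the thread. Only the .git/* pattern comes from this thread.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cmd-params-cm
  namespace: argocd
data:
  # Glob patterns of files to leave out of the tar stream sent to CMP sidecars.
  # Multiple patterns are separated by semicolons, per the linked docs.
  reposerver.plugin.tar.exclusions: ".git/*"
```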
Potentially unrelated: I had wondered whether it might be easier/better if we could configure the plugin tar as an inclusion list per plugin, rather than as a global exclusion list. At least in large monorepos, it's much easier to decide what I want to send to the CMP than to try to exclude everything else.
@crenshaw-dev
We just did a redeploy of about 6,000 apps today. With the fix of assigning the directory and bypassing the plugin detection, we now get a really awesome deployment time of about 20 minutes. CPU usage of the repo servers is also great!
CPU usage of repo server:
A possible optimization to further improve performance would be to get rid of the multiple git operations, considering that it's a monorepo and a single commit triggered the redeploy:
I can also confirm huge perf improvements by adding:
directory:
  includes: '*'
To our directory argocd apps (we only have maybe 30-50 of them). git_ms timing from our repo-server logs went from 40s to 20s
Close to 70 cores peak for a repo server pod in one of our clusters.
ArgoCD 2.8.4
Our problem seems quite similar to the ones described by others in the thread: we have a big monorepo and a high number of applications (8k+), and cloning the same repo for each app seems to be the cause of the performance problems.
When using the plugin as an initContainer in previous versions of ArgoCD (2.4.18), the same plugin synchronized all apps and resources in just a few seconds (~20s). On the other hand, when using it as a sidecar, it takes around 20 minutes and lots of CPU, and often never fully completes...
Question: Is it possible/recommended to still use plugins as an initContainer instead of changing to the new sidecar approach? I had the impression the option was removed in v2.8+, but couldn't tell for sure from the docs so far...
Also, is there any way to avoid the multiple cloning of the same repo that we might have missed from the docs?
Thanks a ton in advance for any insights!!!
We have had the same performance issue since we moved to a CMP plugin; specifically, we are using the helmfile plugin integration. We have around 100 apps, each containing 2-3 charts, and it's around 20 times slower.
It seems to work better for me after upgrading to 2.11 and applying the new argocd.argoproj.io/manifest-generate-paths annotation to my Applications/ApplicationSets (previously this feature worked only for webhooks).
Example:
annotations:
  argocd.argoproj.io/manifest-generate-paths: "."
ArgoCD versions 2.10 and below have reached EOL. Can you upgrade and let us know if the issue is still present, please?
Describe the bug
Using Argo CD 2.8.3, we see high CPU usage in the repo server when detecting plugins.
We are using a huge monorepo for our applications, without any templating (just plain YAML), but the detection of plugins takes a significant amount of time.
Flame graph with pixie:
Another one on cleanup:
Slack discussion: https://cloud-native.slack.com/archives/C01TSERG0KZ/p1694514516286809?thread_ts=1694175483.721089&cid=C01TSERG0KZ
CC: @csantanapr
To Reproduce
Apply thousands of apps at the same time
Expected behavior
Apply them "fast"
Screenshots
Version