Open jessesuen opened 4 years ago
Not sure why this topic didn't get the boost it needs. Awesome initiative.
Not to hijack this thread, but here are some of our performance metrics; if further metrics would be useful, we can supply them.
argo cluster1: 9 clusters, 12 apps, 2 app projects.
$ k top po -n argocd
NAME                                               CPU(cores)   MEMORY(bytes)
argocd-application-controller-0                    87m          753Mi
argocd-applicationset-controller-5666d7d88-r2db7   60m          190Mi
argocd-dex-server-6dbfc4d6bf-ltnrn                 1m           21Mi
argocd-redis-ha-haproxy-7754ffd857-b9lg4           2m           70Mi
argocd-redis-ha-haproxy-7754ffd857-g56gj           3m           70Mi
argocd-redis-ha-haproxy-7754ffd857-tj7lm           3m           70Mi
argocd-redis-ha-server-0                           10m          22Mi
argocd-redis-ha-server-1                           12m          22Mi
argocd-redis-ha-server-2                           11m          22Mi
argocd-repo-server-68b9bb94bd-h4nr5                3m           182Mi
argocd-repo-server-68b9bb94bd-qhqnx                3m           183Mi
argocd-server-6c5cddb5-kv5lc                       1m           31Mi
argocd-server-6c5cddb5-td52r                       2m           30Mi
argo cluster2: 21 clusters, 140 applications, 27 app projects
$ k top po -n argocd
NAME                                               CPU(cores)   MEMORY(bytes)
argocd-application-controller-0                    217m         1674Mi
argocd-applicationset-controller-5666d7d88-nrf98   52m          214Mi
argocd-dex-server-6dbfc4d6bf-dvjfl                 1m           22Mi
argocd-redis-ha-haproxy-7754ffd857-8xcwg           3m           70Mi
argocd-redis-ha-haproxy-7754ffd857-c9x44           4m           70Mi
argocd-redis-ha-haproxy-7754ffd857-xjwrv           3m           70Mi
argocd-redis-ha-server-0                           12m          46Mi
argocd-redis-ha-server-1                           11m          44Mi
argocd-redis-ha-server-2                           10m          44Mi
argocd-repo-server-68b9bb94bd-qmnkz                4m           168Mi
argocd-repo-server-68b9bb94bd-qxlds                5m           227Mi
argocd-server-5d965fc9d4-d2cjz                     2m           39Mi
argocd-server-5d965fc9d4-flx22                     2m           44Mi
We use the app-of-apps pattern, with a single ApplicationSet and AppProject per Argo CD cluster.
We've noticed that clusters with more namespaces per ApplicationSet take much longer to sync. We've started creating a separate ApplicationSet and AppProject for each namespace, and this has yielded slightly faster syncing.
I don't have good metrics, but anecdotally the combined ApplicationSets can go hours without noticing that the gitops repo has been updated, and they require us to click the "delete" or "refresh" button more often than the workloads with fewer namespaces per AppProject + ApplicationSet.
This leads to a poor user experience, because our automation works by making changes to the gitops repo, and we expect Argo CD to quickly notice changes and keep the cluster in sync with the repo.
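For anyone trying the same split, here is a minimal sketch of what "one ApplicationSet per namespace" can look like using a Git directory generator. The repo URL, directory layout, project name, and namespace (`team-a`) are all hypothetical placeholders, not values from this thread:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: team-a-apps                # hypothetical: one ApplicationSet scoped to a single namespace
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://example.com/gitops.git   # placeholder gitops repo
        revision: HEAD
        directories:
          - path: envs/team-a/*                   # placeholder directory layout
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: team-a                             # hypothetical per-namespace AppProject
      source:
        repoURL: https://example.com/gitops.git
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: team-a
      syncPolicy:
        automated: {}
```

Scoping each ApplicationSet to one namespace keeps the set of generated Applications small, which is presumably why the per-namespace split refreshes faster.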
Any progress on this topic? My team frequently experiences very slow refreshing, which increases the time between code changes and sync completion. We suspect the cause is poor repo server performance. Curious how we can improve it.
Hello, same issue here. Our repo server restarts a lot, but logs nothing about the problem. The UI takes a while to refresh, and we also hit timeouts with the argocd CLI. Any clue how to improve this performance?
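When the repo server looks like the bottleneck, a common first step is to scale it out and bound concurrent manifest generation. This is a sketch against the default install manifests, not a verified fix for the issues above; `--parallelismlimit` is the repo server flag that caps how many manifest generations run concurrently, and the value 10 is an arbitrary starting point:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  replicas: 2                         # the repo server is stateless, so it can scale horizontally
  template:
    spec:
      containers:
        - name: argocd-repo-server
          args:                       # illustrative; merge with the args in your install manifests
            - /usr/local/bin/argocd-repo-server
            - --parallelismlimit=10   # cap concurrent manifest generations to limit memory spikes
```

Bounding parallelism trades some latency under load for predictable memory usage, which can also reduce OOM-driven restarts like those described above.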
One Argo CD instance is managing 104 clusters and 140 apps, and is currently using 5 GiB memory and 1.25 CPU. Another instance is managing 27 clusters with 1000 apps, and is currently using 1.2 GiB memory and 0.6 CPU.
These resource usages don't seem large given the scale. Let me know if you disagree.
The controller is CPU-sensitive because it does a lot of JSON marshaling of objects, so I would increase or remove CPU limits: the controller has been known to get throttled, which hurts performance.
I think the proper approach is to set up alerts for when CPU/memory usage becomes too high. You can also shard the application controller by cluster.
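To act on the CPU-limit advice above, here is a sketch of removing the controller's CPU limit with a JSON patch. The resource names follow the default install; adjust them to your manifests, and note this assumes a CPU limit is currently set:

```shell
# Remove the CPU limit from the application controller StatefulSet
# (container index 0 assumes the default single-container pod spec).
kubectl -n argocd patch statefulset argocd-application-controller --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits/cpu"}]'

# Then watch whether usage and sync latency improve:
kubectl -n argocd top pod -l app.kubernetes.io/name=argocd-application-controller
```

Keeping a CPU request while dropping the limit preserves scheduling guarantees without CFS throttling during marshaling-heavy reconciliation bursts.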
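For the sharding suggestion, Argo CD distributes clusters across controller replicas when the replica count is raised and `ARGOCD_CONTROLLER_REPLICAS` is set to match. A sketch against the default StatefulSet (merge with your existing spec rather than applying verbatim):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 2                         # each replica reconciles a shard of the clusters
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "2"              # must match spec.replicas for sharding to work
```

Sharding helps when memory/CPU pressure comes from watching many clusters; it does not help a single very large cluster, which stays pinned to one replica.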
During initial controller start-up there is a memory spike, because this is when Argo CD begins talking to many clusters and starts listing and streaming their resources. Usage eventually settles, but unfortunately it means the controller can OOM-crashloop when memory limits are set.
How big are the spikes compared to the requests and the steady-state usage afterwards?
Overall, I think there is a wide variety of use cases and scales, each requiring specific tuning, which can probably be summarized as "increase resources until it works well, and set up alerts".
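As a starting point for the "set up alerts" part, here is a sketch of a Prometheus rule covering the two failure modes discussed in this thread: CPU throttling of the controller and memory approaching limits. The rule name, thresholds, and durations are arbitrary placeholders; the metrics come from cAdvisor, which most Prometheus Kubernetes setups scrape by default:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-resource-alerts        # hypothetical rule name
  namespace: argocd
spec:
  groups:
    - name: argocd.resources
      rules:
        - alert: ArgoCDControllerCPUThrottled
          # Fraction of CFS periods in which the controller was throttled.
          expr: |
            rate(container_cpu_cfs_throttled_periods_total{pod=~"argocd-application-controller.*"}[5m])
              / rate(container_cpu_cfs_periods_total{pod=~"argocd-application-controller.*"}[5m]) > 0.25
          for: 15m
        - alert: ArgoCDControllerMemoryHigh
          # Absolute working-set threshold; 6 GiB here is an arbitrary example value.
          expr: |
            container_memory_working_set_bytes{pod=~"argocd-application-controller.*", container!=""} > 6e9
          for: 15m
```

Alerting on throttling rather than raw CPU usage directly targets the symptom the maintainers call out above: a throttled controller reconciles slowly even when the node has spare CPU.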
Summary
We need documentation on how to performance-tune Argo CD.
Context: https://argoproj.slack.com/archives/CASHNF6MS/p1585125402008900?thread_ts=1585008884.411100&cid=CASHNF6MS
Some points:
Some guidelines:
I suggest monitoring CPU/memory of the controller to come up with the right sizing. It will vary depending on the number of clusters and apps. Some of our examples at the time of writing: