Open jessesuen opened 4 years ago
Not sure why this topic didn't get the boost it needs. Awesome initiative.
Not to hijack this thread, but here are some of our performance metrics; if further metrics would be useful, we can supply them.
argo cluster1: 9 clusters, 12 apps, 2 app projects.
$ k top po -n argocd
NAME                                               CPU(cores)   MEMORY(bytes)
argocd-application-controller-0                    87m          753Mi
argocd-applicationset-controller-5666d7d88-r2db7   60m          190Mi
argocd-dex-server-6dbfc4d6bf-ltnrn                 1m           21Mi
argocd-redis-ha-haproxy-7754ffd857-b9lg4           2m           70Mi
argocd-redis-ha-haproxy-7754ffd857-g56gj           3m           70Mi
argocd-redis-ha-haproxy-7754ffd857-tj7lm           3m           70Mi
argocd-redis-ha-server-0                           10m          22Mi
argocd-redis-ha-server-1                           12m          22Mi
argocd-redis-ha-server-2                           11m          22Mi
argocd-repo-server-68b9bb94bd-h4nr5                3m           182Mi
argocd-repo-server-68b9bb94bd-qhqnx                3m           183Mi
argocd-server-6c5cddb5-kv5lc                       1m           31Mi
argocd-server-6c5cddb5-td52r                       2m           30Mi
argo cluster2: 21 clusters, 140 applications, 27 app projects
$ k top po -n argocd
NAME                                               CPU(cores)   MEMORY(bytes)
argocd-application-controller-0                    217m         1674Mi
argocd-applicationset-controller-5666d7d88-nrf98   52m          214Mi
argocd-dex-server-6dbfc4d6bf-dvjfl                 1m           22Mi
argocd-redis-ha-haproxy-7754ffd857-8xcwg           3m           70Mi
argocd-redis-ha-haproxy-7754ffd857-c9x44           4m           70Mi
argocd-redis-ha-haproxy-7754ffd857-xjwrv           3m           70Mi
argocd-redis-ha-server-0                           12m          46Mi
argocd-redis-ha-server-1                           11m          44Mi
argocd-redis-ha-server-2                           10m          44Mi
argocd-repo-server-68b9bb94bd-qmnkz                4m           168Mi
argocd-repo-server-68b9bb94bd-qxlds                5m           227Mi
argocd-server-5d965fc9d4-d2cjz                     2m           39Mi
argocd-server-5d965fc9d4-flx22                     2m           44Mi
We use the app-of-apps pattern, with a single ApplicationSet and AppProject per Argo CD cluster.
We've noticed that clusters with more namespaces per ApplicationSet take much longer to sync. We've started creating a separate ApplicationSet and AppProject for each namespace, and this has yielded slightly faster syncing.
I don't have good metrics, but anecdotally the combined ApplicationSets can go hours without noticing that the gitops repo has been updated, and they require us to click the "delete" or "refresh" button more often than the workloads with fewer namespaces per AppProject + ApplicationSet.
This leads to a poor user experience, because our automation works by making changes to the gitops repo, and we expect Argo CD to quickly notice changes and keep the cluster in sync with the repo.
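For anyone trying the same split, here is a minimal sketch of what "one ApplicationSet per namespace" can look like using a Git directory generator. The repo URL, directory layout, project name, and namespace (`team-a`) are all hypothetical placeholders, not values from this thread:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: team-a-apps                # hypothetical: one ApplicationSet scoped to a single namespace
  namespace: argocd
spec:
  generators:
    - git:
        repoURL: https://example.com/gitops.git   # placeholder gitops repo
        revision: HEAD
        directories:
          - path: envs/team-a/*                   # placeholder directory layout
  template:
    metadata:
      name: '{{path.basename}}'
    spec:
      project: team-a                             # hypothetical per-namespace AppProject
      source:
        repoURL: https://example.com/gitops.git
        targetRevision: HEAD
        path: '{{path}}'
      destination:
        server: https://kubernetes.default.svc
        namespace: team-a
      syncPolicy:
        automated: {}
```

Scoping each ApplicationSet to one namespace keeps the set of generated Applications small, which is presumably why the per-namespace split refreshes faster.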
Any progress on this topic? My team frequently experiences very slow refreshing, which increases the time between code changes and sync completion. We suspect the cause is poor repo server performance. Curious how we can improve it.
Hello, same issue here. Our repo server restarts a lot, but logs nothing about the problem. The UI takes a while to refresh, and we also hit timeouts with the argocd CLI. Any clue how to improve this performance?
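When the repo server looks like the bottleneck, a common first step is to scale it out and bound concurrent manifest generation. This is a sketch against the default install manifests, not a verified fix for the issues above; `--parallelismlimit` is the repo server flag that caps how many manifest generations run concurrently, and the value 10 is an arbitrary starting point:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: argocd-repo-server
  namespace: argocd
spec:
  replicas: 2                         # the repo server is stateless, so it can scale horizontally
  template:
    spec:
      containers:
        - name: argocd-repo-server
          args:                       # illustrative; merge with the args in your install manifests
            - /usr/local/bin/argocd-repo-server
            - --parallelismlimit=10   # cap concurrent manifest generations to limit memory spikes
```

Bounding parallelism trades some latency under load for predictable memory usage, which can also reduce OOM-driven restarts like those described above.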
One Argo CD instance is managing 104 clusters and 140 apps, and is currently using 5 GiB memory and 1.25 CPU. Another instance is managing 27 clusters with 1000 apps, and is currently using 1.2 GiB memory and 0.6 CPU.
These resource usages don't seem large given the scale. Let me know if you disagree.
The controller is CPU-sensitive because it does a lot of JSON marshaling of objects, so I would increase or remove CPU limits: the controller has been known to get throttled, which hurts performance.
I think the proper approach is to set up alerts for when CPU/memory usage becomes too high. You can also shard the application controller by cluster.
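To act on the CPU-limit advice above, here is a sketch of removing the controller's CPU limit with a JSON patch. The resource names follow the default install; adjust them to your manifests, and note this assumes a CPU limit is currently set:

```shell
# Remove the CPU limit from the application controller StatefulSet
# (container index 0 assumes the default single-container pod spec).
kubectl -n argocd patch statefulset argocd-application-controller --type=json \
  -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/resources/limits/cpu"}]'

# Then watch whether usage and sync latency improve:
kubectl -n argocd top pod -l app.kubernetes.io/name=argocd-application-controller
```

Keeping a CPU request while dropping the limit preserves scheduling guarantees without CFS throttling during marshaling-heavy reconciliation bursts.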
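For the sharding suggestion, Argo CD distributes clusters across controller replicas when the replica count is raised and `ARGOCD_CONTROLLER_REPLICAS` is set to match. A sketch against the default StatefulSet (merge with your existing spec rather than applying verbatim):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: argocd-application-controller
  namespace: argocd
spec:
  replicas: 2                         # each replica reconciles a shard of the clusters
  template:
    spec:
      containers:
        - name: argocd-application-controller
          env:
            - name: ARGOCD_CONTROLLER_REPLICAS
              value: "2"              # must match spec.replicas for sharding to work
```

Sharding helps when memory/CPU pressure comes from watching many clusters; it does not help a single very large cluster, which stays pinned to one replica.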
During initial controller start-up there is a memory spike, because this is when Argo CD begins talking to many clusters and starts listing and streaming their resources. Usage eventually settles, but unfortunately it means the controller can OOM-crashloop when memory limits are set.
How big are the spikes compared to the requests and the steady-state usage afterwards?
Overall, I think there is a wide variety of use cases and scales, each requiring specific tuning, which can probably be summarized as "increase resources until it works well, and set up alerts".
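As a starting point for the "set up alerts" part, here is a sketch of a Prometheus rule covering the two failure modes discussed in this thread: CPU throttling of the controller and memory approaching limits. The rule name, thresholds, and durations are arbitrary placeholders; the metrics come from cAdvisor, which most Prometheus Kubernetes setups scrape by default:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: argocd-resource-alerts        # hypothetical rule name
  namespace: argocd
spec:
  groups:
    - name: argocd.resources
      rules:
        - alert: ArgoCDControllerCPUThrottled
          # Fraction of CFS periods in which the controller was throttled.
          expr: |
            rate(container_cpu_cfs_throttled_periods_total{pod=~"argocd-application-controller.*"}[5m])
              / rate(container_cpu_cfs_periods_total{pod=~"argocd-application-controller.*"}[5m]) > 0.25
          for: 15m
        - alert: ArgoCDControllerMemoryHigh
          # Absolute working-set threshold; 6 GiB here is an arbitrary example value.
          expr: |
            container_memory_working_set_bytes{pod=~"argocd-application-controller.*", container!=""} > 6e9
          for: 15m
```

Alerting on throttling rather than raw CPU usage directly targets the symptom the maintainers call out above: a throttled controller reconciles slowly even when the node has spare CPU.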
Summary
We need documentation on how to performance-tune Argo CD.
Context: https://argoproj.slack.com/archives/CASHNF6MS/p1585125402008900?thread_ts=1585008884.411100&cid=CASHNF6MS
Some points:
Some guidelines:
I suggest monitoring CPU/memory of the controller to come up with the right sizing. It will vary depending on the number of clusters and apps. Some of our examples at the time of writing: