Better control on live state watch #8959

Open luis-garza opened 2 years ago

luis-garza commented 2 years ago

Summary

There should be a way to limit, or at least make less aggressive, the cluster live state watch.

We need a way to control how the live state watch is performed. Right now ArgoCD watches all kinds of resources on the target clusters, and there is no flag or parameter to define intervals between kube API queries or to otherwise fine-tune it.

Motivation

I've found that ArgoCD can put too much load on the target clusters' kube API. Say I have 5 clusters and 5 applications in each cluster, each one with no more than 20 resources. All 5 clusters are managed by a Rancher service, so all kube API calls are proxied through the same Rancher service endpoint.

The Rancher cluster is experiencing instability due to the volume of kube API calls made by the ArgoCD application controller. There are around 200 kube API calls per minute per cluster, which means around 1000 kube API calls per minute in total.

I've tried, without success, to find the right flag or parameter to make the live state check less aggressive. The only thing that helped mitigate the impact is the resource.inclusions setting, which allowlists the resources to track in the live state watch.
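For reference, this is roughly what we put in the argocd-cm ConfigMap (the kinds listed here are just an illustration of our allowlist, not a recommendation):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # Only the listed kinds are watched/cached on the target clusters;
  # everything else is ignored by the live state watch.
  resource.inclusions: |
    - apiGroups:
      - "*"
      kinds:
      - Deployment
      - Service
      - ConfigMap
      - Secret
      clusters:
      - "*"
```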

With that in place, the overall kube API call rate dropped from 1000 to 150 per minute, but this workaround prevents us from deploying other resource kinds and hides others (such as ReplicaSets and Endpoints) in the ArgoCD dashboard.

Another drawback is that as soon as more clusters are added, the kube API pressure will increase again.

Proposal

The cluster live state watch runs constantly; it would be nice if we could set the interval between kube API calls.
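To be clear, the setting below does not exist today; it's only a mock-up of the kind of knob I'm asking for (the key name and value are made up):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
data:
  # HYPOTHETICAL: minimum interval between kube API list/watch calls per
  # managed cluster. This key is a proposal, not an existing ArgoCD option.
  cluster.watch.minInterval: "60s"
```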

Another possibility would be to only watch resources within the applications' namespaces, omitting the rest.
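If I'm reading the declarative setup docs correctly, cluster secrets already accept a namespaces field that scopes what the application controller manages, which might approximate this; a sketch with illustrative values:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: prod-cluster
  namespace: argocd
  labels:
    argocd.argoproj.io/secret-type: cluster
type: Opaque
stringData:
  name: prod-cluster
  # Illustrative Rancher-proxied API endpoint.
  server: https://rancher.example.com/k8s/clusters/c-abc123
  # Limit the controller to the namespaces our applications deploy into;
  # cluster-scoped resources are ignored when this list is non-empty.
  namespaces: team-a,team-b
  config: |
    {
      "bearerToken": "<redacted>"
    }
```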

Thanks in advance.

tonydelanuez commented 1 year ago

Have y'all found a viable workaround for this that isn't allowlisting resources via resource.inclusions? I'm running into the same issue on a similar deployment: one gateway proxying k8s API requests to other endpoints, and ArgoCD knocking the gateway over.

nebojsa-prodana commented 3 months ago

Any updates on this? Have you found any workarounds, @luis-garza?