argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Dynamic resource filtering for caching only those resources that are managed by ArgoCD #17236

Open anandf opened 9 months ago

anandf commented 9 months ago

Summary

resource.inclusions / resource.exclusions are static mechanisms for controlling which resource kinds are watched and stored in the cluster cache. Instead of watching all resources, implement a dynamic watch that watches only those resources that are managed by an Argo application.

Motivation

In an OpenShift-based setup (or a similar Kubernetes setup) with a huge number of CRDs (~200), not all CRDs need to be managed by ArgoCD. The current implementation of the cluster cache creates a watch for each resource type per namespace, opening too many watch connections to the API server. This causes client-side throttling, as seen in the error message below.

I0117 11:37:10.038643 1 request.go:601] Waited for 1.001246788s due to client-side throttling, not priority and fairness, request: GET:https://172.30.0.1:443/api/v1/namespaces/test-ns-011/secrets?limit=500...

When we tested with ~100 namespaces, we observed that too many watches were created and requests were throttled. The issue can be partially addressed by setting the resource.inclusions and resource.exclusions fields, but since these are static, users have to know in advance exactly which resource types ArgoCD will manage.
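For illustration, the static filters amount to wildcard matching against group/kind entries that must be enumerated up front. A minimal Go sketch of that matching logic (the type and function names here are hypothetical, not the actual argo-cd implementation):

```go
package main

import "fmt"

// filteredResource loosely mirrors one entry of the resource.inclusions /
// resource.exclusions settings (apiGroups, kinds). Illustrative only.
type filteredResource struct {
	apiGroups []string
	kinds     []string
}

// matchAny reports whether v matches any pattern, with "*" as a wildcard,
// as in the argocd-cm settings.
func matchAny(patterns []string, v string) bool {
	for _, p := range patterns {
		if p == "*" || p == v {
			return true
		}
	}
	return false
}

// included reports whether a discovered group/kind is covered by the
// static inclusion list.
func included(inclusions []filteredResource, group, kind string) bool {
	for _, f := range inclusions {
		if matchAny(f.apiGroups, group) && matchAny(f.kinds, kind) {
			return true
		}
	}
	return false
}

func main() {
	// The list must be written before the fact; anything not listed is
	// never watched, even if an application later manages it.
	inclusions := []filteredResource{
		{apiGroups: []string{"apps"}, kinds: []string{"Deployment"}},
	}
	fmt.Println(included(inclusions, "apps", "Deployment"))          // true
	fmt.Println(included(inclusions, "route.openshift.io", "Route")) // false
}
```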

Proposal

To avoid creating too many watches, and to overcome the static nature of the resource.inclusions / resource.exclusions settings, it would be preferable to have ArgoCD determine which resource types are managed by Argo applications and create watches only for those specific types. This would reduce both the number of network connections opened to the API server and the cache memory usage of the application controller.
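As a rough sketch of the idea, assuming a hypothetical source that can list the group/kind pairs each Application manages, the managed set could be collected like this (not the actual argo-cd code):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

// managedGroupKinds collects the union of group/kinds referenced by all
// applications. appResources stands in for whatever source lists the
// resources each Application manages (hypothetical, for illustration).
func managedGroupKinds(appResources [][]schema.GroupKind) map[schema.GroupKind]bool {
	managed := map[schema.GroupKind]bool{}
	for _, resources := range appResources {
		for _, gk := range resources {
			managed[gk] = true
		}
	}
	return managed
}

func main() {
	apps := [][]schema.GroupKind{
		{{Group: "apps", Kind: "Deployment"}, {Group: "", Kind: "Service"}},
		{{Group: "route.openshift.io", Kind: "Route"}},
	}
	// Only these group/kinds would get watches; the remaining CRDs
	// discovered on the cluster would be skipped entirely.
	for gk := range managedGroupKinds(apps) {
		fmt.Printf("watch %s\n", gk)
	}
}
```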

How do you think this should be implemented? The changes should be made in the ClusterCache code in the gitops-engine code base. Maintain two sets of API resources: one for the resources available in the cluster, and another for the resources managed by Argo applications. Create watches only for the resource types that are managed by at least one Argo application.
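Continuing the sketch, the two sets could be reconciled into watch start/stop decisions along these lines (illustrative names only, not the actual gitops-engine ClusterCache API):

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/runtime/schema"
)

// reconcileWatches compares the resources the cluster serves (from API
// discovery) with the resources applications manage, and decides which
// watches to start and which to stop.
func reconcileWatches(discovered, managed, watching map[schema.GroupKind]bool) (start, stop []schema.GroupKind) {
	for gk := range managed {
		if discovered[gk] && !watching[gk] {
			start = append(start, gk)
		}
	}
	for gk := range watching {
		if !managed[gk] {
			stop = append(stop, gk)
		}
	}
	return start, stop
}

func main() {
	discovered := map[schema.GroupKind]bool{
		{Group: "apps", Kind: "Deployment"}:          true,
		{Group: "route.openshift.io", Kind: "Route"}: true,
	}
	managed := map[schema.GroupKind]bool{
		{Group: "apps", Kind: "Deployment"}: true,
	}
	watching := map[schema.GroupKind]bool{
		{Group: "route.openshift.io", Kind: "Route"}: true, // no longer managed
	}
	start, stop := reconcileWatches(discovered, managed, watching)
	fmt.Println("start:", start, "stop:", stop)
}
```

Because applications come and go, the managed set changes over time, so this reconciliation would have to be re-run whenever an application is created, updated, or deleted, tearing down watches that are no longer needed.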

anandf commented 8 months ago

Related issue: https://github.com/argoproj/argo-cd/issues/6561