argoproj / gitops-engine

Democratizing GitOps
https://pkg.go.dev/github.com/argoproj/gitops-engine?tab=subdirectories
Apache License 2.0
1.68k stars 252 forks

Events on CRDs cause full cluster discovery #523

Open torfjor opened 1 year ago

torfjor commented 1 year ago

Hi!

We have a setup where a central admin cluster running Argo CD manages Applications on a fleet of workload clusters. Our central admin cluster connects to the workload clusters through Anthos Fleet and the Connect Gateway. The workload clusters are a mix of Anthos Bare Metal and GKE.

We ran into an issue where we hit the default Connect Gateway API quota with only two registered workload clusters and a handful of deployed Applications. Investigation showed that Argo CD was performing full API discovery requests against the registered workload clusters multiple times per minute. Further investigation led us to the event loop in gitops-engine where c.startMissingWatches() performs a non-cached discovery of the target cluster every time a CRD changes.

This turns out to be problematic for GKE clusters with Backup for GKE enabled, because the system-provided addon-manager will patch its CRDs very often:

[Screenshot, 2023-05-25 12:42:51]

Looking at the Connect Gateway API traffic, you can see a sharp drop after we added a resource exclusion for gkebackup.gke.io/*:

[Figure: Connect Gateway API traffic by response code. The last sudden spike was caused by us temporarily removing the resource exclusion.]
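The exclusion gkebackup.gke.io/* means "every kind in the gkebackup.gke.io API group". The sketch below shows only that glob matching on API group and kind. It uses Go's path.Match as a stand-in; the exclusion type, isExcluded, and the kind name AnyKind are hypothetical, Argo CD's own matcher may behave differently, and the sketch does not model how gitops-engine applies exclusions to CRD events.

```go
package main

import (
	"fmt"
	"path"
)

// exclusion is a HYPOTHETICAL, simplified resource exclusion entry:
// glob patterns for API groups and for kinds.
type exclusion struct {
	APIGroups []string
	Kinds     []string
}

// matchAny reports whether s matches any of the glob patterns.
// Malformed patterns are treated as non-matching.
func matchAny(patterns []string, s string) bool {
	for _, p := range patterns {
		if ok, err := path.Match(p, s); err == nil && ok {
			return true
		}
	}
	return false
}

// isExcluded reports whether a resource with the given API group and kind
// is covered by any exclusion entry.
func isExcluded(excl []exclusion, group, kind string) bool {
	for _, e := range excl {
		if matchAny(e.APIGroups, group) && matchAny(e.Kinds, kind) {
			return true
		}
	}
	return false
}

func main() {
	// Equivalent of excluding gkebackup.gke.io/* (all kinds in the group).
	excl := []exclusion{{APIGroups: []string{"gkebackup.gke.io"}, Kinds: []string{"*"}}}
	fmt.Println(isExcluded(excl, "gkebackup.gke.io", "AnyKind")) // prints "true"
	fmt.Println(isExcluded(excl, "apps", "Deployment"))          // prints "false"
}
```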

For our use case, having gkebackup.gke.io/* excluded is totally fine. We contacted Google Support about the issue, and the rapidly patched CRDs are intended behaviour. Their immediate response to the chatty nature of Argo CD was simply to raise the quota for affected customers.

I'm writing up this issue because the problem may not be evident to people running Argo CD or Flux against GKE clusters unless they have good visibility into their API server traffic.

Possibly related:

Update 05-31-2023:

Just heard back from the Backup for GKE product team: a fix for the rapidly patched CRDs will be rolled out next week.