argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Cluster secrets with identical server URL should resolve to the same shard #18118

Open Ezzahhh opened 6 months ago

Ezzahhh commented 6 months ago

Describe the bug

There are use cases where, using AppSets, we may want to create multiple cluster secrets with different names but the same server. One example is the GitOps Bridge pattern of passing labels and annotations into the App/AppSet, where the workload happens to live in the same cluster but in a different namespace.

Both the legacy and round-robin sharding algorithms use the cluster secret ID to calculate the shard. This means that if two secrets "a" and "b" both have `server: https://api.master-node.com`, they can resolve to different shard values, so more than one controller will try to sync the same physical cluster.

The solution should be to shard on the server instead of the ID: the ID cannot be assumed to represent a unique physical cluster, but the server can.
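To illustrate the difference, here is a minimal sketch that assumes a simple hash-modulo-replicas scheme. The `Cluster` type and `shardFor` function are hypothetical names made up for this example, not Argo CD's actual code. The key being hashed is the only thing that changes between the two approaches:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Cluster is a hypothetical, simplified stand-in for a cluster secret.
type Cluster struct {
	ID     string // unique per secret
	Server string // API server URL; may be shared by several secrets
}

// shardFor hashes an arbitrary key with FNV-32a and maps it onto one of
// `replicas` shards. This is an assumed scheme for illustration only.
func shardFor(key string, replicas int) int {
	if replicas <= 0 {
		return 0
	}
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(replicas))
}

func main() {
	a := Cluster{ID: "a", Server: "https://api.master-node.com"}
	b := Cluster{ID: "b", Server: "https://api.master-node.com"}
	const replicas = 3

	// Keyed on ID: the two secrets are hashed independently, so they
	// may or may not land on the same shard.
	fmt.Println("by ID:    ", shardFor(a.ID, replicas), shardFor(b.ID, replicas))

	// Keyed on server: identical input, so always the same shard.
	fmt.Println("by server:", shardFor(a.Server, replicas), shardFor(b.Server, replicas))
}
```

With the server as the key, every secret pointing at the same URL is guaranteed to map to one shard, whatever the hash function is.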

My workaround: manually set the shard in every cluster secret that shares the same server.

To Reproduce

Create 10 secrets (or any number greater than 2) with different names but the same server. Enable sharding with either algorithm and check the metric `sum(argocd_cluster_info) by (pod, server)`. You will observe that at least two controllers report managing the same server URL. Subsequently, `sum(argocd_app_info{})` shows an elevated count, because the controllers are syncing all Applications belonging to those clusters.
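The check in the reproduction steps amounts to grouping (pod, server) pairs by server and flagging any server with more than one pod. A rough sketch of that grouping follows. The `Sample` type, the `multiOwnerServers` function, and the pod names are all hypothetical, and the samples are hard-coded rather than read from a real metrics endpoint:

```go
package main

import (
	"fmt"
	"sort"
)

// Sample is a hypothetical (pod, server) pair, such as one label set
// returned by a query grouped by pod and server.
type Sample struct {
	Pod    string
	Server string
}

// multiOwnerServers returns, for each server reported by more than one
// pod, the sorted list of pods that report it.
func multiOwnerServers(samples []Sample) map[string][]string {
	seen := map[string]map[string]bool{}
	for _, s := range samples {
		if seen[s.Server] == nil {
			seen[s.Server] = map[string]bool{}
		}
		seen[s.Server][s.Pod] = true
	}

	out := map[string][]string{}
	for server, pods := range seen {
		if len(pods) < 2 {
			continue // a single owner is the healthy case
		}
		for pod := range pods {
			out[server] = append(out[server], pod)
		}
		sort.Strings(out[server])
	}
	return out
}

func main() {
	// Made-up sample data for illustration.
	samples := []Sample{
		{Pod: "controller-0", Server: "https://api.master-node.com"},
		{Pod: "controller-1", Server: "https://api.master-node.com"},
		{Pod: "controller-1", Server: "https://other.example.com"},
	}
	fmt.Println(multiOwnerServers(samples))
}
```

Any server that appears in the result is being managed by more than one controller, which is the symptom described above.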

Expected behavior

Duplicate cluster secrets with the same server should resolve to the same shard, rather than being sharded by an arbitrary ID that is incorrectly assumed to be unique per physical cluster.

Version

Argo CD: v2.10.9+c071af8
Build Date: 2024-04-30T15:53:28Z
Go Version: go1.21.3
Go Compiler: gc
Platform: linux/amd64
jsonnet: v0.20.0
kustomize: v5.2.1 2023-10-19T20:13:51Z
Helm: v3.14.3+gf03cc04
kubectl: v0.26.11

Ezzahhh commented 6 months ago

I would also like to add an interesting behaviour we see with duplicated server URLs:

  1. Go to Settings -> Clusters
  2. Click on a cluster that has duplicated server URLs
  3. Observe that the cluster details change every second, cycling through all the other secrets that share the same server URL
  4. If you click "invalidate cache", the cluster is deleted and replaced with an arbitrary cluster that has the same server URL. For example, if clusters "a" and "b" share the same server URL, invalidating the cache on cluster "a" deletes cluster "a" and creates a duplicate of cluster "b", so you end up with two entries for cluster "b".

If any AppSets target these clusters, the generated Apps will conflict with one another and complain about being owned by another App.

rouke-broersma commented 6 months ago

Wouldn't a better solution be for the application controller to manage only the apps whose cluster name it is supposed to manage, instead of all apps that share the same cluster URI?

This would potentially make it possible to load-balance the same cluster across multiple controllers. Your solution makes that impossible without modifying the algorithm once again.
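As a rough sketch of the idea, assuming each controller knows the set of cluster names assigned to it: the `App` type and `appsForController` function below are hypothetical and do not reflect how the application controller is actually implemented.

```go
package main

import "fmt"

// App is a hypothetical, simplified application record.
type App struct {
	Name        string
	ClusterName string // name of the destination cluster secret
	Server      string // destination API server URL
}

// appsForController keeps only the apps whose destination cluster name
// is in the set assigned to this controller. Apps that merely share the
// same server URL under a different cluster name are skipped.
func appsForController(apps []App, assigned map[string]bool) []App {
	var owned []App
	for _, app := range apps {
		if assigned[app.ClusterName] {
			owned = append(owned, app)
		}
	}
	return owned
}

func main() {
	apps := []App{
		{Name: "team-a-app", ClusterName: "a", Server: "https://api.master-node.com"},
		{Name: "team-b-app", ClusterName: "b", Server: "https://api.master-node.com"},
	}
	// This controller is assumed to be responsible for cluster "a" only.
	for _, app := range appsForController(apps, map[string]bool{"a": true}) {
		fmt.Println(app.Name)
	}
}
```

Filtering by name would let two controllers split the apps of one physical cluster, at the cost of both controllers talking to the same API server.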

Ezzahhh commented 6 months ago

Yes, I believe that if the application controllers managed Applications by cluster name instead, it would also solve this problem.

I don't yet understand the inner workings of Argo well enough to know the implications or what needs to change to get this done, such as the performance implications of multiple controllers setting up Kubernetes watches against the same physical cluster, or how caching may be affected.

I'm more than happy to take feedback or pointers on solving this issue.