Open ktomaszx opened 3 months ago
Hi, I'm facing the same issue on Argo CD v2.12. In my case, Argo CD is deployed on EKS v1.29 using the app-of-apps pattern: the parent apps deploy to Argo CD's own cluster, and the final apps' resources (Deployments, ConfigMaps, Secrets, StatefulSets, ...) deploy to a remote EKS v1.30 cluster. While these "failed watch" logs are appearing, some apps time out syncing to the remote cluster.
Notes:
- Sharding is enabled on the 2-replica controller StatefulSet: Argo CD's own cluster is shard 0 and the remote cluster is shard 1.
- Argo CD on the v1.29 cluster reaches the remote v1.30 cluster via PrivateLink.
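For reference, the shard assignment described above can be pinned via the `shard` field of the Argo CD cluster secret. This is a sketch: the secret name `cluster-remote` and the `argocd` namespace are assumptions, not taken from this report.

```shell
# Pin the remote cluster to shard 1 by setting the "shard" field
# in its Argo CD cluster secret.
# Secret name "cluster-remote" and namespace "argocd" are assumptions.
kubectl -n argocd patch secret cluster-remote \
  --type merge \
  -p '{"stringData": {"shard": "1"}}'
```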
Env
Two EKS clusters in separate VPCs: eks1 runs Argo CD; eks2 runs vClusters with apps deployed inside them.
Describe the bug
I'm facing an issue where the argocd controller logs hundreds of `Start watch {...}` / `Failed to watch {...}` messages. As a result, NAT gateway active connections between the VPCs are enormous, ~10-15x higher than normal. The issue appears each time a vCluster is paused. As soon as the argocd controller is restarted, the issue is gone: no more `Start watch ... Failed to watch ...` logs, and NAT active connections return to their normal level.
To Reproduce
Make the target cluster unavailable (in my case, by pausing the vCluster) and observe the controller logs. Then restart the controller pod.
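As a stopgap, the restart step above can be automated instead of deleting the pod by hand. This assumes a default Argo CD install (namespace `argocd`, controller StatefulSet named `argocd-application-controller`); adjust the names for your setup.

```shell
# Restart the application controller to clear the stale watch loops.
# Namespace and StatefulSet name assume a default Argo CD install.
kubectl -n argocd rollout restart statefulset argocd-application-controller

# Wait for the restarted replicas to become ready before checking logs again.
kubectl -n argocd rollout status statefulset argocd-application-controller
```

This is only a workaround; the expected behavior below is that no restart should be needed.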
Expected behavior
The argocd controller shouldn't need to be restarted after a target cluster becomes temporarily unavailable.
Version v2.11.2
Logs