argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Repo Server high number of git requests when no changes are requested #12878

Open d-wierdsma opened 1 year ago

d-wierdsma commented 1 year ago

Describe the bug

After upgrading ArgoCD from 2.6.2 to 2.6.4, we hit an issue where the Repo Server was unable to resolve a git client, which forced many apps into an Unknown state. When I manually refreshed the apps, they resolved the git client without issues; however, they then refreshed automatically on their own and fell back into the Unknown state, even though they had just been Healthy. As the screenshots show, the Repo Server kept attempting git connections in large numbers, as seen in the ls-remote dashboard panel.

Our centralized ArgoCD cluster deploys to 5 clusters in total using an App of Apps generator method, and it currently manages ~120 applications.

Our current setup for applications is roughly as follows: per team and environment, we have an App of Apps that creates another set of App of Apps, one for each application that we would like to deploy. Each sub App of Apps then deploys the application (for example, prometheus) to the appropriate external clusters.

We are also using the new multi-source applications feature to combine values files stored in our private git repositories with a mix of private and public Helm charts.
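For readers unfamiliar with the feature, a multi-source Application of this kind might look roughly like the sketch below. The chart, version, release name, and values path are taken from the Repo Server logs in this report; the Application name, namespace, project, destination, and git URL are placeholders assumed for illustration, not the reporter's real values.

```yaml
# Sketch only: metadata, project, destination, and the git URL are assumed placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: metrics-server            # hypothetical name
  namespace: argocd               # assumed Argo CD namespace
spec:
  project: default                # assumed project
  destination:
    server: https://kubernetes.default.svc   # assumed target cluster
    namespace: kube-system                   # assumed target namespace
  sources:
    # Public Helm chart; its values file is resolved from the git source via $values.
    - repoURL: https://kubernetes-sigs.github.io/metrics-server/
      chart: metrics-server
      targetRevision: 3.8.3
      helm:
        releaseName: metrics-server
        valueFiles:
          - $values/apps/prod/helm-values/cnc/metrics-server-values.yaml
    # Private git repository, referenced only as the source of values files.
    - repoURL: git@git.example.internal:platform/gitops.git   # placeholder URL
      targetRevision: HEAD
      ref: values
```

The second source has no path or chart; it only exposes the repository under the name values. In the log excerpts in this thread, the "manifest cache miss" lines all appear on such Ref:values sources.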

To Reproduce

We have disabled automatic refresh on apps in favour of git webhooks, which refresh apps when there are changes to the repos.

Have ~100 applications on a HA ArgoCD setup, with the following relevant settings:

repoServer:
  resources:
    limits:
      cpu: '1'
      memory: 512Mi
    requests:
      cpu: 250m
      memory: 256Mi

  env:
  - name: ARGOCD_GIT_ATTEMPTS_COUNT
    value: "3"

configs:
  cm:
    timeout.reconciliation: 0s

  params:
    reposerver.parallelism.limit: 10

I assume that the repoServer hit a rate limit built into our internal GitLab instance and kept sending requests after the failures.

Expected behavior

I expect Repo Server to eventually fail on calls to the git service and not keep sending requests when there are no changes and the application is healthy.

Screenshots

https://user-images.githubusercontent.com/13317139/225333581-0e118090-e311-457a-987e-0a9861860129.png https://user-images.githubusercontent.com/13317139/225334095-3786b36b-87ab-45b4-a572-18593fa371ee.png

Version

Unable to determine the exact SHA as it took out our git version, but it was v2.6.4 as of approx. Tue Mar 14 12:03:49 2023 -0400

Logs


{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:https://chartmuseum....,Path:,TargetRevision:0.0.4,Helm:\u0026ApplicationSourceHelm{ValueFiles:[$values/apps/core/helm-values/cnc/kube-spot-termination-notice-handler-values.yaml],Parameters:[]HelmParameter{},ReleaseName:kube-spot-termination-notice-handler,Values:,FileParameters:[]HelmFileParameter{},Version:,PassCredentials:false,IgnoreMissingValueFiles:false,SkipCrds:false,},Kustomize:nil,Directory:nil,Plugin:nil,Chart:kube-spot-termination-notice-handler,Ref:,}/0.0.4","time":"2023-03-14T17:30:49Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:49Z","grpc.time_ms":264.745,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:49Z"}
{"level":"info","msg":"manifest cache miss: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:nil,Plugin:nil,Chart:,Ref:values,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:49Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:49Z","grpc.time_ms":75.812,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:49Z"}
{"level":"info","msg":"manifest cache miss: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:nil,Plugin:nil,Chart:,Ref:values,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":2.291,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/prod/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":1.021,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:https://kubernetes-sigs.github.io/metrics-server/,Path:,TargetRevision:3.8.3,Helm:\u0026ApplicationSourceHelm{ValueFiles:[$values/apps/prod/helm-values/cnc/metrics-server-values.yaml],Parameters:[]HelmParameter{},ReleaseName:metrics-server,Values:,FileParameters:[]HelmFileParameter{},Version:,PassCredentials:false,IgnoreMissingValueFiles:false,SkipCrds:false,},Kustomize:nil,Directory:nil,Plugin:nil,Chart:metrics-server,Ref:,}/3.8.3","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:49Z","grpc.time_ms":278.795,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/core/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/test/manifests/cnc/external-snapshotter,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":1.348,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache miss: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:nil,Plugin:nil,Chart:,Ref:values,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":2.362,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":1.619,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/dev/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":2.416,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/prod/manifests/cnc/external-snapshotter,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":12.053,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/test/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":205.333,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/dev/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":1.166,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/prod/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":136.562,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/core/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":111.975,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/test/manifests/cnc/external-snapshotter,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":85.566,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"error":"failed to get git client for repo git@git.internal.....git","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:49Z","grpc.time_ms":640.268,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/test/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":81.811,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/dev/manifests/cnc/external-snapshotter,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":417.581,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/core/manifests/cnc/external-snapshotter,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:50Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":1.642,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"error":"failed to get git client for repo git@git.internal.....git","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":712.672,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"error":"failed to get git client for repo git@git.internal.....git","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":769.005,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"error":"failed to get git client for repo git@git.internal.....git","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":851.914,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"error":"failed to get git client for repo git@git.internal.....git","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":803.429,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"error":"failed to get git client for repo git@git.internal.....git","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":693.523,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"error":"failed to get git client for repo git@git.internal.....git","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":866.32,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:50Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/core/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:51Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:51Z","grpc.time_ms":1.019,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:51Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/dev/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:51Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:51Z","grpc.time_ms":1.042,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:51Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/core/manifests/cnc/external-snapshotter,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:51Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:51Z","grpc.time_ms":4.56,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:51Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:git@git.internal.....git,Path:apps/prod/manifests/cnc/cluster-autoscaler,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:\u0026ApplicationSourceDirectory{Recurse:true,Jsonnet:ApplicationSourceJsonnet{ExtVars:[]JsonnetVar{},TLAs:[]JsonnetVar{},Libs:[],},Exclude:,Include:,},Plugin:nil,Chart:,Ref:,}/b2979fce4034fbc307149e628e05ce3bd5db892f","time":"2023-03-14T17:30:51Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:51Z","grpc.time_ms":4.268,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:51Z"}
{"error":"failed to get git client for repo git@git.internal.....git","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":827.495,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:51Z"}
{"error":"failed to get git client for repo git@git.internal.....git","grpc.code":"Unknown","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-03-14T17:30:50Z","grpc.time_ms":815.685,"level":"error","msg":"finished unary call with code Unknown","span.kind":"server","system":"grpc","time":"2023-03-14T17:30:51Z"}
d-wierdsma commented 1 year ago

I was just able to reproduce this issue when attempting an upgrade from v2.6.1 to v2.6.7. This time I did not see any "failed to get git client for repo" errors; however, I rolled back within 15 minutes, so I'm guessing it just didn't have time to reach the git rate limit.

Screen Shot 2023-04-06 at 11 35 01 AM

Screen Shot 2023-04-06 at 11 37 47 AM

We can see from these images that CPU spikes almost immediately, causing the HPA to scale up the number of repo-servers, which compounds the issue.

image

As for logs, we can also see a distinct spike in log volume at this time. I'm still investigating these logs to see if there are any apparent issues, but at first glance they look like the following:

{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:https://chartmuseum.xxx.com,Path:,TargetRevision:0.2.4,Helm:\u0026ApplicationSourceHelm{ValueFiles:[$values/apps/test/application-values.yaml $values/apps/test/test-values.yaml $values/clusters/test/cluster-values.yaml],Parameters:[]HelmParameter{},ReleaseName:,Values:,FileParameters:[]HelmFileParameter{},Version:,PassCredentials:false,IgnoreMissingValueFiles:false,SkipCrds:false,},Kustomize:nil,Directory:nil,Plugin:nil,Chart:gitops,Ref:,}/0.2.4","time":"2023-04-06T15:17:41Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-04-06T15:17:41Z","grpc.time_ms":588.414,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-04-06T15:17:41Z"}
{"level":"info","msg":"manifest cache miss: \u0026ApplicationSource{RepoURL:git@git.xxx.git,Path:,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:nil,Plugin:nil,Chart:,Ref:values,}/0a6daabbea0494097a41c0bcbacece4cb1908631","time":"2023-04-06T15:17:41Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-04-06T15:17:41Z","grpc.time_ms":3.614,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-04-06T15:17:41Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:https://chartmuseum.xxx.com,Path:,TargetRevision:0.2.4,Helm:\u0026ApplicationSourceHelm{ValueFiles:[$values/apps/shared-services/application-values.yaml $values/apps/shared-services/shared-services-values.yaml $values/clusters/shared-services/cluster-values.yaml],Parameters:[]HelmParameter{},ReleaseName:,Values:,FileParameters:[]HelmFileParameter{},Version:,PassCredentials:false,IgnoreMissingValueFiles:false,SkipCrds:false,},Kustomize:nil,Directory:nil,Plugin:nil,Chart:gitops,Ref:,}/0.2.4","time":"2023-04-06T15:17:41Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-04-06T15:17:41Z","grpc.time_ms":628.879,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-04-06T15:17:41Z"}
{"level":"info","msg":"manifest cache miss: \u0026ApplicationSource{RepoURL:git@git.xxx.git,Path:,TargetRevision:HEAD,Helm:nil,Kustomize:nil,Directory:nil,Plugin:nil,Chart:,Ref:values,}/61355835ddae350ebe8c19e3ed49a426c574a464","time":"2023-04-06T15:17:41Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-04-06T15:17:41Z","grpc.time_ms":3.632,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-04-06T15:17:41Z"}
{"level":"info","msg":"manifest cache hit: \u0026ApplicationSource{RepoURL:https://chartmuseum.xxx.com,Path:,TargetRevision:0.2.4,Helm:\u0026ApplicationSourceHelm{ValueFiles:[$values/apps/dev/application-values.yaml $values/apps/dev/dev-values.yaml $values/clusters/dev/cluster-values.yaml],Parameters:[]HelmParameter{},ReleaseName:,Values:,FileParameters:[]HelmFileParameter{},Version:,PassCredentials:false,IgnoreMissingValueFiles:false,SkipCrds:false,},Kustomize:nil,Directory:nil,Plugin:nil,Chart:gitops,Ref:,}/0.2.4","time":"2023-04-06T15:17:41Z"}
{"grpc.code":"OK","grpc.method":"GenerateManifest","grpc.service":"repository.RepoServerService","grpc.start_time":"2023-04-06T15:17:41Z","grpc.time_ms":635.541,"level":"info","msg":"finished unary call with code OK","span.kind":"server","system":"grpc","time":"2023-04-06T15:17:41Z"}
d-wierdsma commented 1 year ago

I've also just verified that our GitLab instance has no authenticated API request rate limits set. As we are using SSH creds, I assume this is how Repo Server is making requests.

d-wierdsma commented 1 year ago

The interesting part of this to me is that Repo Server seems to be pulling all repos on startup, even though we have disabled automatic sync and only intend to trigger syncs from webhooks. It's entirely possible that I'm misunderstanding the Repo Server startup process, though.

r0bj commented 1 year ago

I experienced a similar issue: argocd multi-source Applications stayed in an Unknown state until manually refreshed. I also noticed some GitHub rate limiting during this issue.

d-wierdsma commented 1 year ago

I turned on debug logs for the ApplicationSet and Application controllers and started to see that the Application Controller was creating a ton of reconciliation loops due to orphaned resources.

I had set the orphanedResources field on all my ArgoCD Projects, which made my applications attempt to claim ownership of all orphaned resources within the namespace that each application is deployed to.

spec:
  description: Argocd Project
  orphanedResources:
    warn: false

Here is the difference in reconciliation and git ls-remote calls.

image image

There are still some reconciliation loops in place that I need to investigate, but it's significantly better now.

Ref: https://github.com/argoproj/argo-cd/issues/8100#issuecomment-1076067184

andrleite commented 1 year ago

Hello, any updates on this? We tried to upgrade from version 2.6.2 to 2.6.11, and git requests and reconciliations started increasing immediately. After rolling back, they decreased significantly.

image image

andrleite commented 1 year ago

@crenshaw-dev Did you see something similar? I saw you asked us to create a separate issue. Could you please help us with it?

andrleite commented 1 year ago

I've noticed that the issue starts at version 2.6.3, with an endless reconciliation loop happening on applications that have recurse: true.
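For reference, recurse: true is a setting on a directory-type source. A minimal sketch of such a source is shown below; the path and revision mirror the Recurse:true entries in the logs earlier in this thread, while the git URL is a placeholder assumed for illustration.

```yaml
# Sketch only: fragment of an Application spec; the git URL is an assumed placeholder.
spec:
  source:
    repoURL: git@git.example.internal:platform/gitops.git   # placeholder URL
    path: apps/prod/manifests/cnc/cluster-autoscaler
    targetRevision: HEAD
    directory:
      # Include manifests from all subdirectories under the path.
      recurse: true
```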

rayleshh commented 1 year ago

Same Here!

crenshaw-dev commented 1 year ago

Is everyone here using ApplicationSets? I suspect the issue might be related to the ApplicationSet controller failing to normalize the App spec before applying it. The Application controller and the ApplicationSet controller end up fighting over the correct App manifest, resulting in constant reconciliation.

I've merged a fix: https://github.com/argoproj/argo-cd/pull/14481

andrleite commented 1 year ago

Yes, we're using ApplicationSets, as others mentioned in #14712. I've tried version 2.7.10 with no success.

stafot commented 1 year ago

Relevant comments to this. Adding them here for reference: https://github.com/argoproj/argo-cd/issues/14712#issuecomment-1662320951, https://github.com/argoproj/argo-cd/issues/14712#issuecomment-1662346485, https://github.com/argoproj/argo-cd/issues/14712#issuecomment-1663420058, https://github.com/argoproj/argo-cd/issues/14712#issuecomment-1663813660, https://github.com/argoproj/argo-cd/issues/14712#issuecomment-1663834340, https://github.com/argoproj/argo-cd/issues/14712#issuecomment-1663840412

crenshaw-dev commented 1 year ago

@stafot do we know for certain yet that the appset controller is involved at all in the high request count in your env? What happens if you scale down the controller for a few minutes?

I'm a little suspicious that multi-source apps might be to blame in your case: https://github.com/argoproj/argo-cd/issues/14725

andrleite commented 1 year ago

@crenshaw-dev We did the test: after upgrading ArgoCD and scaling the appset controller down to zero, the reconciliation activity kept increasing.

andrleite commented 1 year ago

@crenshaw-dev Do you think #14725 is related? The recent messages there seem very similar to our problem; we're using the app-of-apps pattern with multi-source apps.

spirosoik commented 8 months ago

@crenshaw-dev I am wondering if there will be any action on this. It has been happening for several months and has been mentioned by several people, yet there seems to be no real activity on this issue.

It's a pity that we cannot even upgrade to the latest versions of ArgoCD to catch up on security fixes and the latest improvements. Is there any other workaround?

andrleite commented 8 months ago

@nromriell We're following your amazing work on this issue, where two parts have already been merged. We believe our issue is related, and we want to share some results after upgrading to 2.9.5. We had been stuck on version 2.6.2 since the bug was introduced, so we upgraded from that version. Before your changes, reconciliation and git requests increased non-stop; now they are high but stable, as the graphs below show:

image image

We have ~150 multi-source apps.

Do you believe this is expected until the third and fourth parts of the issue are merged?

Thanks.

nromriell commented 8 months ago

Hi @andrleite, as of the last state I would expect the checkouts at least to be lower.

My changes are primarily around fixing the number of git requests between cache invalidations. Looking at the graphs you shared here, it looks like your cache is nearly constantly invalidated, which looks like the primary issue and is likely why you aren't seeing the behavior you'd expect. Have you tried setting timeout.reconciliation to something very high, like 24 hours, rather than 0, to compare?
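Expressed in the same Helm values layout as the settings in the original report, that comparison might look like the sketch below. The 24h value is an assumed rendering of "24 hours" (86400s is the same duration in seconds), and whether it suits a given installation is untested here.

```yaml
configs:
  cm:
    # The original report used 0s; a long interval is suggested here for comparison.
    timeout.reconciliation: 24h
```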

My time has been pretty limited lately, so I haven't been able to continue making improvements here, but I should at least be able to look at opening the remaining two PRs shortly. As is, though, I think those probably won't fix what you're seeing, since they rely on the items being cached; they would just reduce the call count per cycle.