argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Argo does not automatically sync an application when using multi-repo configuration #19382

Open daviddvir14 opened 3 months ago

daviddvir14 commented 3 months ago

Checklist:

Describe the bug

Argo does not automatically sync an application after changes are made to a source repository (Git) when using a multi-repo configuration. However, manual sync works as expected.

To Reproduce

  1. Create a new application
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: karpenter
      namespace: argocd
    spec:
      destination:
        namespace: kube-system
        server: https://kubernetes.default.svc/
      project: default
      sources:
        - repoURL: "ghcr.io/a-repo"
          targetRevision: 0.35.5
          chart: karpenter
          helm:
            passCredentials: true
            valueFiles:
              - $values/infra/staging/kube-system/karpenter/values.yaml
        - repoURL: "https://github.com/a-repo/argo-config.git"
          targetRevision: HEAD
          ref: values
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
        syncOptions:
          - CreateNamespace=true
  2. Change a value in the ref repo

Expected behavior

Argo should automatically sync the application when there is a change in the source repository.

Screenshots

Version

argocd: v2.11.7+e4a0246
  BuildDate: 2024-07-24T14:02:54Z
  GitCommit: e4a0246c4d920bc1e5ee5f9048a99eca7e1d53cb
  GitTreeState: clean
  GoVersion: go1.22.5
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.11.3+3f344d5

Logs

application-controller:

time="2024-08-05T11:11:02Z" level=debug msg="https://github.com/a-repo/argo-config.git has credentials"
time="2024-08-05T11:11:02Z" level=debug msg="Generating Manifest for source {https://github.com/a-repo/argo-config.git  HEAD nil nil nil nil  values} revision HEAD"
time="2024-08-05T11:11:03Z" level=info msg="Skipping auto-sync: application status is Synced" application=argocd/karpenter

repo-service:

time="2024-08-05T11:11:02Z" level=debug msg="getting manifests cache" appName=karpenter appSrc="{\"appSrc\":{\"repoURL\":\"ghcr.io/a-repo\",\"targetRevision\":\"0.35.5\",\"helm\":{\"valueFiles\":[\"$values/infra/staging/kube-system/karpenter/values.yaml\"],\"passCredentials\":true},\"chart\":\"karpenter\"},\"srcRefs\":{\"$values\":
time="2024-08-05T11:11:02Z" level=info msg="manifest cache hit: &ApplicationSource{RepoURL:ghcr.io/a-repo,Path:,TargetRevision:0.35.5,Helm:&ApplicationSourceHelm{ValueFiles:[$values/infra/staging/kube-system/karpenter/values.yaml],Parameters:[]HelmParameter{},ReleaseName:,Values:,FileParameters:[]HelmFileParameter{},Version:,PassCredentials:true,IgnoreMissingValueFiles:false,SkipCrds:false,ValuesObject:nil,},Kustomize:nil,Directory:nil,Plugin:nil,Chart:karpenter,Ref:,}/0.35.5"
time="2024-08-05T11:11:02Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GenerateManifest grpc.service=repository.RepoServerService grpc.start_time="2024-08-05T11:11:02Z" grpc.time_ms=16.011 span.kind=server system=grpc
time="2024-08-05T11:11:02Z" level=debug msg="Skipping manifest generation for ref only source for application: karpenter and ref values"
time="2024-08-05T11:11:02Z" level=debug msg="symbolic reference 'HEAD' (refs/heads/master) resolved to '9cc8e001f39a9a0bbb8c1dac28fc7cfa9c33722a'"
time="2024-08-05T11:11:02Z" level=info msg="finished unary call with code OK" grpc.code=OK grpc.method=GenerateManifest grpc.service=repository.RepoServerService grpc.start_time="2024-08-05T11:11:02Z" grpc.time_ms=0.581 span.kind=server system=grpc

carlosrodfern commented 2 months ago

I'm having the same issue on argocd version v2.10.5

adberger commented 2 months ago

Same here on v2.12.0.

Update: Checking out the main branch locally, the auto sync works!

carlosrodfern commented 2 months ago

I have confirmed that this issue does not happen in v2.9.21.

I'm afraid this optimization, introduced in v2.10.0, actually caused it: https://github.com/argoproj/argo-cd/pull/16501

According to the PR:

When GenerateManifest is called with a ref only source manifest generation is still run, and since there are no manifests this is always considered a manifest cache miss causing an excess number of fetch requests to the git server

If I understand this correctly, the desired solution is to trigger manifest regeneration for the entire multi-source application when any of its sources changes, including the ones that contain only values.yaml files and depend on the helm chart repo for the complete manifest. To avoid the repeated cache misses while checking the ref repos, though, there would need to be new support for ref-only caching.

cc @nromriell

nromriell commented 2 months ago

I took a quick look at this and wasn't able to reproduce it; updates were getting picked up automatically for ref-only sources. In the PR you linked, the newClientResolveRevision call inside the short-circuit branch runs git ls-remote to resolve the new revision, and that call is still there on head.

What I do see is that the UI always shows the commit sha of only the first helm ref, which I believe is by design. The second sha, for the ref repo, can be seen if you click into the sync status or check the status of the Application. Either way, changes were being applied automatically and I could see them in the cluster.

Tested against a locally built head (de53d8eb61ba995adc0e0832e0ccf2dc99d04b38) and the active v2.12.1 image. I pushed the reproduction environment to this branch https://github.com/nromriell/argocd-test-git-local-env/tree/debug-19382

The caching for refs here follows the same timeout as other revisions, which is set by the ARGOCD_RECONCILIATION_TIMEOUT env var for the repo server or by timeout.reconciliation from the helm chart deploy values. The changes I made did help prevent a large number of those calls from being made within that configured timeout, so if you have that value set high and you're now not getting changes fast enough, you could try either reducing that configured time or setting up the git webhook to trigger updates on push events.
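
For readers who want to try the first suggestion, here is a minimal sketch of lowering the reconciliation timeout. It assumes a standard install in the argocd namespace where the setting is read from the argocd-cm ConfigMap; the 60s value is purely illustrative and not a recommendation from this thread. Depending on the version, the repo server and application controller may need a restart before the new value takes effect.

```yaml
# Sketch only: assumes Argo CD runs in the `argocd` namespace and reads
# its settings from the standard argocd-cm ConfigMap.
apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # How long resolved revisions (including ref-only sources) are reused
  # before being checked again. 60s is an illustrative value; shorter
  # intervals mean more requests to the git server.
  timeout.reconciliation: 60s
```

A shorter timeout trades extra load on the git server for faster pickup of changes; a push webhook avoids that trade-off by triggering a refresh only when something actually changes.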

carlosrodfern commented 2 months ago

Yup, it works. I just tested it again on v2.10.16 and v2.12.1. It just takes like 3-4 mins. Thank you for quickly clarifying and confirming this.