argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

ApplicationSet controller infinite reconcile loop and wasting CPU resources #19675

Open joliveirinha opened 3 weeks ago

joliveirinha commented 3 weeks ago

Describe the bug

The applicationset controller keeps looping through its reconcile logic, even though nothing has actually changed on the applications themselves or on the appset. Checking the appset resources, the only thing that changes is the order of the resource statuses in the status field of the appset itself. This normally happens only to a couple of resources.

I am new to argocd, but looking at the appset code, it seems that every time the reconciler is called by controller-runtime, it updates the Status field, even if nothing changed. Since the status is built from a map, the order of its entries may differ from one run to the next, and writing it back produces a new resource update. That update then triggers a new reconcile call by controller-runtime, because the ApplicationSet resource was changed. The result is an infinite loop.
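To illustrate the mechanism described above, here is a minimal stdlib-only sketch, not the actual Argo CD code. `ResourceStatus`, `statusesFromMap` and `needsStatusUpdate` are hypothetical names invented for this example. It assumes the status list is derived from a Go map, sorts the entries by name so the result no longer depends on map iteration order, and skips the write when nothing differs.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
)

// ResourceStatus is a hypothetical, simplified stand-in for one entry in the
// appset's status resources list.
type ResourceStatus struct {
	Name   string
	Health string
}

// statusesFromMap flattens the map into a slice sorted by name, so the output
// is the same on every call regardless of Go's map iteration order.
func statusesFromMap(m map[string]ResourceStatus) []ResourceStatus {
	out := make([]ResourceStatus, 0, len(m))
	for _, s := range m {
		out = append(out, s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

// needsStatusUpdate reports whether the desired status differs from the one
// already stored, so an unchanged status is never written back.
func needsStatusUpdate(stored, desired []ResourceStatus) bool {
	return !reflect.DeepEqual(stored, desired)
}

func main() {
	apps := map[string]ResourceStatus{
		"backups":   {Name: "backups", Health: "Healthy"},
		"bootstrap": {Name: "bootstrap", Health: "Healthy"},
		"cilium":    {Name: "cilium", Health: "Healthy"},
	}
	stored := statusesFromMap(apps)

	// Simulate many reconciles with no real change and count status writes.
	writes := 0
	for i := 0; i < 100; i++ {
		if needsStatusUpdate(stored, statusesFromMap(apps)) {
			writes++
		}
	}
	fmt.Println("status writes:", writes)
}
```

With the sort in place, the simulated reconciles above produce no status writes. Without it, two calls over the same map could return the entries in a different order and look like a change.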

I guess the reason this is not more visible is that it probably only occurs with appsets that generate a lot of applications, which increases the probability of the resource statuses being reordered.

This bug results in the appset-controller constantly using ~10% of a single CPU.

Since I am new to argocd and just testing this on my homelab, I am not sure what the ideal fix would be, but I tried adding a filter to the appset controller's setup in controller-runtime that ignores changes to the appset that don't actually update any field of the spec. To me this makes sense, but the maintainers should know better. Doing so fixed the issue for me.
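The filtering idea above can be sketched roughly as follows. This is a stdlib-only model, not the patch I applied and not the real Argo CD types: `AppSetSpec`, `AppSet` and `shouldReconcileOnUpdate` are hypothetical names. In a real controller this kind of check would live in an update predicate. controller-runtime ships `predicate.GenerationChangedPredicate` for a similar purpose, but whether that is the right choice here is an assumption I have not verified.

```go
package main

import (
	"fmt"
	"reflect"
)

// AppSetSpec and AppSet are hypothetical, heavily simplified stand-ins for
// the ApplicationSet resource.
type AppSetSpec struct {
	Generators []string
	Template   string
}

type AppSet struct {
	Spec   AppSetSpec
	Status []string
}

// shouldReconcileOnUpdate returns true only when the spec differs between the
// old and new object, so status-only updates do not enqueue a new reconcile.
func shouldReconcileOnUpdate(oldObj, newObj AppSet) bool {
	return !reflect.DeepEqual(oldObj.Spec, newObj.Spec)
}

func main() {
	before := AppSet{
		Spec:   AppSetSpec{Generators: []string{"git"}, Template: "app"},
		Status: []string{"backups", "bootstrap"},
	}

	// Same spec, status entries reordered: should be ignored.
	statusOnly := before
	statusOnly.Status = []string{"bootstrap", "backups"}
	fmt.Println("status-only update triggers reconcile:", shouldReconcileOnUpdate(before, statusOnly))

	// Spec changed: should be reconciled.
	specChange := before
	specChange.Spec.Template = "app-v2"
	fmt.Println("spec update triggers reconcile:", shouldReconcileOnUpdate(before, specChange))
}
```

A real filter would probably also have to let some metadata changes through (labels or annotations, for example), which this sketch deliberately leaves out.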

To Reproduce

Just create a normal application set that generates multiple applications. In my case, I have two appsets: one generates 16 applications and the other 7.

What I noticed is that the one that generates only 7 started showing this behaviour only after I went from 3 applications to 7. This makes sense to me, because the probability of the order of the status resources map changing increases as the appset generates more applications.

Expected behavior

The ApplicationSet controller should reconcile only on relevant changes that actually require it.

Version

❯ argocd version
argocd: v2.12.0+ec30a48
  BuildDate: 2024-08-05T15:31:14Z
  GitCommit: ec30a48bce7a60046836e481cd2160e28c59231d
  GitTreeState: clean
  GoVersion: go1.22.5
  Compiler: gc
  Platform: darwin/arm64
argocd-server: v2.12.2+560953c
  BuildDate: 2024-08-23T03:30:19Z
  GitCommit: 560953c37b343c956f3a18f3db7d006e694c0dc4
  GitTreeState: clean
  GoVersion: go1.22.4
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v5.4.2 2024-05-22T15:19:38Z
  Helm Version: v3.15.2+g1a500d5
  Kubectl Version: v0.29.6
  Jsonnet Version: v0.20.0

Logs

core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:26Z" level=info msg="end reconcile" applicationset=argocd/core requeueAfter=3m0s
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:26Z" level=info msg="applications result from the repo service" allPaths="[backups backups/grafana_dash bootstrap bootstrap/cilium b
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:26Z" level=info msg="generated 9 applications" applicationset=argocd/core generator="{nil nil &GitGenerator{RepoURL:https://github.c
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:27Z" level=info msg="end reconcile" applicationset=argocd/core requeueAfter=3m0s
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:27Z" level=info msg="applications result from the repo service" allPaths="[backups backups/grafana_dash bootstrap bootstrap/cilium b
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:27Z" level=info msg="generated 9 applications" applicationset=argocd/core generator="{nil nil &GitGenerator{RepoURL:https://github.c
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:28Z" level=info msg="end reconcile" applicationset=argocd/apps requeueAfter=3m0s
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:28Z" level=info msg="applications result from the repo service" allPaths="[backups backups/grafana_dash bootstrap bootstrap/cilium b
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:28Z" level=info msg="generated 16 applications" applicationset=argocd/apps generator="{nil nil &GitGenerator{RepoURL:https://github.
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:28Z" level=info msg="end reconcile" applicationset=argocd/core requeueAfter=3m0s
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:28Z" level=info msg="applications result from the repo service" allPaths="[backups backups/grafana_dash bootstrap bootstrap/cilium b
core-argocd-applicationset-controller-5b7cb6957b-x7zbz time="2024-08-24T15:35:28Z" level=info msg="generated 9 applications" applicationset=argocd/core generator="{nil nil &GitGenerator{RepoURL:https://github.c

I truncated the lines, but it is clear that a reconcile is triggered every ~1 second, which is the time each reconcile takes.

YXL76 commented 3 weeks ago

It happens on 2.12.x, but resolved after downgrading to 2.11.7.

jblsk commented 2 weeks ago

we have the reconcile requeueAfter setting set, but the applicationset-controller spams our logs every second anyway


{"applicationset":{"Namespace":"argocd","Name":"bcde-apps"},"level":"info","msg":"end reconcile","requeueAfter":180000000000,"time":"2024-08-27T09:50:07Z"}

dntosas commented 2 weeks ago

happens also on 2.12.3 ^

pre commented 3 days ago

We upgraded to v2.12.3 recently from v2.10.x and we see the same issue.

In v2.12.3 argocd-applicationset-controller constantly

I checked the logs from v2.10.x and it seems this problem did not occur with the older version.