argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

application-controller hung mid-processing #18334

Open acelinkio opened 5 months ago

acelinkio commented 5 months ago


Describe the bug

argocd-application-controller stopped processing workloads entirely without crashing. The controller appeared to be hung on one of the ArgoCD Applications when it stopped functioning. That Application's custom resource had an Operations field that had been added but was never processed.

During this time the application-controller produced no new log messages and did not update any applications; 30 minutes elapsed with no new information. Restarting the StatefulSet appeared to resolve the issue; however, I am concerned about it recurring.

To Reproduce

I was unable to reproduce this on demand.

The application that appeared to cause the hang manages >100 Kubernetes objects:

- 25 x Namespace
- 25 x Secret
- 25 x ApplicationSet (each of these ApplicationSets spawns 2-5 child Applications)
- 5 x Application (standalone)

Expected behavior

argocd-application-controller should not freeze. In the event of a freeze, I expect:

Screenshots

n/a

Version

argocd@argocd-application-controller-0:~$ argocd version
argocd: v2.10.5+335875d
  BuildDate: 2024-03-28T15:02:45Z
  GitCommit: 335875d13e018bed6e03873f4742582582964745
  GitTreeState: clean
  GoVersion: go1.21.3
  Compiler: gc
  Platform: linux/amd64

Deployed using the argo-cd Helm chart version 6.7.8, with its subchart used to create a 3-node Redis cluster.

Logs

I did not see any relevant logs. No new logs were produced while the application was hung; all processing inside the container appeared to stop.

Additional Comments

One concern that comes to mind is that the Kubernetes object the application-controller is trying to manage may be too large and is being rejected by the Kubernetes API; however, I did not see anything in the Kubernetes logs or in the application-controller logs to indicate that this is the case.
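One way to test the size concern is to measure the serialized size of the Application resource against the request-size limit of the cluster's etcd. The sketch below is a minimal illustration under stated assumptions: it assumes the manifest was exported as JSON (for example with `kubectl get application <name> -n argocd -o json`), and it assumes etcd's default `--max-request-bytes` of 1.5 MiB (1572864 bytes); your cluster may be configured differently. The function names `manifest_size` and `size_report` are hypothetical helpers, not part of Argo CD or kubectl.

```python
import json

# Assumption: etcd's default --max-request-bytes (1.5 MiB).
# Clusters can configure a different value; check your etcd flags.
DEFAULT_LIMIT_BYTES = 1_572_864


def manifest_size(manifest):
    """Return the size in bytes of the compact JSON encoding of a manifest.

    Custom resources such as Application are stored as JSON, so the
    compact encoding is a rough estimate of what the API server writes.
    """
    return len(json.dumps(manifest, separators=(",", ":")).encode("utf-8"))


def size_report(manifest, limit=DEFAULT_LIMIT_BYTES):
    """Return (size_in_bytes, fraction_of_limit) for a manifest."""
    size = manifest_size(manifest)
    return size, size / limit
```

To use it, load the exported file with `json.load` and pass the resulting dict to `size_report`; a fraction approaching 1.0 would suggest the object is near the assumed limit. This only estimates the stored size and does not account for other API server request limits.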

phclark commented 3 weeks ago

We're experiencing this issue as well. Restarting the StatefulSet resolves the immediate issue, but we'd love to understand why the controller gets stuck and how to ensure that it either recovers gracefully or is restarted automatically.
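For the automatic-restart case, one generic pattern is a progress watchdog: the processing loop records a heartbeat each time it finishes a unit of work, and a health check reports unhealthy once no heartbeat has arrived within a timeout, so that a liveness probe can restart the pod. This is a hypothetical sketch of that pattern, not Argo CD's implementation; the `ProgressWatchdog` class, the timeout value, and the injectable clock are assumptions made for illustration.

```python
import threading
import time
from typing import Callable


class ProgressWatchdog:
    """Hypothetical watchdog: healthy only while work keeps completing."""

    def __init__(self, timeout_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self._timeout_s = timeout_s
        self._clock = clock  # injectable so the logic can be tested
        self._lock = threading.Lock()
        self._last_beat = clock()

    def beat(self) -> None:
        """Record progress; call after each unit of work completes."""
        with self._lock:
            self._last_beat = self._clock()

    def seconds_since_beat(self) -> float:
        """Seconds elapsed since the most recent heartbeat."""
        with self._lock:
            return self._clock() - self._last_beat

    def healthy(self) -> bool:
        """False once no progress has been recorded for longer than the timeout."""
        return self.seconds_since_beat() <= self._timeout_s
```

One design caveat: an idle work queue also stops producing heartbeats, so a real check would need to distinguish "nothing to do" from "stuck", for example by only counting elapsed time while items are pending.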

andrii-korotkov-verkada commented 3 days ago

ArgoCD versions 2.10 and below have reached EOL. Can you upgrade and tell us if the issue is still present, please?