argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Current Sync Status "Unknown", App Condition "Error" - unable to terminate or update to new Git hashref #15489

Open HariSekhon opened 1 year ago

HariSekhon commented 1 year ago

Describe the bug

ArgoCD got stuck trying to sync an app, gives no option to terminate the sync, and doesn't pick up the newer git commit that should fix the issue.

This was caused by a kustomize build error in the app's yaml manifest directory, but fixing it in git hasn't helped as Argo is stuck.

To Reproduce

Create any kustomize overlay with a duplicate resource that causes `kustomize build --enable-helm` to fail, such as defining the namespace in the overlay when the base already defines it.

rpc error: code = Unknown desc = Manifest generation error (cached): `kustomize build .external-secrets/us-dev --enable-helm` failed exit status 1: Error: accumulating resources: accumulation err='accumulating resources from '../base-us': '.external-secrets/base-us' must resolve to a file': recursed merging from path '.external-secrets/base-us': may not add resource with an already registered id: Namespace.v1.[noGrp]/external-secrets.[noNs]
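
A minimal layout that reproduces this class of error might look like the following (file names and contents are illustrative, not the reporter's actual files):

```yaml
# base/kustomization.yaml
resources:
  - namespace.yaml   # defines Namespace "external-secrets"

# overlay/kustomization.yaml
# resources:
#   - ../base
#   - namespace.yaml # defines the same Namespace again ->
#                    # "may not add resource with an already registered id"
```

Running `kustomize build overlay --enable-helm` against such a layout fails with the duplicate-id error, which Argo CD then caches as a manifest generation error.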

The UI gives no option to terminate Sync, so I've tried the CLI without success too:

$ argocd app terminate-op external-secrets-us-dev
FATA[0000] rpc error: code = InvalidArgument desc = Unable to terminate operation. No operation is in progress 
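
A workaround that has helped with stuck operations in similar reports is to edit the Application resource directly and drop the pending operation, then force a hard refresh so the cached manifest-generation error is discarded. This is a sketch based on the Application CRD layout (resource name and namespace are from this issue; adjust to your install):

```shell
# Remove the stuck operation field from the Application CR directly
kubectl -n argocd patch application external-secrets-us-dev \
  --type json -p '[{"op": "remove", "path": "/operation"}]'

# Force a hard refresh to bypass the cached "Manifest generation error (cached)"
argocd app get external-secrets-us-dev --hard-refresh
```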

I've killed all the argocd pods but they respawn back into the same state.

Btw this is running 3 application controllers in HA mode with env var ARGOCD_CONTROLLER_REPLICAS=3 too. I suspect this is why the state persists even when I killed all 3 pods at the same time.

Deleting the app with a non-cascading delete also didn't work: it came back with the same stuck sync state (I'm using the app-of-apps pattern, which recreated the app).

Deleting the app with a foreground delete to wipe out its contents didn't work either; it just got stuck deleting.

Expected behavior

Expected it to fail or at least time out eventually, then continue to pick up new Git commits and succeed with those newer versions, not get completely stuck.

Screenshots

Screenshots attached: 2023-09-13 18:14, 2023-09-13 18:20, 2023-09-14 15:53.

Version

argocd version
argocd: v2.7.1+5e54351
  BuildDate: 2023-05-02T16:54:25Z
  GitCommit: 5e543518dbdb5384fa61c938ce3e045b4c5be325
  GitTreeState: clean
  GoVersion: go1.19.8
  Compiler: gc
  Platform: darwin/amd64
argocd-server: v2.6.7+5bcd846
  BuildDate: 2023-03-23T14:57:27Z
  GitCommit: 5bcd846fa16e4b19d8f477de7da50ec0aef320e5
  GitTreeState: clean
  GoVersion: go1.18.10
  Compiler: gc
  Platform: linux/amd64
  Kustomize Version: v4.5.7 2022-08-02T16:35:54Z
  Helm Version: v3.10.3+g835b733
  Kubectl Version: v0.24.2
  Jsonnet Version: v0.19.1
HariSekhon commented 1 year ago

I believe ArgoCD needs a time limit on sync attempts, after which the thread and state should be killed and reset, to avoid being stuck indefinitely in this state.
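
As a partial mitigation available today, a retry limit with backoff can be set per Application in its sync policy (a sketch; field names follow the Application spec, values are illustrative):

```yaml
spec:
  syncPolicy:
    retry:
      limit: 5          # give up after 5 failed sync attempts
      backoff:
        duration: 30s   # initial delay between retries
        factor: 2       # exponential backoff multiplier
        maxDuration: 5m # cap on the delay
```

This bounds how long a failing sync is retried, though it would not by itself clear the cached manifest generation error described above.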

valkiriaaquatica commented 1 month ago

> I believe ArgoCD needs some time limit on attempting to Sync after which the thread and state should be killed and reset to avoid being stuck indefinitely in this state.

Same issue here, using argocd v2.13.0+aa990d6 (go1.22.6) on Kubernetes.