argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

ArgoCD does not recover resources in deleting state during rollback #17484

Open tete17 opened 8 months ago

tete17 commented 8 months ago

Checklist:

Describe the bug

Let's say you are deploying a change that removes an old Deployment of workers that you don't need anymore. These pods have a long terminationGracePeriodSeconds, and you are using ArgoCD's default deletion policy, Foreground. While you deploy your changes, the worker Deployment switches to a deleting state, but then you realize there is a bug in the new version and decide to roll back.

At this point, while the sync is still running, you press the rollback button, but ArgoCD only performs a basic kubectl apply. Unfortunately this is not enough to cancel the deletion of the resource, and k8s still ends up killing your Deployment, leaving you with no workers, an incident, and a postmortem to write :stuck_out_tongue_closed_eyes:

To Reproduce

Expected behavior

Ideally I would expect there to be some k8s-native way to cancel the deletion of a resource that is already being deleted. I am not aware of such a mechanism (although I haven't looked much into it).

If no such mechanism exists, maybe ArgoCD could implement some custom logic to check for resources that are being deleted, but that may be too cumbersome.

Maybe we should switch the default deletion policy to the k8s default one, Background. In that case this would never happen, because the Deployment would be deleted immediately and could be recreated if needed.
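The difference between the two policies can be sketched as a tiny Go model. Argo CD exposes this choice through the `PrunePropagationPolicy` sync option (foreground, background, orphan); the `propagationPolicy` type and the helper functions below are hypothetical simplifications that only capture whether the owner object lingers while its pods terminate.

```go
package main

import "fmt"

// propagationPolicy mirrors the names of the Kubernetes deletion propagation
// policies; the type itself is a hypothetical stand-in, not the API type.
type propagationPolicy string

const (
	foreground propagationPolicy = "Foreground"
	background propagationPolicy = "Background"
)

// ownerLingers reports whether the owning object (e.g. the Deployment) stays
// visible in the API, marked for deletion, until its dependents are gone.
// With Background the owner is removed right away and the garbage collector
// cleans up the dependents afterwards.
func ownerLingers(p propagationPolicy) bool {
	return p == foreground
}

// rollbackRecreates models the outcome of re-applying the owner's manifest
// while its old pods are still terminating.
func rollbackRecreates(p propagationPolicy) bool {
	// If the owner still exists with a deletion timestamp, apply only patches
	// an object that will be removed anyway; if it is already gone, apply
	// creates a fresh object.
	return !ownerLingers(p)
}

func main() {
	for _, p := range []propagationPolicy{foreground, background} {
		fmt.Printf("%-10s rollback recreates Deployment: %v\n", p, rollbackRecreates(p))
	}
}
```

The trade-off is that with Background the old pods may still be terminating while the recreated Deployment starts new ones, which is usually what you want during a rollback.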

Version

argocd: v2.10.2+fcf5d8c
  BuildDate: 2024-03-01T21:24:51Z
  GitCommit: fcf5d8c2381b68ab1621b90be63913b12cca2eb7
  GitTreeState: clean
  GoVersion: go1.21.3
  Compiler: gc
  Platform: linux/amd64
andrii-korotkov-verkada commented 4 days ago

ArgoCD versions 2.10 and below have reached EOL. Can you upgrade and let us know if the issue is still present, please?