argoproj / argo-cd

Declarative Continuous Deployment for Kubernetes
https://argo-cd.readthedocs.io
Apache License 2.0

Auto Sync terminate #16489

Open alexmt opened 11 months ago

alexmt commented 11 months ago

Summary

The ability to override a "stuck" auto-sync operation. A sync is known to get stuck for various reasons: a sync job cannot complete due to "image pull backoff"; a deployment cannot reach a healthy state due to a failing readiness probe, etc. Ideally, it should be enough to fix the root cause and let Argo CD deploy the new changes. However, Argo CD currently does not give up on the first sync when new changes are detected.

Motivation

Preview environments. An Argo CD application generated by an ApplicationSet for a pull request might fail because the code in the PR has an issue. An engineer should be able to just fix the bug in the code, push the change to the PR, and see the updated, synced application.

Proposal

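Introduce a syncPolicy.terminate setting that allows configuring automatic operation termination based on the state of configured "problematic" resources.

The example below cancels an automatic sync when an `upgrade-database-*` Job stays in the Progressing state for longer than 1 minute, or when the sync as a whole exceeds the 10 minute global timeout: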

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: guestbook
spec:
  project: default
  source:
    repoURL: https://github.com/argoproj/argocd-example-apps.git
    path: guestbook

  syncPolicy:
    terminate:
      timeout: 10m # global timeout (see https://github.com/argoproj/argo-cd/issues/6055)
      resource:
        - group: batch
          kind: Job
          name: upgrade-database-*
          status: Progressing
          timeout: 1m # sync should be terminated if the `upgrade-database` sync hook is stuck for longer than 1 minute
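
To make the intended semantics of the proposed setting concrete, here is a minimal Python sketch of how such rules might be evaluated. It is not Argo CD code: `ResourceRule`, `ResourceState`, and `should_terminate` are hypothetical names, and treating `name` as a shell-style glob is an assumption based on the `upgrade-database-*` pattern above.

```python
from dataclasses import dataclass
from datetime import timedelta
from fnmatch import fnmatchcase
from typing import List, Optional


@dataclass
class ResourceRule:
    """One entry of the proposed syncPolicy.terminate.resource list (hypothetical model)."""
    group: str
    kind: str
    name: str            # assumed to be a shell-style glob, e.g. "upgrade-database-*"
    status: str          # health/sync status that counts as "problematic"
    timeout: timedelta   # how long the resource may stay in that status


@dataclass
class ResourceState:
    """Observed state of a live resource during a sync (hypothetical model)."""
    group: str
    kind: str
    name: str
    status: str
    time_in_status: timedelta


def should_terminate(
    global_timeout: timedelta,
    rules: List[ResourceRule],
    sync_elapsed: timedelta,
    resources: List[ResourceState],
) -> Optional[str]:
    """Return a reason string if the sync should be terminated, else None."""
    # The global timeout applies regardless of individual resource state.
    if sync_elapsed > global_timeout:
        return "global timeout exceeded"
    # Otherwise, look for any resource that matches a rule and has
    # stayed in the problematic status longer than the rule allows.
    for res in resources:
        for rule in rules:
            if (
                res.group == rule.group
                and res.kind == rule.kind
                and fnmatchcase(res.name, rule.name)
                and res.status == rule.status
                and res.time_in_status > rule.timeout
            ):
                return f"{res.kind}/{res.name} stuck in {res.status}"
    return None


rule = ResourceRule("batch", "Job", "upgrade-database-*", "Progressing", timedelta(minutes=1))
stuck = ResourceState("batch", "Job", "upgrade-database-x1", "Progressing", timedelta(minutes=2))
reason = should_terminate(timedelta(minutes=10), [rule], timedelta(minutes=3), [stuck])
```

With the values above, `reason` is a non-empty string because the Job has been Progressing for longer than its 1 minute limit; a Job that has been Progressing for only a few seconds would yield `None`.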

jannfis commented 11 months ago

I think the problem is rooted deeper. Retry in auto sync never picks up new changes from the source, and keeps iterating over the same target revision until the retry policy is exhausted.

A progressing timeout alone will not cater for the above use case. While termination of long-progressing resources is also required (and a great idea btw), I think that Argo CD should check the source (and the parameters in the Application) for changes and potentially restart the sync upon such changes.
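
The behaviour suggested here could be sketched roughly as follows. This is an illustration only, not how Argo CD is implemented: `next_retry_step`, `RetryDecision`, and the `resolve_latest_revision` callback are hypothetical names, and the sketch assumes the latest target revision can be resolved cheaply before each retry.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RetryDecision:
    action: str    # "restart", "retry", or "give_up"
    revision: str  # revision the next attempt (if any) should target


def next_retry_step(
    syncing_revision: str,
    resolve_latest_revision: Callable[[], str],
    attempts: int,
    retry_limit: int,
) -> RetryDecision:
    """Decide what to do after a failed auto-sync attempt."""
    latest = resolve_latest_revision()
    # A new revision in the source supersedes the failing one:
    # abandon the current operation and start over against it.
    if latest != syncing_revision:
        return RetryDecision("restart", latest)
    # Nothing changed, so fall back to the ordinary retry policy.
    if attempts < retry_limit:
        return RetryDecision("retry", syncing_revision)
    return RetryDecision("give_up", syncing_revision)


# The PR author pushed a fix (revision "def456") while "abc123" was failing.
decision = next_retry_step("abc123", lambda: "def456", attempts=2, retry_limit=5)
```

In this example the decision is to restart against `def456` rather than keep retrying `abc123` until the retry policy is exhausted.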

jannfis commented 11 months ago

Refer: https://github.com/argoproj/argo-cd/issues/12904

blakepettersson commented 10 months ago

Potentially (partially) addressed with #15603?

phyzical commented 4 weeks ago

@blakepettersson I don't think so, as that is about retries, whereas this addresses the issue where you find syncs stuck for long periods, like 20+ hours, or rather where they still have not retried even once.

phyzical commented 4 weeks ago

@jannfis i guess the motivation based on the issue trail that led me here is for example,

sfynx commented 1 week ago

Yeah, this is something I'd need as well. We wish to control the lifecycle of ApplicationSet-managed apps purely through Git, but right now we need a CI process in between that runs a terminate op on any app that is currently syncing, to prevent the eternal sync issue when things get stuck.