argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Circuit Breakers for CronWorkflows #7291

Open simster7 opened 2 years ago

simster7 commented 2 years ago

Summary

What change needs making?

Implement a circuit breaker for cron workflows. For example, automatically pause a cron workflow after it has failed several times in a row, or after a certain number of executions (https://github.com/argoproj/argo-workflows/issues/7201).

Either an expression:

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: hello-world
spec:
  schedule: "* * * * *"
  backOff: "{{cronworkflow.failures}} > 3"

Or a field:

apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
  name: hello-world
spec:
  schedule: "* * * * *"
  backOff:
    failures: 2
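
To make the intended semantics of the field form concrete, here is a minimal Python sketch of the pause decision. It is not Argo code: the `CircuitBreaker` class and its attribute names are hypothetical, and it assumes that "failures" means consecutive failures and that the breaker trips once the count reaches the configured threshold. The proposal itself leaves both points open.

```python
from dataclasses import dataclass


@dataclass
class CircuitBreaker:
    """Hypothetical model of the proposed `backOff.failures` field (not Argo code)."""

    max_failures: int               # threshold, playing the role of `backOff.failures`
    consecutive_failures: int = 0   # current streak of failed runs
    paused: bool = False            # True once the breaker has tripped

    def record(self, succeeded: bool) -> None:
        """Record the outcome of one workflow spawned by the cron workflow."""
        if self.paused:
            return  # a paused schedule spawns no runs, so there is nothing to count
        if succeeded:
            self.consecutive_failures = 0  # assumption: a success resets the streak
        else:
            self.consecutive_failures += 1
            if self.consecutive_failures >= self.max_failures:
                self.paused = True  # trip the breaker: stop scheduling new runs


# Usage: with a threshold of 2, an isolated failure does not pause the
# schedule, but two failures in a row do.
breaker = CircuitBreaker(max_failures=2)
for outcome in [False, True, False, False, True]:
    breaker.record(outcome)
```

In this sketch the breaker stays tripped until someone resets it manually, which matches the "pause" wording of the proposal rather than a permanent stop.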

Use Cases

When would you use this?

Cron workflows are the only CRD that acts on its own (i.e. no explicit action is needed to spawn workflows from a cron workflow), so a circuit breaker could be useful.


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

terrytangyuan commented 2 years ago

Duplicate of https://github.com/argoproj/argo-workflows/issues/5659?

simster7 commented 2 years ago

More like an extension

simster7 commented 2 years ago

@alexec Any thoughts on the spec proposal or this feature overall? Want to make sure we are all aligned before implementation begins.

alexec commented 2 years ago

I think the concept is fine. I'm not sure many people are asking for it, so I don't see the core team working on it soon.

tooptoop4 commented 1 month ago

@eduardodbr since https://github.com/argoproj/argo-workflows/pull/12305 was merged, do you think this issue can be closed?

eduardodbr commented 3 weeks ago

I believe it can be closed. @agilgur5 do you agree?

agilgur5 commented 3 weeks ago

Per https://github.com/argoproj/argo-workflows/issues/5659#issuecomment-2381628641, I'm not sure either of these two issues was resolved, since both specifically say "pause" or "suspend", which is intentionally different from stopStrategy per https://github.com/argoproj/argo-workflows/pull/12696#discussion_r1516670248