argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
15.08k stars 3.2k forks

Retry strategy with `withParam` context could only retry failed ones, not succeeded ones #9190

Open baranbartu opened 2 years ago

baranbartu commented 2 years ago

Summary

If a template has a retryStrategy and one of its steps uses withParam, the following happens:

Say we have 4 items, e.g. projectIDs: [1,2,3,4], and we use withParam to run them concurrently. 3 of the 4 tasks succeed and 1 fails. The main template then retries the step with all the projectIDs, including the ones that already succeeded.

Retrying only the failed ones would make more sense, IMO.

An example template snippet:

    - name: deploy-projects-with-failover
      retryStrategy:
        limit: 3
        backoff:
          duration: "1m"
          factor: 1
      inputs:
        parameters:
          - name: projectIDs
      steps:
        - - name: deploy-project
            template: deploy-project
            withParam: "{{inputs.parameters.projectIDs}}"
            arguments:
              parameters:
                - name: projectID
                  value: "{{item}}"

Use Cases

I don't want successful tasks to run over and over again. Currently, I add extra conditions to avoid errors, e.g. if the project is already deployed, skip it. Otherwise, the retry runs deploy-project again with a projectID that was already installed.

PS. I am aware that deploy-project is not idempotent, but I would specifically like to focus on possible improvements to Argo Workflows.
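
For comparison, here is a minimal sketch of one possible workaround: moving the retryStrategy from the parent steps template to the leaf deploy-project template, so that each withParam item is retried on its own and items that succeeded are left alone. The body of deploy-project is not shown in this issue, so the container image, command, and args below are hypothetical placeholders, and the parent template name is made up for illustration.

    # Sketch only: the parent fans out with withParam and has no retryStrategy.
    - name: deploy-projects            # hypothetical name for the parent template
      inputs:
        parameters:
          - name: projectIDs
      steps:
        - - name: deploy-project
            template: deploy-project
            withParam: "{{inputs.parameters.projectIDs}}"
            arguments:
              parameters:
                - name: projectID
                  value: "{{item}}"

    # The retryStrategy lives on the leaf template, so it applies per item.
    - name: deploy-project
      retryStrategy:
        limit: 3
        backoff:
          duration: "1m"
          factor: 1
      inputs:
        parameters:
          - name: projectID
      container:
        image: example.com/deployer:latest    # hypothetical image
        command: [deploy]                      # hypothetical command
        args: ["--project", "{{inputs.parameters.projectID}}"]

With this layout the parent step still fails if any single item exhausts its retries, but the other items are not run again. It does not cover the case where the retry has to wrap the whole group of steps.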


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

sarabala1979 commented 2 years ago

@baranbartu Good improvement. We can add a retry option that controls whether to retry all items or only the failed ones. Would you like to contribute this enhancement?

baranbartu commented 2 years ago

Hey @sarabala1979, yes I definitely would like to contribute!

jielou commented 9 months ago

Any update on this feature? It does not make sense to retry the succeeded steps by default.

jielou commented 9 months ago

It seems the succeeded pods are not actually rerun; the DAG view just makes it look as if they are.