Summary
The suggested pattern of setting a Workflow-level `retryStrategy` to tolerate pod deletion (https://argoproj.github.io/argo-workflows/tolerating-pod-deletion/) has an unexpected side effect. If a step fails with an error (the case `retryPolicy: OnError` handles) and exhausts the limit set in the top-level `retryStrategy`, the entire workflow runs again! Is this desirable or intended?

I suggest either adding a warning about this behavior to the documentation, or changing the behavior so that the retry strategy cascades only to child steps and does not retry the entire workflow.
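To make the reported behavior concrete, here is a minimal Python model of nested retries. It is not Argo code. It assumes, as the behavior in this report suggests, that a Workflow-level `retryStrategy` applies the same limit both to the entrypoint steps template and to each child step, and that a child step that exhausts its retries fails its parent. The function `simulate`, the sibling step `first-sleep-1`, and the limit of 1 are hypothetical.

```python
from typing import Dict, List, Tuple


def simulate(
    steps: List[str],
    pod_deletions: Dict[str, int],
    child_limit: int,
    parent_limit: int,
) -> Tuple[bool, List[str]]:
    """Hypothetical model: a parent steps template running child steps in sequence.

    pod_deletions[step] is how many attempts of that step have their pod deleted
    (an Error, which an OnError policy retries). Returns (succeeded, attempt log).
    """
    remaining = dict(pod_deletions)  # deletions still to come, per step
    log: List[str] = []
    for parent_attempt in range(parent_limit + 1):
        parent_ok = True
        # Every parent attempt re-runs ALL child steps from the beginning.
        for step in steps:
            step_ok = False
            for child_attempt in range(child_limit + 1):
                log.append(f"parent#{parent_attempt} {step}#{child_attempt}")
                if remaining.get(step, 0) > 0:
                    remaining[step] -= 1  # pod deleted: this attempt errors
                    continue  # retry the child if its limit allows
                step_ok = True
                break
            if not step_ok:
                parent_ok = False  # child exhausted its retries: parent fails
                break
        if parent_ok:
            return True, log
    return False, log


# Same limit at both levels (assumed effect of a Workflow-level retryStrategy).
ok, log = simulate(["first-sleep-1", "first-sleep-2"], {"first-sleep-2": 2},
                   child_limit=1, parent_limit=1)
print(ok)
for line in log:
    print(line)
```

In this model, deleting `first-sleep-2` twice exhausts its retries, the parent fails and is retried, and `first-sleep-1` runs a second time, which mirrors the "entire workflow runs again" symptom. With `parent_limit=0` (retries cascading only to child steps, as suggested above) the same deletions end the run after the second deletion instead.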
Use Cases
https://argoproj.github.io/argo-workflows/tolerating-pod-deletion/
Reproducing
Argo v2.12.3
Submit this workflow and delete `first-sleep-2` twice. The entire workflow will run again.