Closed tico24 closed 1 year ago
The default retryPolicy is OnFailure
.
Setting this to Always
does what you'd wish.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: designed-to-error-
labels:
workflows.argoproj.io/archive-strategy: "false"
spec:
entrypoint: test
templates:
- name: test
retryStrategy:
limit: "3"
expression: lastRetry.status == "Error"
retryPolicy: Always
container:
image: ubuntu
command:
- sh
- -c
- |
sleep 5000000
I think the presence of an expression should override retryPolicy so that the result of the expression evaluation is all that matters for retries. Unless someone disagrees soon, I'll write a PR for that.
@Joibel I think the eventual retry behavior should be the AND result of expression
and retryPolicy
. which means we should have the following rules:
expression
evaluates to false, then the retryPolicy
will be ignored and the node will not be retriedexpression
evaluates to true, then depends on the retryPolicy
Let's take a more obvious example: what's the expected behavior for the below Workflow?
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: designed-to-fail-
labels:
workflows.argoproj.io/archive-strategy: "false"
spec:
entrypoint: test
templates:
- name: test
retryStrategy:
limit: "3"
expression: lastRetry.status == "Failed"
retryPolicy: OnError
container:
image: ubuntu
command:
- sh
- -c
- |
exit 64
The current actual behavior is it won't retry and I think it's expected.
So back to the original issue, If we want to retry when lastRetry.status == "Error"
, so we can just set retryPlicy to OnError or Always.
@maxsxu I disagree. I would expect retryPolicy
to be ignored in the presence of an expression because the expression is more 'expressive'. This is the natural understanding of at least one other company (who I raised the issue on behalf of).
The workflow tico24 posted isn't obviously never going to retry. By reading it you'd assume it will behave well. Perhaps I'd accept that in the presence of both expression
and and explicit retryPolicy
we may never retry, but if I read
spec:
entrypoint: test
templates:
- name: test
retryStrategy:
limit: "3"
expression: lastRetry.status == "Error"
container:
image: ubuntu
command:
- sh
- -c
- |
sleep 5000000
I really do NOT expect that to be absolutely identical to having told the system never to retry. We need the language to be clear, and right now it isn't.
If I was able to change everything I would forbid having both expression
and retryPolicy
in the same retryStrategy
. I was proposing making retryPolicy
do nothing, but perhaps a lesser change is to make it not do anything unless explicitly set.
Ideally @maxsxu's example
spec:
entrypoint: test
templates:
- name: test
retryStrategy:
limit: "3"
expression: lastRetry.status == "Failed"
retryPolicy: OnError
would be invalid by that, but as second best and so as not to break existing workflows it would explain to you that you've just written something that would never retry, but in the general case that's not possible.
So how about making the rule that, in the presence of an expression and NO explicit retryPolicy
then the retryPolicy
is effectively Always
(e.g. the expression makes the decision). That fixes the original complaint and is even less of a change.
@Joibel Thanks for your thoughtful options.
So how about making the rule that, in the presence of an expression and NO explicit retryPolicy then the retryPolicy is effectively Always (e.g. the expression makes the decision). That fixes the original complaint and is even less of a change.
I'm OK with that. i.e, changing the default retryPolicy to Always.
@simster7 Would also like to know your thoughts as you're the original author.
For clarity, I'm not proposing changing the default policy to be Always
in the absence of an expression
. In pseudocode my proposal is:
if expression == "" {
defaultPolicy = OnFailure
} else {
defaultPolicy = Always
}
In order to maintain compatibility with existing behaviour.
@Joibel I see, more accurately, I suppose you're proposing following. Did I catch you right?
if retryPolicy == "" {
if expression == "" {
defaultPolicy = OnFailure
} else {
defaultPolicy = Always
}
}
Yes, that's what I'm proposing.
Yes, that's what I'm proposing.
Thanks for clarifying, It makes sense.
I've no objection to this. Let's see what other people think about it.
As someone who tried to use this feature and only based on the documentation without checking the source code, my expectation was that if an expression
is provided it should always be considered regardless of the retryPolicy
. In other words, it should overwrite the retryPolicy
.
My reasoning is that the expression
is the more advanced/customizable option that provides all the options that are provided by the retryPolicy
. Having both at the same time can create conflicting cases. Why would one still need the retryPolicy
if they set up an expression?
That said, I understand that you need to keep this as is for backward compatibility.
Pre-requisites
:latest
What happened/what you expected to happen?
When a workflow node errors and it has a retryStrategy expression looking for lastRetry.status, the retryStrategy is not envoked.
This workflow runs as expected. It is retried 3 times becuse the workflow node fails:
However, if you run the following workflow, wait for the node to be running in the UI and then kill the pod using kubectl. We see that the node is marked as Errored, but the retryStrategy did not kick in, the node was not retried:
Version
v3.4.5 > latest
Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.
Logs from the workflow controller
Logs from in your workflow's wait container