Closed leesungbin closed 1 week ago
Fixes #13044
I've noticed that maxDurationDeadline is calculated in an unexpected way, and that the resulting value overrides executionDeadline.
For example, in the following workflow each attempt sleeps for 120 seconds and then exits with error code 10:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-sample-
spec:
  activeDeadlineSeconds: 3600
  entrypoint: retry-example
  templates:
    - name: retry-example
      retryStrategy:
        limit: 2
        backoff:
          duration: 10s
          factor: 2
          maxDuration: 3m
      container:
        image: alpine
        command: [sh, -c]
        args: ["sleep 120; exit 10"]
However, when the first attempt fails, the second node's container gets a deadline less than 1 minute away, and its wait container kills it before it can finish.
The latest version of Argo Workflows follows this timeline (times are approximate, for easier understanding):
0s: firstChildNode started (deadline = 3600s)
120s: firstChildNode finished and exited with an error (deadline = firstChildNode.StartedAt + backoff.maxDuration = 180s)
130s: Waited for 10 seconds (backoff duration) and secondChildNode started (deadline: 50 seconds left)
180s: Deadline exceeded, wait container kills main container
I've tested with the above workflow, and it is working as expected.
I'll close this PR; updating the documentation about maxDuration will be enough.
Superseded by #13068 docs PR
Modifications
Previously, the deadline was computed by adding maxDuration to firstChildNode's startTime.
Compute maxDurationDeadline with lastChildNode's finishedTime instead.
Fix the case where executionDeadline was overridden by maxDurationDeadline.