argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

fix: ensure that nodes complete when workflow fails with `parallelism` and `failFast`. Fixes #13806 #13827

Open. jswxstw opened this pull request 3 weeks ago.

jswxstw commented 3 weeks ago

Fixes #13806

Motivation

The `failFast` feature has two serious flaws:

Modifications

Verification

Local tests and e2e tests.

Test workflow:

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: demo
spec:
  entrypoint: main
  templates:
    - name: main
      parallelism: 2
      failFast: true
      steps:
        - - name: step0
            template: sleep
        - - name: step1
            template: fail
          - name: step2
            template: sleep
    - name: fail
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["exit 1"]
    - name: sleep
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["sleep 5"]
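The PR title states the invariant this test exercises: once the workflow has failed, no node should be left in a non-terminal phase. A minimal Go sketch of that check follows. It assumes a simplified local `Node` type whose phase names mirror Argo's node phases and treats `Succeeded`, `Skipped`, `Failed`, `Error` and `Omitted` as terminal. The `danglingNodes` helper and the node snapshot in `main` are hypothetical; this is not Argo's controller code and not the change made in this PR.

```go
package main

import "fmt"

// NodePhase mirrors the phase names Argo reports for workflow nodes.
// Hypothetical local type for illustration; it is not imported from Argo.
type NodePhase string

const (
	Pending   NodePhase = "Pending"
	Running   NodePhase = "Running"
	Succeeded NodePhase = "Succeeded"
	Skipped   NodePhase = "Skipped"
	Failed    NodePhase = "Failed"
	Error     NodePhase = "Error"
	Omitted   NodePhase = "Omitted"
)

// Node is a hypothetical, simplified view of one workflow node.
type Node struct {
	Name  string
	Phase NodePhase
}

// completed reports whether a phase is treated as terminal in this sketch.
func completed(p NodePhase) bool {
	switch p {
	case Succeeded, Skipped, Failed, Error, Omitted:
		return true
	}
	return false
}

// danglingNodes returns, in input order, the names of nodes that are still
// in a non-terminal phase. For a failed workflow this should be empty.
func danglingNodes(nodes []Node) []string {
	var left []string
	for _, n := range nodes {
		if !completed(n.Phase) {
			left = append(left, n.Name)
		}
	}
	return left
}

func main() {
	// Hypothetical snapshot of the demo workflow after step1 fails;
	// not captured from a real run.
	nodes := []Node{
		{Name: "step0", Phase: Succeeded},
		{Name: "step1", Phase: Failed},
		{Name: "step2", Phase: Running},
	}
	if left := danglingNodes(nodes); len(left) > 0 {
		fmt.Println("workflow failed but nodes are not complete:", left)
	}
}
```

Running it against a real workflow would require filling `nodes` from the workflow's status; the sketch only shows the condition the e2e test is meant to guarantee.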


agilgur5 commented 3 weeks ago

Have you seen #10312 and #11992?

jswxstw commented 3 weeks ago


@agilgur5 Not yet. However, #10312 is not related to this issue: the `failFast` discussed in these two issues is not the same thing, and the problems it causes are also different: