argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
14.98k stars 3.19k forks source link

The execution of `failureCondition` is inconsistent with the docs #11217

Open KingsonKai opened 1 year ago

KingsonKai commented 1 year ago

Pre-requisites

What happened/what you expected to happen?

My failureCondition is defined as follows successCondition is phase == recover AND status == failed failureCondition is phase == recover AND status == success image

The actual execution result of the object is phase == recover AND status == success image

The official document describes that comma-separated writing is an AND relationship: image

In the workflow result, it seems that only one of the conditions is judged after execution. It seems that the execution logic of the judgment condition is an OR relationship, not the expected AND relationship. image

Version

v3.4.8

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

My resource object is a CRD, not a regular k8s resource type

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
sarabala1979 commented 1 year ago

@KingsonKai can you provide the full workflow yaml which failed ?

KingsonKai commented 1 year ago
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: network-cpu-workflow
  namespace: chaosmeta
spec:
  serviceAccountName: chaosmeta-inject-controller-manager
  entrypoint: main
  templates:
    - name: experiment-inject
      inputs:
        parameters:
        - name: experiment
      resource:
        action: create
        failureCondition: status.status == failed
        successCondition: status.phase == recover,status.status == success
        manifest: "{{inputs.parameters.experiment}}"
    - name: raw-suspend
      inputs:
        parameters:
        - name: time
      suspend:
        duration: "{{inputs.parameters.time}}"
    - name: experiment-recover
      inputs:
        parameters:
        - name: experiment
      resource:
        action: patch
        failureCondition: status.phase == recover,status.status == failed
        successCondition: status.phase == recover,status.status == success
        mergeStrategy: json
        flags:
        - experiments.inject.chaosmeta.io
        - "{{inputs.parameters.experiment}}"
        manifest: |
          - op: replace
            path: /spec/targetPhase
            value: recover
    - name: main
      dag:
        tasks:
        - name: before-wait
          template: raw-suspend
          arguments:
            parameters: [{name: time, value: "5s"}]
        - name: pod-network-experiment-inject
          hooks:
            exit:
              template: experiment-recover
              arguments:
                parameters: [{name: experiment, value: "pod-network-experiment"}]
          dependencies: [before-wait]
          template: experiment-inject
          arguments:
            parameters: 
              - name: experiment
                value: |
                  apiVersion: inject.chaosmeta.io/v1alpha1
                  kind: Experiment
                  metadata:
                    name: pod-network-experiment
                    namespace: chaosmeta
                  spec:
                    scope: pod
                    targetPhase: inject
                    experiment:
                      target: network
                      fault: delay
                      duration: 5m
                      args:
                        - key: interface
                          value: 'eth0'
                          valueType: string
                        - key: latency
                          value: '2s'
                          valueType: string
                    selector:
                      - namespace: default
                        name:
                          - nginx-75c57b6fd7-hldwd
        - name: inject-wait
          dependencies: [before-wait]
          template: raw-suspend
          arguments:
            parameters: [{name: time, value: "30s"}]
        - name: inject-wait-2
          dependencies: [inject-wait]
          template: raw-suspend
          arguments:
            parameters: [{name: time, value: "5s"}]
        - name: after-wait
          dependencies: [pod-network-experiment-inject,inject-wait-2]
          template: raw-suspend
          arguments:
            parameters: [{name: time, value: "5s"}]
KingsonKai commented 1 year ago

this condition can't work: templates.experiment-recover.resource.failureCondition This is the CRD of the chaos engineering project chaosmeta. When I change the targetPhase to recover, the state will change, but the state of this CRD is composite, and two fields need to be judged at the same time.

If you want to reproduce completely, you need to install chaosmeta. It may be too much work for you. You can follow the chaosmeta documentation to install it with one click.

I feel that other CRDs with composite states should also have such bugs. If you have a ready-made CRD to try, please try to reproduce it

this is the target CRD:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.11.1
  name: experiments.inject.chaosmeta.io
spec:
  conversion:
    strategy: Webhook
    webhook:
      clientConfig:
        service:
          name: chaosmeta-inject-webhook-service
          namespace: chaosmeta
          path: /convert
      conversionReviewVersions:
        - v1
  group: inject.chaosmeta.io
  names:
    kind: Experiment
    listKind: ExperimentList
    plural: experiments
    singular: experiment
  scope: Namespaced
  versions:
    - name: v1alpha1
      schema:
        openAPIV3Schema:
          description: Experiment is the Schema for the experiments API
          properties:
            apiVersion:
              description: 'APIVersion defines the versioned schema of this representation of an object. Servers should convert recognized schemas to the latest internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
              type: string
            kind:
              description: 'Kind is a string value representing the REST resource this object represents. Servers may infer this from the endpoint the client submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
              type: string
            metadata:
              type: object
            spec:
              description: ExperimentSpec defines the desired state of Experiment
              properties:
                experiment:
                  properties:
                    args:
                      items:
                        properties:
                          key:
                            type: string
                          value:
                            type: string
                          valueType:
                            type: string
                        required:
                          - key
                          - value
                        type: object
                      type: array
                    duration:
                      description: Duration support "h", "m", "s"
                      type: string
                    fault:
                      type: string
                    target:
                      type: string
                  required:
                    - fault
                    - target
                  type: object
                rangeMode:
                  properties:
                    type:
                      description: 'Type Optional: all、percent、count'
                      type: string
                    value:
                      type: integer
                  required:
                    - type
                  type: object
                scope:
                  description: 'Scope Optional: node, pod. type of experiment object'
                  type: string
                selector:
                  description: Selector The internal part of unit is "AND", and the external part is "OR" and de-duplication
                  items:
                    properties:
                      ip:
                        items:
                          type: string
                        type: array
                      label:
                        additionalProperties:
                          type: string
                        type: object
                      name:
                        items:
                          type: string
                        type: array
                      namespace:
                        type: string
                    type: object
                  type: array
                targetPhase:
                  type: string
              required:
                - experiment
                - scope
                - targetPhase
              type: object
            status:
              description: ExperimentStatus defines the observed state of Experiment
              properties:
                createTime:
                  type: string
                detail:
                  properties:
                    inject:
                      items:
                        properties:
                          backup:
                            type: string
                          injectObjectName:
                            type: string
                          message:
                            type: string
                          startTime:
                            type: string
                          status:
                            type: string
                          uid:
                            description: InjectObjectInfo string     `json:"injectObjectInfo,omitempty"`
                            type: string
                          updateTime:
                            type: string
                        type: object
                      type: array
                    recover:
                      items:
                        properties:
                          backup:
                            type: string
                          injectObjectName:
                            type: string
                          message:
                            type: string
                          startTime:
                            type: string
                          status:
                            type: string
                          uid:
                            description: InjectObjectInfo string     `json:"injectObjectInfo,omitempty"`
                            type: string
                          updateTime:
                            type: string
                        type: object
                      type: array
                  type: object
                message:
                  type: string
                phase:
                  type: string
                status:
                  type: string
                updateTime:
                  type: string
              required:
                - createTime
                - detail
                - message
                - phase
                - status
                - updateTime
              type: object
          type: object
      served: true
      storage: true
      subresources:
        status: {}
terrytangyuan commented 1 year ago

It’s label selector basically https://github.com/argoproj/argo-workflows/blob/master/workflow/executor/resource.go#L186

but complicated use cases may not be supported yet

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

agilgur5 commented 1 year ago

Hmm https://github.com/argoproj/argo-workflows/issues/7143#issuecomment-961191024 says that failureCondition may indeed be different

KingsonKai commented 10 months ago

Does it mean that the two conditions separated by commas are "AND" logic in successCondition, but not in failureCondition?

agilgur5 commented 10 months ago

Yes, successCondition seems to be an "AND" while failureCondition is an "OR". The match code linked from the above comment is indeed different between the two.

Would you like to file a PR to fix the docs?