argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
15.07k stars 3.2k forks source link

`podSpecPatch` doesn't evaluate `retries` when defined at workflow spec #13123

Open tczhao opened 5 months ago

tczhao commented 5 months ago

Pre-requisites

What happened/what did you expect to happen?

I expect podSpecPatch to behave the same when defined at the workflow or template level.

At template level, sprig is evaluated, but not at the workflow level

Version

latest(a514a42)

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

metadata:
  name: sparkly-rhino
  namespace: argo
  labels:
    example: 'true'
spec:
  # move this to under the template, you will see sprig is evaluated
  podSpecPatch: |
    affinity:
      nodeAffinity:
        preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 10
          preference:
            matchExpressions:
            - key: eks.amazonaws.com/capacityType
              operator: NotIn
              values:
              - "{{=sprig.ternary('', 'SPOT', retries == '0')}}"
  arguments:
    parameters:
      - name: message
        value: hello argo
  entrypoint: argosay
  templates:
    - name: argosay
      inputs:
        parameters:
          - name: message
            value: '{{workflow.parameters.message}}'
      container:
        name: main
        image: argoproj/argosay:v2
        command:
          - /argosay
        args:
          - echo
          - '{{inputs.parameters.message}}'

Logs from the workflow controller

kubectl logs -n argo deploy/workflow-controller | grep ${workflow}

Logs from in your workflow's wait container

kubectl logs -n argo -c wait -l workflows.argoproj.io/workflow=${workflow},workflow.argoproj.io/phase!=Succeeded
jswxstw commented 5 months ago

sprig is not evaluated even though I move the podSpecPatch under the template, forgot to set retryStrategy to it?

https://github.com/argoproj/argo-workflows/blob/fce51dfd0d49c521102a2c1a8d661fab742578a5/workflow/controller/workflowpod.go#L363-L368

The workflow-level podSpecPatch is applied here and there is only a variable LocalVarPodName in localParams, so expression contains other step-level variables can not be evaluated properly.

tczhao commented 5 months ago

Thank you @jswxstw If I'm correct, it seems that pod level podSpecPatch's ProcessArgs(...) is always a superset of the wf spec level podSpecPatch's ProcessArgs(...) Does it make sense to apply all podSpecPatch first then ProcessArgs?

jswxstw commented 5 months ago

Does it make sense to apply all podSpecPatch first then ProcessArgs?

Applying podSpecPatch needs to be done during the definition of the pod spec, while ProcessArgs can only be completed before that. Alternatively, merging the workflow/template level podSpecPatch in advance, but this will encounter merging issue like #12476. In my opinion, it seems reasonable that the workflow level podSpecPatch can only access globalParams. After all, there may be differences in localParams between different templates. For example, it cannot guarantee that the variable retries will always exist.

tczhao commented 5 months ago

Thanks @jswxstw However, I think if we can have global retryStrategy, we should have a way to configure related parameters globally. I've made a simple draft, I think this approach is sufficient https://github.com/argoproj/argo-workflows/pull/13139/commits/5518def3bf7db14eed6596f39052efb12021792e

tczhao commented 5 months ago

ok, I found out to make pod level parameter available in a global defined podSpecPatch, we need to specify it under templateDefaults,

spec:
  templateDefaults:
    podSpecPatch: |
      affinity:
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 10
            preference:
              matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: NotIn
                values:
                - "{{=sprig.ternary('', 'SPOT', retries == '0')}}"

or in controller configmap

  workflowDefaults: |
    spec:
      templateDefaults:
        podSpecPatch: |