argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
15.08k stars 3.2k forks source link

Global Workflow variables not interpreted correctly in first step of workflow #11207

Open mp-az opened 1 year ago

mp-az commented 1 year ago

Pre-requisites

What happened/what you expected to happen?

When using Global Workflow variables in our Workflows the value of the variable is not interrupted by the first step of the Workflow. Instead the literal string from the Workflow template is returned. Subsequent steps in the workflow work ok and the value is returned in the step output as expected.

We have been running 3.1.0 for a long time and recently upgraded to 3.4.8 which is where we started to see the issue.

In our Workflow we define argument parameters as follows:

  arguments:
    parameters:
      - name: timestamp
        value: "{{workflow.creationTimestamp.Y}}.{{workflow.creationTimestamp.m}}.{{workflow.creationTimestamp.d}}-{{workflow.creationTimestamp.H}}{{workflow.creationTimestamp.M}}"
      - name: workdir
        value: "/mnt/data/{{workflow.parameters.model-name}}/{{workflow.parameters.timestamp}}"
      - name: model-name
        value: model-foo-bar

When the 'workdir' parameters are is used in Workflow steps we would expect the output to be /mnt/data/model-foo-bar/2023.06.13-0949. The timestamp portion built up in the timestamp parameter and then used in the workdir parameter

When used in the first step we get the following output when we use the workdir variable : /mnt/data/model-foo-bar/{{workflow.creationTimestamp.Y}}.{{workflow.creationTimestamp.m}}.{{workflow.creationTimestamp.d}}-{{workflow.creationTimestamp.H}}{{workflow.creationTimestamp.M}}`}}

In the next step and all subsequent steps the variable value is correctly returned as : {{/mnt/data/model-foo-bar/2023.06.13-0949}}

We have tried other built-in variables including workspace.namepsace and workspace.uid and experienced the same behaviour.

as mentioned this works Ok in 3.1.0 but we are can't get it to work in 3.4.8. It might be something in our wider configuration but we'd be grateful if you could help investigate the behaviour we are seeing.

Version

3.4.8

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: whale-say-parameter-debug
spec:
  entrypoint: hello-hello-hello
  arguments:
    parameters:
      - name: timestamp
        value: "{{workflow.creationTimestamp.Y}}.{{workflow.creationTimestamp.m}}.{{workflow.creationTimestamp.d}}-{{workflow.creationTimestamp.H}}{{workflow.creationTimestamp.M}}"
      - name: workdir
        value: "/mnt/data/{{workflow.parameters.model-name}}/{{workflow.parameters.timestamp}}"
      - name: model-name
        value: model-foo-bar

  # This spec contains two templates: hello-hello-hello and whalesay
  templates:
  - name: hello-hello-hello
    # Instead of just running a container
    # This template has a sequence of steps
    steps:
    - - name: hello1            # hello1 is run before the following steps
        template: whalesay
    - - name: hello2a           # double dash => run after previous step
        template: whalesay

  # This is the same template as from the previous example
  - name: whalesay
    container:
      image: docker/whalesay
      command: [cowsay]
      args: ["{{workflow.parameters.workdir}}"]

Logs from the workflow controller

time="2023-06-13T09:49:14.213Z" level=info msg="Processing workflow" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:14.219Z" level=info msg="Updated phase  -> Running" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:14.219Z" level=info msg="Steps node whale-say-parameter-debug initialized Running" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:14.219Z" level=info msg="StepGroup node whale-say-parameter-debug-2258923210 initialized Running" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:14.219Z" level=info msg="Pod node whale-say-parameter-debug-2673084999 initialized Pending" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:14.250Z" level=info msg="Created pod: whale-say-parameter-debug[0].hello1 (whale-say-parameter-debug-whalesay-2673084999)" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:14.250Z" level=info msg="Workflow step group node whale-say-parameter-debug-2258923210 not yet completed" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:14.251Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:14.251Z" level=info msg=reconcileAgentPod namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:14.263Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=769943899 workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.251Z" level=info msg="Processing workflow" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.251Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.251Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=whale-say-parameter-debug-2673084999 old.message= old.phase=Pending old.progress=0/1 workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.252Z" level=info msg="Step group node whale-say-parameter-debug-2258923210 successful" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.252Z" level=info msg="node whale-say-parameter-debug-2258923210 phase Running -> Succeeded" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.252Z" level=info msg="node whale-say-parameter-debug-2258923210 finished: 2023-06-13 09:49:24.252183688 +0000 UTC" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.252Z" level=info msg="StepGroup node whale-say-parameter-debug-2191665639 initialized Running" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.252Z" level=info msg="SG Outbound nodes of whale-say-parameter-debug-2673084999 are [whale-say-parameter-debug-2673084999]" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.252Z" level=info msg="Pod node whale-say-parameter-debug-3519223984 initialized Pending" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.272Z" level=info msg="Created pod: whale-say-parameter-debug[1].hello2a (whale-say-parameter-debug-whalesay-3519223984)" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.273Z" level=info msg="Workflow step group node whale-say-parameter-debug-2191665639 not yet completed" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.273Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.274Z" level=info msg=reconcileAgentPod namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.286Z" level=info msg="Workflow update successful" namespace=default phase=Running resourceVersion=769943998 workflow=whale-say-parameter-debug
time="2023-06-13T09:49:24.292Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/whale-say-parameter-debug-whalesay-2673084999/labelPodCompleted
time="2023-06-13T09:49:34.275Z" level=info msg="Processing workflow" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.276Z" level=info msg="Task-result reconciliation" namespace=default numObjs=0 workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.276Z" level=info msg="node changed" namespace=default new.message= new.phase=Succeeded new.progress=0/1 nodeID=whale-say-parameter-debug-3519223984 old.message= old.phase=Pending old.progress=0/1 workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.276Z" level=info msg="SG Outbound nodes of whale-say-parameter-debug-2673084999 are [whale-say-parameter-debug-2673084999]" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.276Z" level=info msg="Step group node whale-say-parameter-debug-2191665639 successful" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.276Z" level=info msg="node whale-say-parameter-debug-2191665639 phase Running -> Succeeded" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="node whale-say-parameter-debug-2191665639 finished: 2023-06-13 09:49:34.277002453 +0000 UTC" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="Outbound nodes of whale-say-parameter-debug-3519223984 is [whale-say-parameter-debug-3519223984]" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="Outbound nodes of whale-say-parameter-debug is [whale-say-parameter-debug-3519223984]" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="node whale-say-parameter-debug phase Running -> Succeeded" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="node whale-say-parameter-debug finished: 2023-06-13 09:49:34.277095556 +0000 UTC" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="Checking daemoned children of whale-say-parameter-debug" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="TaskSet Reconciliation" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg=reconcileAgentPod namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="Updated phase Running -> Succeeded" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="Marking workflow completed" namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.277Z" level=info msg="Checking daemoned children of " namespace=default workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.282Z" level=info msg="cleaning up pod" action=deletePod key=default/whale-say-parameter-debug-1340600742-agent/deletePod
time="2023-06-13T09:49:34.293Z" level=info msg="Workflow update successful" namespace=default phase=Succeeded resourceVersion=769944101 workflow=whale-say-parameter-debug
time="2023-06-13T09:49:34.311Z" level=info msg="cleaning up pod" action=labelPodCompleted key=default/whale-say-parameter-debug-whalesay-3519223984/labelPodCompleted

Logs from in your workflow's wait container

W
time="2023-06-13T09:49:16 UTC" level=info msg="Starting Workflow Executor" version=v3.4.8
time="2023-06-13T09:49:16 UTC" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0
.5 Steps=5
time="2023-06-13T09:49:16 UTC" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC"
 includeScriptOutput=false namespace=default podName=whale-say-parameter-debug-whalesay-2673084999 template="
{\"name\":\"whalesay\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"
docker/whalesay\",\"command\":[\"cowsay\"],\"args\":[\"{{`/mnt/data/model-foo-bar/{{workflow.creationTimestam
p.Y}}.{{workflow.creationTimestamp.m}}.{{workflow.creationTimestamp.d}}-{{workflow.creationTimestamp.H}}{{wor
kflow.creationTimestamp.M}}`}}\"],\"resources\":{}}}" version="&Version{Version:v3.4.8,BuildDate:2023-05-25T2
2:19:57Z,GitCommit:9e27baee4b3be78bb662ffa5e3a06f8a6c28fb53,GitTag:v3.4.8,GitTreeState:clean,GoVersion:go1.20
.4,Compiler:gc,Platform:linux/amd64,}"
time="2023-06-13T09:49:16 UTC" level=info msg="Starting deadline monitor"
time="2023-06-13T09:49:20 UTC" level=info msg="Main container completed" error="<nil>"
time="2023-06-13T09:49:20 UTC" level=info msg="No Script output reference in workflow. Capturing script outpu
t ignored"
time="2023-06-13T09:49:20 UTC" level=info msg="No output parameters"
time="2023-06-13T09:49:20 UTC" level=info msg="No output artifacts"
time="2023-06-13T09:49:20 UTC" level=info msg="Alloc=7741 TotalAlloc=15824 Sys=23165 NumGC=5 Goroutines=7"

time="2023-06-13T09:49:26 UTC" level=info msg="Starting Workflow Executor" version=v3.4.8
time="2023-06-13T09:49:26 UTC" level=info msg="Using executor retry strategy" Duration=1s Factor=1.6 Jitter=0
.5 Steps=5
time="2023-06-13T09:49:26 UTC" level=info msg="Executor initialized" deadline="0001-01-01 00:00:00 +0000 UTC"
 includeScriptOutput=false namespace=default podName=whale-say-parameter-debug-whalesay-3519223984 template="
{\"name\":\"whalesay\",\"inputs\":{},\"outputs\":{},\"metadata\":{},\"container\":{\"name\":\"\",\"image\":\"
docker/whalesay\",\"command\":[\"cowsay\"],\"args\":[\"{{`/mnt/data/model-foo-bar/2023.06.13-0949`}}\"],\"res
ources\":{}}}" version="&Version{Version:v3.4.8,BuildDate:2023-05-25T22:19:57Z,GitCommit:9e27baee4b3be78bb662
ffa5e3a06f8a6c28fb53,GitTag:v3.4.8,GitTreeState:clean,GoVersion:go1.20.4,Compiler:gc,Platform:linux/amd64,}"
time="2023-06-13T09:49:26 UTC" level=info msg="Starting deadline monitor"
time="2023-06-13T09:49:30 UTC" level=info msg="Main container completed" error="<nil>"
time="2023-06-13T09:49:30 UTC" level=info msg="No Script output reference in workflow. Capturing script outpu
t ignored"
time="2023-06-13T09:49:30 UTC" level=info msg="No output parameters"
time="2023-06-13T09:49:30 UTC" level=info msg="No output artifacts"
time="2023-06-13T09:49:30 UTC" level=info msg="Alloc=9722 TotalAlloc=15819 Sys=23421 NumGC=4 Goroutines=7"
mp-az commented 1 year ago

If we introduce a new initial step into our workflow that simply suspends the workflow for 1s then the variables in the next step(s) all work as expected.

    - - name: delay
        template: delay

...
...
...

  - name: delay
    suspend:
      duration: "1"

This suggest a concurrency / timing issue perhaps ? Are there any known issues with variable values been made available for use in the workflow containers?

I have also tried 3.2.11 and 3.3.10 which demonstrate the same behaviour using both the docker and the emissary runtimes.

At the moment only 3.1.0 works (with both runtimes).

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.

tooptoop4 commented 1 week ago

maybe related to https://github.com/argoproj/argo-workflows/issues/13799