argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0
14.61k stars 3.13k forks source link

Workflow of workflows fails when input parameters are specified on inner workflow #12634

Open flaviuvadan opened 5 months ago

flaviuvadan commented 5 months ago

Pre-requisites

What happened/what did you expect to happen?

Context My organization has a use case for workflows of workflows. However, when we create inner workflows via the resource specification the Argo Server fails to resolve {{input.parameters.whatever}}. The outer workflow template does not have any inputs specified while the inner workflows (specified as resources) have inputs + arguments specified (via steps).

Expected I expected the workflow to be submitted successfully since inner workflows submitted as resources should be independent of the outer workflow.

What actually happened I fail to submit the workflow with an error like

`templates.main.steps[0].w1r templates.w1r: failed to resolve {{inputs.parameters.msg}}`

Extra context The workflow YAML file is created via Hera. I went through Hera's code and could not find anything that generates anything unnecessarily. In addition, I verified the YAML, looked through Argo docs + examples. Since I cannot even submit this workflow I cannot get kubectl logs.

Hera discussion on the topic: https://github.com/argoproj-labs/hera/discussions/803

Hera specification

from hera.workflows import Workflow, Steps, script, Resource

@script(image="python:3.7")
def foo(msg):
    print(msg)

with Workflow(name="foo1", entrypoint="main") as w1:
    with Steps(name="main"):
        foo(name="foo1", arguments={"msg": "Hello, World!"})

with Workflow(name="foo2", entrypoint="main") as w2:
    with Steps(name="main"):
        foo(name="foo1", arguments={"msg": "Hello, World!"})

with Workflow(name="main", entrypoint="main") as w:
    w1r = Resource(name="w1r", manifest=w1, action="create")
    w2r = Resource(name="w2r", manifest=w2, action="create")

    with Steps(name="main"):
        w1r()
        w2r()

w.create()

Version

v3.3.3

Paste a small workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflows that uses private images.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  name: main
  namespace: default
spec:
  entrypoint: main
  templates:
  - name: w1r
    resource:
      action: create
      manifest: "apiVersion: argoproj.io/v1alpha1\nkind: Workflow\nmetadata:\n  name:\
        \ foo1\n  namespace: default\nspec:\n  entrypoint: main\n  templates:\n  -\
        \ name: main\n    steps:\n    - - arguments:\n          parameters:\n    \
        \      - name: msg\n            value: Hello, World!\n        name: foo1\n\
        \        template: foo\n  - inputs:\n      parameters:\n      - name: msg\n\
        \    name: foo\n    script:\n      command:\n      - python\n      image:\
        \ python:3.7\n      source: 'import os\n\n        import sys\n\n        sys.path.append(os.getcwd())\n\
        \n        import json\n\n        try: msg = json.loads(r'''''{{inputs.parameters.msg}}''''')\n\
        \n        except: msg = r'''''{{inputs.parameters.msg}}'''''\n\n\n       \
        \ print(msg)'\n"
  - name: w2r
    resource:
      action: create
      manifest: "apiVersion: argoproj.io/v1alpha1\nkind: Workflow\nmetadata:\n  name:\
        \ foo2\n  namespace: default\nspec:\n  entrypoint: main\n  templates:\n  -\
        \ name: main\n    steps:\n    - - arguments:\n          parameters:\n    \
        \      - name: msg\n            value: Hello, World!\n        name: foo1\n\
        \        template: foo\n  - inputs:\n      parameters:\n      - name: msg\n\
        \    name: foo\n    script:\n      command:\n      - python\n      image:\
        \ python:3.7\n      source: 'import os\n\n        import sys\n\n        sys.path.append(os.getcwd())\n\
        \n        import json\n\n        try: msg = json.loads(r'''''{{inputs.parameters.msg}}''''')\n\
        \n        except: msg = r'''''{{inputs.parameters.msg}}'''''\n\n\n       \
        \ print(msg)'\n"
  - name: main
    steps:
    - - name: w1r
        template: w1r
    - - name: w2r
        template: w2r

Logs from the workflow controller

NOT APPLICABLE / cannot submit workflow

Logs from in your workflow's wait container

NOT APPLICABLE / cannot submit workflow
agilgur5 commented 5 months ago

However, when we create inner workflows via the resource specification the Argo Server fails to resolve {{input.parameters.whatever}}. The outer workflow template does not have any inputs specified while the inner workflows (specified as resources) have inputs + arguments specified (via steps).

I've seen some similar problems before on Slack, but unfortunately I'm not sure there's a clean way of solving this.

For instance, you can't just exclude manifest from templating, because some people do want to use the outer Workflow to template the inner Workflow. So how do we tell which one you're trying to use? It requires some escaping, which then makes the manifest no longer clean YAML 😕

At my last job, I was able to workaround this with some indirection by chucking an Argo CD Application in the middle: use outer Workflow parameters to create the Application, then have the Application's Helm values parametrize the inner Workflow. (WorkflowTemplates rather, but same concept)

v3.3.3

Also surprised you're on such an old version, would be good to update 😅

flaviuvadan commented 5 months ago

It requires some escaping, which then makes the manifest no longer clean YAML 😕

What sort of escaping? Escaping the {{inputs.parameters.X}} fields, or?

Also surprised you're on such an old version, would be good to update

Yea, scheduled 😅

agilgur5 commented 5 months ago

Escaping the {{inputs.parameters.X}} fields, or?

Yes, Argo would have to, for instance, convert [[inputs.parameters.X]] to {{inputs.parameters.X}}. Or have some raw string syntax like Helm (perhaps more optimal / standardized but may itself conflict with Helm more). But that means your inner Workflow now has to have different, special syntax.

That's not a feature that exists, which is why I labeled this as a feature rather than a bug. It's possible, but hacky, and Argo templating already has a number of hacks to work within the confines of YAML (e.g. all expressions are strings, but some need to be converted to different types, etc)

flaviuvadan commented 5 months ago

That's not a feature that exists, which is why I labeled this as a feature rather than a bug

Got it, thanks for clarifying! And thanks for labeling the issue as an enhancement. I followed what you mentioned about template and here's what I did (with Hera):

from uuid import uuid4

from hera.workflows.argo import Workflow, Steps, script, Resource, WorkflowTemplate, models as m

@script(image="python:3.7-slim")
def foo(msg):
    print(msg)

with WorkflowTemplate(name=str(uuid4()), entrypoint="s") as w1_template:
    with Steps(name="s"):
        foo(name="foo1", arguments={"msg": "Hello, World!"})

# create w1 as a workflow template since it will have everything necessary "pre-registered", like arguments
w1_template.create()
# then make a workflow that references it
w1 = Workflow(name="foo1", workflow_template_ref=m.WorkflowTemplateRef(name="foo1"))

with WorkflowTemplate(name=str(uuid4()), entrypoint="s") as w2_template:
    with Steps(name="s"):
        foo(name="foo1", arguments={"msg": "Hello, World!"})

# similarly to w1, create w2 as a workflow template
w2_template.create()
# then make a workflow that references it
w2 = Workflow(name="foo2", workflow_template_ref=m.WorkflowTemplateRef(name="foo2"))

# finally, make a workflow of workflows that launches the other two
with Workflow(generate_name="wf-of-wf-", entrypoint="main") as w:
    w1r = Resource(name="w1r", manifest=w1, action="create")
    w2r = Resource(name="w2r", manifest=w2, action="create")

    with Steps(name="main"):
        w1r()
        w2r()

w.create()

While valid, it comes with the cost of implementing a way to clean up templates. The reason I mention this is because there are many use cases for "experimentation on Argo Workflows". Since parameters between workflow submission can change, so will the templates, so clients end up with a proliferation of templates unless they are deleted periodically and strategically (based on labels, annotations, name patterns, etc).

agilgur5 commented 5 months ago

I followed what you mentioned about template and here's what I did

Ah right, a plain WorkflowTemplate will add enough indirection (I had used an Argo CD Application since I had a whole chart which had a WorkflowTemplate in it)

While valid, it comes with the cost of implementing a way to clean up templates. The reason I mention this is because there are many use cases for "experimentation on Argo Workflows".

You mean like testing out Workflows in a dev environment? Since for Prod, WorkflowTemplates + GitOps is best practice anyway

@script(image="python:3.7-slim")

Also, speaking of old versions, Python 3.7 is EoL 😅

flaviuvadan commented 5 months ago

You mean like testing out Workflows in a dev environment?

No, I am referring to experimentation. Think prod environment that supports arbitrary async job execution. I know multiple orgs who use Argo Workflows as their "experimentation platform" so their workflows cannot be turned into templates because they change often as part of the experiment (parameters change, scripts change, etc). This is all from Hera's lens btw, so it might feel a bit strange given AW only supports YAML and "static" templating. However, for "workflows of workflows" users can turn their workflows into templates (like above) prior to execution, but then have to delete them probably because those templates will not be used again

Also, speaking of old versions, Python 3.7 is EoL

Hera defaults to python:3.8 😅 the above was my mistake for including an EoL image, sorry

agilgur5 commented 5 months ago

No, I am referring to experimentation. Think prod environment that supports arbitrary async job execution. I know multiple orgs who use Argo Workflows as their "experimentation platform"

Huh okay. My last job was on an ML Platform, but the experimentation side was nearly all Notebooks (JupyterHub or Kubeflow). There were some advanced use-cases on Kubeflow Pipelines, but they were in the minority and were more typically productionized training jobs (i.e. no longer experimental). Also, all of that environment had periodic pruning/clean up jobs already, even before Argo/KFP was introduced. I was the top eng for Serving on the other side, where everything was productionized, which did sometimes require Data Scientists to refactor their code in different ways.

This is all from Hera's lens btw, so it might feel a bit strange given AW only supports YAML and "static" templating.

Yea I've worked enough with KFP which has even more out-of-Argo code sharing features, so I'm familiar with having a bit of a impedance mismatch (esp with all the spec renames, metadata server, etc etc in KFP) 😅 I like that Hera is a lot closer to native Argo, but yea any form of meta-templating can create some different usage patterns, which is a tad confusing. Helm usage is still pretty "static" though, since you still see your raw YAML at the end and still submit through kubectl or argo CLI.

I was thinking you could further use Python's with for auto clean-up if a user wanted that. Like an outer with cleanup(): block or something.

From the Argo side, a small bit of indirection with a WorkflowTemplate is a pretty darn good workaround. I'd probably recommend raw string syntax (which I think requires a new feature in expr) as the solution for this, but within a manifest block it would still be a bit sub-optimal. For instance, in Helm I would separate the entire manifest as a separate YAML file for better readability and re-usability; you'd have to be keenly aware that it was an inner manifest to use raw strings. So with a simple workaround and sub-optimal, upstream change required to support, I'm not sure this would be prioritized soon

antonmedv commented 4 months ago

raw string syntax (which I think requires a new feature in expr)

Expr now supports raw strings: https://expr-lang.org/docs/language-definition#strings

agilgur5 commented 4 months ago

Expr now supports raw strings: https://expr-lang.org/docs/language-definition#strings

Thanks for the note! This does appear to work. I thought that would work in isolation, but apparently Argo needs to escape its own expression syntax {{ / }} as well. That too can be worked around simply as mentioned in that discussion with "{" + "{".

Concatenating two strings should be a sufficient workaround for this use-case too, and would avoid needing a WorkflowTemplate.

From the Argo side, a small bit of indirection with a WorkflowTemplate is a pretty darn good workaround.

There might be a third (or foruth) workaround here for this specific use-case too: make the entire manifest (or a chunk of it) into a parameter. Then {{inputs.parameters.msg}} is inside of another expression. Combined with the concatenation workaround above this should always work too.

flaviuvadan commented 3 months ago

Concatenating two strings should be a sufficient workaround for this use-case too, and would avoid needing a WorkflowTemplate.

Concretely, the current workaround is to use {{ and }} around the parameter specification, which can also have a {{ and }}? Something like {{ {{inputs.parameters.something}} }}?

agilgur5 commented 3 months ago

No I don't believe that will work (though you can try) due to lack of escaping.

I meant the same tactic as used in https://github.com/argoproj/argo-workflows/discussions/12805#discussioncomment-8818347 (which I linked above). In this case, that would be:

{{= "{" + "{inputs.parameters.message}" + "}" }}

Might need to do something similar with inputs.parameters.message as well (try the one above first though):

{{= "{" + "{input" + "s.parameters.message}" + "}" }}

That effectively forces no substitutions

antonmedv commented 3 months ago

What about adding a helper to wrap in brackets?

{{=  wrapInBrackets("inputs.parameters.message") }}
agilgur5 commented 3 months ago

Mmm I don't think a helper would be optimal -- Argo should probably not try to parse any of the text inside an expression; that seems pretty confusing that it does and I don't think that's intentional (I think it's just running a simple replace when it detects braces; I believe that predates expr usage, see also #9529 ).