argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Support starting a workflow from any step #10828

Open leveryd opened 1 year ago

leveryd commented 1 year ago

Summary

Support specifying an entrypoint within a workflow.

Use Cases

When would you use this?

Hey there, I'm working with some Argo DAGs and I'm looking to be able to execute just one part of the DAG (for example in a testing or prototyping environment) based on a specific set of nodes. E.g. if I had a DAG that looked like

  A
 / \
B   C
     \
      D

and I wanted to just execute B, I wouldn't want C and D to also execute. My impression from the team I'm working with right now is that the way to do this is to comment out the portions of the template that pertain to C and D, but I'm hoping that there's a better way to do this, ideally by specifying that I want Argo to execute B (or some other set of nodes) and have it figure out what needs to be executed to do so. Obviously our DAGs in reality are much more complex than this, and I don't want to have to reason about which portions should be commented out for a given run in an ad-hoc fashion.
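To illustrate the requested behaviour, here is a minimal Python sketch of "figure out what needs to be executed": given the tasks a user asks for, collect those tasks plus everything they transitively depend on. This is not an Argo API. The `DEPENDENCIES` mapping and the `required_tasks` helper are hypothetical names made up for this example, and the graph is the A/B/C/D DAG drawn above.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical encoding of the example DAG: each task maps to the
# tasks it depends on (its parents).
DEPENDENCIES: Dict[str, List[str]] = {
    "A": [],
    "B": ["A"],
    "C": ["A"],
    "D": ["C"],
}


def required_tasks(deps: Dict[str, List[str]], targets: Iterable[str]) -> Set[str]:
    """Return the targets plus every task they transitively depend on."""
    needed: Set[str] = set()
    stack = list(targets)
    while stack:
        task = stack.pop()
        if task in needed:
            continue  # already visited via another path
        if task not in deps:
            raise KeyError(f"unknown task: {task}")
        needed.add(task)
        stack.extend(deps[task])  # walk upwards to the parents
    return needed


if __name__ == "__main__":
    print(sorted(required_tasks(DEPENDENCIES, ["B"])))  # ['A', 'B']
    print(sorted(required_tasks(DEPENDENCIES, ["D"])))  # ['A', 'C', 'D']
```

Asking for B selects only A and B, so C and D would be left out without anyone commenting them out by hand.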


Message from the maintainers:

Love this enhancement proposal? Give it a 👍. We prioritise the proposals with the most 👍.

Joibel commented 1 year ago

This is already possible in various ways:

I'm sure there are more alternatives; I don't think this is an exhaustive list.

zach-jablons-hinge commented 1 year ago

(I wrote this up as a question on the CNCF slack, thanks @leveryd for posting it here)

Is it possible to do this for an arbitrarily complex DAG without needing to create bespoke parameters or templates for each possible sub-DAG someone might want to run?

The basic interface I'm hoping to build here is for users to specify (only):

And for Argo to resolve the rest. Is that possible now in some way, or would that have to be considered as an enhancement?

Joibel commented 1 year ago

A DAG implies that some arbitrary step depends on all of its parent steps to have done something. So you're asking that if I choose that today I like B, you want A and B to get run automatically, but because B doesn't have dependencies on C or D they get ignored. Tomorrow I like C and so I get just A and C run? And you'd like it if Argo has an interface for this and works it out on your behalf?
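The "today B, tomorrow C" selection described here could be done outside Argo by pruning the task list before submission. The Python sketch below assumes a DAG template's tasks have been loaded as dictionaries with a `name` and an optional list-style `dependencies` field. The `prune_tasks` helper and the `run-*` template names are hypothetical, and the string-style `depends` field is deliberately not handled.

```python
from typing import Any, Dict, Iterable, List

Task = Dict[str, Any]


def prune_tasks(tasks: List[Task], targets: Iterable[str]) -> List[Task]:
    """Keep only the target tasks and their transitive dependencies.

    The original task order is preserved in the result.
    """
    by_name = {task["name"]: task for task in tasks}
    keep = set()
    stack = list(targets)
    while stack:
        name = stack.pop()
        if name in keep:
            continue
        keep.add(name)
        # Tasks without a "dependencies" key are treated as roots.
        stack.extend(by_name[name].get("dependencies", []))
    return [task for task in tasks if task["name"] in keep]


# Hypothetical task list for the A/B/C/D example; template names are made up.
TASKS: List[Task] = [
    {"name": "A", "template": "run-a"},
    {"name": "B", "template": "run-b", "dependencies": ["A"]},
    {"name": "C", "template": "run-c", "dependencies": ["A"]},
    {"name": "D", "template": "run-d", "dependencies": ["C"]},
]

if __name__ == "__main__":
    print([t["name"] for t in prune_tasks(TASKS, ["B"])])  # ['A', 'B']
    print([t["name"] for t in prune_tasks(TASKS, ["C"])])  # ['A', 'C']
```

Because the pruning only follows dependencies upwards, choosing C keeps A and C but drops D, which matches the behaviour described in this comment.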

zach-jablons-hinge commented 1 year ago

'like' makes it sound like I'm making arbitrary decisions here, but yes, essentially - you could imagine that e.g. B is a hyperparameter tuning metric that I'm computing for the model generated by A (or the sub-DAG A represents), and today I'm making some changes to the model and want to get a quick understanding of its performance, but tomorrow I want to run the deeper set of analyses and splits on the model represented by nodes C and D.

Joibel commented 1 year ago

From the workflow's perspective your choices have no basis that it can understand, so from its perspective they're pretty arbitrary :grinning:

terrytangyuan commented 1 year ago

> 'like' makes it sound like I'm making arbitrary decisions here, but yes, essentially - you could imagine that e.g. B is a hyperparameter tuning metric that I'm computing for the model generated by A (or the sub-DAG A represents), and today I'm making some changes to the model and want to get a quick understanding of its performance, but tomorrow I want to run the deeper set of analyses and splits on the model represented by nodes C and D.

You might want to check out Katib for that use case: https://github.com/kubeflow/katib

leveryd commented 1 year ago

Katib seems to be only for the ML area.

Imagine a steps workflow, A -> B -> C -> D -> E

If I want to start from B, C, D, or E as well as from A, I need to write at least four more workflows. That is a bad experience for me, because:

In my project, I need to make sure each directory name has some meaning. For example, the "level2" template is called by the "level3" template, and the "level1" template is called by the "level2" template.
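For the A -> B -> C -> D -> E chain above, the requested "start from any step" behaviour amounts to running the suffix of the step list that begins at the chosen step. A minimal Python sketch follows; the `steps_from` helper is a hypothetical name, not an Argo feature, and it assumes that whatever the skipped steps would have produced is already available.

```python
from typing import List

# The linear steps workflow from the example above.
STEPS: List[str] = ["A", "B", "C", "D", "E"]


def steps_from(steps: List[str], start: str) -> List[str]:
    """Return the steps to run when starting from `start` (inclusive)."""
    if start not in steps:
        raise ValueError(f"unknown step: {start}")
    return steps[steps.index(start):]


if __name__ == "__main__":
    print(steps_from(STEPS, "C"))  # ['C', 'D', 'E']
```

With a selector like this, one workflow definition covers every starting point, rather than one definition per starting step.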

0xdarkman commented 11 months ago

Airflow can do this