argoproj / argo-workflows


RFC/Proposal: DAG Transforms for extensibility, composability, and maintainability #12694

Open agilgur5 opened 8 months ago

agilgur5 commented 8 months ago

Summary

Allow pluggable DAG transforms to implement various features. This is closely related to #6943, but could substantially reduce maintenance of the core by shrinking its feature set: some existing features would be converted to transforms. It would also make Workflows much more composable, since those features could be decoupled and versioned independently, with release cycles that do not depend on the core.

Use Cases

For example:

  1. steps could be rewritten as a DAG transform -- steps would just be a different syntax that converts into DAGs
  2. hooks could be rewritten as a DAG transform -- hooks syntax could also convert into regular DAGs
  3. Loops could be a DAG transform
  4. Executor Plugins could be rewritten as DAG transforms -- container tasks with custom images are very similar
  5. Artifact Plugins (#5862), and potentially artifacts in general, could be rewritten as DAG transforms -- the init and wait containers could be replaced with custom images similar to Executor Plugins above
  6. http, data, resource, inline etc templates could all be DAG transforms
  7. Probably more
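
As a rough illustration of the first use case, here is a minimal sketch of a steps-to-DAG conversion. It assumes `steps` is a list of sequential groups, each group holding steps that run in parallel, and that a DAG task lists the tasks it waits on under `dependencies`. The function name `steps_to_dag` and the simplified dict shapes are hypothetical, not the controller's actual types.

```python
from typing import Any, Dict, List


def steps_to_dag(steps: List[List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Convert sequential groups of parallel steps into DAG tasks (hypothetical helper)."""
    tasks: List[Dict[str, Any]] = []
    previous_group: List[str] = []
    for group in steps:
        if not group:
            continue  # an empty group adds no ordering constraint
        for step in group:
            task = dict(step)  # shallow copy so the input is not mutated
            if previous_group:
                # Every step waits for all steps of the preceding group.
                task["dependencies"] = list(previous_group)
            tasks.append(task)
        previous_group = [step["name"] for step in group]
    return tasks


example_steps = [
    [{"name": "build", "template": "build"}],
    [{"name": "test-a", "template": "test"}, {"name": "test-b", "template": "test"}],
    [{"name": "publish", "template": "publish"}],
]
dag_tasks = steps_to_dag(example_steps)
```

Here `publish` ends up depending on both `test-a` and `test-b`, which in turn depend on `build`.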

Implementation Details

Pure compute transforms - can be implemented with secure WASM sandbox

As many DAG transforms could be pure computation with no requirement of network access or I/O, WASM could be a great way to implement these types of "plugins" (as mentioned in https://github.com/argoproj/argo-workflows/issues/6943#issuecomment-1901951588) as WASM is a secure sandbox and supports many languages. By no network access or I/O, I mean that a DAG transform "plugin" could just take the existing DAG as input and output a new DAG. That new DAG could use custom images etc (that the user would have to trust), but the transform itself could be pure compute.
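
To make "pure compute" concrete, here is a small sketch of a transform that takes a serialized DAG plus arguments and returns a new serialized DAG, with no network or file access. The JSON-in/JSON-out signature, the name `append_final_task`, and the simplified DAG shape are assumptions for illustration; the proposal does not fix a calling convention.

```python
import json


def append_final_task(dag_json: str, args_json: str) -> str:
    """Hypothetical pure transform: add one task that runs after all current leaf tasks."""
    dag = json.loads(dag_json)
    args = json.loads(args_json)
    tasks = dag.get("tasks", [])
    # A leaf is a task that no other task depends on.
    depended_on = {dep for task in tasks for dep in task.get("dependencies", [])}
    leaves = [task["name"] for task in tasks if task["name"] not in depended_on]
    final = {"name": args["name"], "template": args["template"]}
    if leaves:
        final["dependencies"] = leaves
    return json.dumps({**dag, "tasks": tasks + [final]})


dag_in = json.dumps({
    "tasks": [
        {"name": "a", "template": "t"},
        {"name": "b", "template": "t", "dependencies": ["a"]},
        {"name": "c", "template": "t", "dependencies": ["a"]},
    ]
})
dag_out = json.loads(
    append_final_task(dag_in, json.dumps({"name": "notify", "template": "notify"}))
)
```

Because the function only maps input bytes to output bytes, the same logic could in principle be compiled to WASM and run in a sandbox.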

Ultimately, WASM is more of an implementation detail, but as a secure sandbox, it allows us to run such code on the Controller directly, similar to how Envoy and many other tools these days allow WASM extensions. A custom image could do something similar, but less securely, with a lifecycle requirement (like current executor plugins) and some deployment complexity.

Registering transforms

Transform WASM binaries could be mounted to the Controller (some CD plugins work similarly as files on the Server or Controller). The Controller would look in a certain directory and (attempt to) register all those binaries as transforms.
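
The directory scan could look roughly like the sketch below. It only checks the four-byte WASM magic header (`\0asm`) and maps the file stem to a transform name; actually loading the module would need a WASM runtime, which is left out. The function name `discover_transforms` and the naming rule are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Dict

WASM_MAGIC = b"\x00asm"  # first four bytes of every WASM binary


def discover_transforms(directory: Path) -> Dict[str, Path]:
    """Map transform name (file stem) to path for each plausible WASM binary."""
    registry: Dict[str, Path] = {}
    for path in sorted(directory.glob("*.wasm")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            if handle.read(4) != WASM_MAGIC:
                continue  # registration is only attempted; invalid files are skipped
        registry[path.stem] = path
    return registry


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "steps.wasm").write_bytes(WASM_MAGIC + b"\x01\x00\x00\x00")
    (root / "broken.wasm").write_bytes(b"not wasm")
    (root / "README.txt").write_text("ignored")
    found_names = sorted(discover_transforms(root))
```

Only `steps` is registered here: `broken.wasm` fails the header check and `README.txt` does not match the glob.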

Alternatively, Istio's EnvoyFilter extension mechanism could be used as inspiration for how to register WASM binaries. It's somewhat similar to the implementation of the existing WASM Executor plugin.

When registered, a Workflow can use transforms of the respective names in its spec with arbitrary data, which will be passed as parameters/arguments to the registered transform. Transforms could also be composed on top of each other by ordering them appropriately (similar to, say, Rollup, Webpack, or Babel plugins in the JS ecosystem).
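
A minimal sketch of name-based registration and ordered composition follows. The registry, the `register` decorator, the two toy transforms, and the `{"name": ..., "args": ...}` pipeline entries are all hypothetical; they only illustrate that each transform receives the previous transform's output, so order matters.

```python
from typing import Any, Callable, Dict, List

Dag = Dict[str, Any]
Transform = Callable[[Dag, Dict[str, Any]], Dag]

REGISTRY: Dict[str, Transform] = {}


def register(name: str) -> Callable[[Transform], Transform]:
    """Register a transform under the name a Workflow spec would refer to."""
    def decorator(func: Transform) -> Transform:
        REGISTRY[name] = func
        return func
    return decorator


@register("add-task")
def add_task(dag: Dag, args: Dict[str, Any]) -> Dag:
    return {**dag, "tasks": dag["tasks"] + [{"name": args["name"]}]}


@register("prefix-names")
def prefix_names(dag: Dag, args: Dict[str, Any]) -> Dag:
    prefix = args["prefix"]
    return {**dag, "tasks": [{**t, "name": prefix + t["name"]} for t in dag["tasks"]]}


def apply_transforms(dag: Dag, pipeline: List[Dict[str, Any]]) -> Dag:
    """Apply the named transforms in the order listed."""
    for entry in pipeline:
        name = entry["name"]
        if name not in REGISTRY:
            raise KeyError(f"transform {name!r} is not registered")
        dag = REGISTRY[name](dag, entry.get("args", {}))
    return dag


base = {"tasks": [{"name": "build"}]}
add_lint = {"name": "add-task", "args": {"name": "lint"}}
prefix_ci = {"name": "prefix-names", "args": {"prefix": "ci-"}}
add_then_prefix = apply_transforms(base, [add_lint, prefix_ci])
prefix_then_add = apply_transforms(base, [prefix_ci, add_lint])
```

Running `add-task` before `prefix-names` prefixes both tasks, while the reverse order leaves the added `lint` task unprefixed.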

Backward Compatibility

This is potentially a big enough change to warrant a v4. To reduce tech debt and the maintenance burden, I think that should be done eventually.

In the interim, transforms could be optional and we could start changing the docs to use them as a "soft deprecation" of built-in features. Some transforms, like steps for instance, could be baked in by default, but could be swapped and have an independent version. Then features replaced by transforms could be "hard deprecated" with warning messages when used and eventually removed entirely.

This form of "soft deprecation" followed by "hard deprecation" followed by eventual removal is somewhat common in tools with high stability (e.g. React, which is incredibly backward-compatible) and is also an approach I'm looking to implement to remove govaluate per https://github.com/argoproj/argo-workflows/issues/7576#issuecomment-1728725234



Joibel commented 8 months ago

@isubasinghe and I had discussed doing some of this in order to tidy up the code and remove some of the spaghetti, so I can totally support this.

Everything is a subset of a DAG, so the idea we were discussing was that nothing beyond the initial 'transform to dag' would need to know about any other forms of node representation, which should help with separation of concerns.

What I'm not getting from your proposal is whether you're intending to do a full transform on every reconcile. As each transform is a black box it feels there are two approaches.

agilgur5 commented 8 months ago

Everything is a subset of a DAG, so the idea we were discussing was that nothing beyond the initial 'transform to dag' would need to know about any other forms of node representation

Yea that's exactly what I was thinking 🙂 Glad to hear we're all on the same page!

The plugins part is me taking that up a notch in order to allow more user land customization, allowing for lots of workarounds and enabling lots of features, reducing the spec to make it more maintainable, etc.

  • Transform once and store.

I would definitely prefer to do this as it makes things more static and well-defined (which translates to less user confusion, better validation, more predictability, easier maintenance, etc). Alex, Crenshaw, and others did want to make things more predictable in the past, especially with templating (e.g. #9529), so that might align with their vision as well.
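
A tiny sketch of the "transform once and store" idea: the transform runs on the first reconcile only, and later reconciles reuse the stored result. The `status.storedDag` field and the `reconcile` function are hypothetical names used only for this illustration.

```python
from typing import Any, Callable, Dict


def reconcile(workflow: Dict[str, Any],
              transform: Callable[[Dict[str, Any]], Dict[str, Any]]) -> Dict[str, Any]:
    """Return the stored DAG, computing it only if it has not been stored yet."""
    status = workflow.setdefault("status", {})
    if "storedDag" not in status:  # hypothetical field name
        status["storedDag"] = transform(workflow["spec"])
    return status["storedDag"]


calls = {"count": 0}


def counting_transform(spec: Dict[str, Any]) -> Dict[str, Any]:
    calls["count"] += 1
    return {"tasks": list(spec.get("tasks", []))}


workflow = {"spec": {"tasks": [{"name": "build"}]}}
for _ in range(3):
    dag = reconcile(workflow, counting_transform)
```

After three reconciles the transform has run once, which is what makes the resulting DAG static and easy to validate up front.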

The dag language needs to be richer in order to support loops for example

I think what you're getting at with loops is the possible run-time dynamism with parameters (and templating)? Since with a static loop this shouldn't be necessary. But a parameterized, dynamic loop cannot be generated beforehand as you don't necessarily know the number of iterations.

There might be a way to work around that; I need to think about it. I had been looking at other DAG engines as well, but this is usually an implementation detail where everything is part of the spec anyway.
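
The static versus dynamic distinction can be sketched as follows. A task with a literal `withItems` list can be expanded ahead of time into one task per item, while a task whose items come from a run-time value (modeled here by `withParam`) cannot. The flat `arguments` dict, the `name-index` naming, and the function `expand_static_loop` are simplifications for illustration.

```python
from typing import Any, Dict, List


def expand_static_loop(task: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Expand a literal item list into plain tasks; refuse run-time item lists."""
    if "withParam" in task:
        # The number of iterations is only known once upstream output exists.
        raise ValueError(f"task {task['name']!r} has run-time items; cannot expand ahead of time")
    items = task.get("withItems")
    if items is None:
        return [dict(task)]
    expanded = []
    for index, item in enumerate(items):
        clone = {key: value for key, value in task.items() if key != "withItems"}
        clone["name"] = f"{task['name']}-{index}"
        clone["arguments"] = {
            key: str(value).replace("{{item}}", str(item))
            for key, value in task.get("arguments", {}).items()
        }
        expanded.append(clone)
    return expanded


static_task = {
    "name": "echo",
    "template": "echo",
    "withItems": ["a", "b"],
    "arguments": {"message": "hello {{item}}"},
}
expanded_tasks = expand_static_loop(static_task)

dynamic_task = {"name": "echo", "template": "echo", "withParam": "{{tasks.gen.outputs.result}}"}
try:
    expand_static_loop(dynamic_task)
    dynamic_error = None
except ValueError as exc:
    dynamic_error = str(exc)
```

The static task becomes `echo-0` and `echo-1`, while the dynamic one is rejected, which is the case a richer DAG language or a run-time mechanism would have to cover.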

  • [...] but then transform happens on every reconcile.

The impure DAG certainly sounds less complicated than this option. It does have more maintenance overhead with regard to the spec and potential flexibility for users, but perhaps substantially less maintenance in terms of the logic itself and tracking down bugs therein, compared to transforming on every reconcile.

With custom images, you can handle some of the run-time dynamism, but not quite all of the spec complexity without making the image itself handle the spec (which would not be a desired limitation).

This is a good topic to think about. I'd probably default to the first option, as it is less risky and less complex for users, unless there is a very compelling reason / trade-off / limitation not to.