argoproj / argo-workflows

Workflow Engine for Kubernetes
https://argo-workflows.readthedocs.io/
Apache License 2.0

Split DAG Engine into own package (potentially own repo) #13502

Open agilgur5 opened 1 month ago

agilgur5 commented 1 month ago

Summary

Related to #12694, the raw "business logic" of DAG processing could be separated from the Controller's operator functionality (i.e. watching and maintaining Pods).

These can be split out:

  1. YAML -> DAG struct
  2. initial DAG struct -> completed/processed DAG struct
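As a rough illustration of step 1, here is a minimal sketch in Python (used only for brevity, not a suggestion about implementation language). Everything in it is hypothetical: the `Task` and `DAG` types, the `build_dag` function, and the simplified spec shape (a `tasks` list with `name` and `dependencies`) are assumptions for this example, not Argo's actual types. A plain dict stands in for already-parsed YAML.

```python
from dataclasses import dataclass
from graphlib import CycleError, TopologicalSorter


@dataclass(frozen=True)
class Task:
    # Hypothetical, minimal task: a name plus the names it depends on.
    name: str
    dependencies: tuple[str, ...] = ()


@dataclass(frozen=True)
class DAG:
    # Hypothetical DAG struct: tasks keyed by name.
    tasks: dict[str, Task]


def build_dag(spec: dict) -> DAG:
    """Turn a parsed spec (a dict standing in for YAML) into a validated DAG."""
    tasks: dict[str, Task] = {}
    for raw in spec["tasks"]:
        task = Task(raw["name"], tuple(raw.get("dependencies", [])))
        if task.name in tasks:
            raise ValueError(f"duplicate task name: {task.name}")
        tasks[task.name] = task

    # Every dependency must refer to a task defined in the same spec.
    for task in tasks.values():
        for dep in task.dependencies:
            if dep not in tasks:
                raise ValueError(f"task {task.name} depends on unknown task {dep}")

    # Reject cycles; static_order() raises CycleError if one exists.
    graph = {t.name: set(t.dependencies) for t in tasks.values()}
    try:
        tuple(TopologicalSorter(graph).static_order())
    except CycleError as err:
        raise ValueError("spec contains a dependency cycle") from err

    return DAG(tasks)
```

Nothing here touches Kubernetes, so validation failures like unknown dependencies or cycles can be exercised with plain inputs.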

Use Cases

This would simplify and isolate the logic, giving a cleaner separation of concerns. In particular, DAG states could be tested with plain unit tests (input -> output) instead of E2Es, allowing for a more robust and resilient core. The Controller/Operator could then focus on k8s-specific functionality. For instance, #13501's test could be mocked very simply in a DAG Engine, and even more complex states with parameters and artifacts could be mocked as well, entirely struct-based. E2Es would then only be necessary when a container must actually run to verify the logic (e.g. creating/deleting artifacts, executor logic, operator logic).
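To make the "input -> output" idea concrete, here is a hedged sketch of what a struct-only DAG step might look like, again in Python purely for illustration. The `Phase` enum, `runnable_tasks`, and `dag_phase` are hypothetical names, and the semantics are deliberately simplified (a task is runnable once all of its dependencies have succeeded, and the DAG fails as soon as any task fails). Argo's real rules, such as `depends` expressions and `failFast`, are richer than this.

```python
from enum import Enum


class Phase(Enum):
    # Hypothetical, simplified node phases.
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


def runnable_tasks(dependencies: dict[str, list[str]],
                   phases: dict[str, Phase]) -> list[str]:
    """Return not-yet-started tasks whose dependencies have all succeeded."""
    ready = []
    for name, deps in dependencies.items():
        if phases.get(name, Phase.PENDING) is not Phase.PENDING:
            continue  # already running or finished
        if all(phases.get(dep) is Phase.SUCCEEDED for dep in deps):
            ready.append(name)
    return sorted(ready)


def dag_phase(dependencies: dict[str, list[str]],
              phases: dict[str, Phase]) -> Phase:
    """Derive the overall DAG phase from task phases (simplified assumption)."""
    states = [phases.get(name, Phase.PENDING) for name in dependencies]
    if any(state is Phase.FAILED for state in states):
        return Phase.FAILED
    if all(state is Phase.SUCCEEDED for state in states):
        return Phase.SUCCEEDED
    if all(state is Phase.PENDING for state in states):
        return Phase.PENDING
    return Phase.RUNNING


# A diamond: A -> (B, C) -> D. No Pods or containers are involved.
diamond = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
print(runnable_tasks(diamond, {}))                      # ['A']
print(runnable_tasks(diamond, {"A": Phase.SUCCEEDED}))  # ['B', 'C']
```

Because both functions are pure, any intermediate state can be constructed directly as a dict and asserted on, without running a cluster.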

The robust core makes me think of Temporal / Cadence, which simplified the core to state machine processing and then built the rest around it. From a different perspective, one can think of state machine processing as the core back-end, with all its APIs as the front-end. In Argo's case, the Workflow spec is the front-end and the core DAG processing engine would be the back-end. This proposal is to split those two pieces more cleanly, for better maintainability and potentially to open up new use cases.

A separate package and repo could also be useful if someone wanted to run Argo, or just the DAG engine, outside of k8s. For instance, for local dev with plain containers (not Pods), like Dagger, or for local dev with plain commands, like Kit.

This could also potentially simplify transformations from and to other Workflow specifications, e.g. CWL #873, WDL, iWF (this blog post also notes a small core translation layer), SWF, Tekton, etc. Minimally, it could make those transformations easier to test.

Vaguely similar to CD's gitops-engine, but in this case, isolating out the non-k8s-specific part entirely. In a similar sense, this could be something that Tekton, Argo, and other "competing" workflow engines could collaborate on too.


Message from the maintainers:

Love this feature request? Give it a 👍. We prioritise the proposals with the most 👍.

terrytangyuan commented 1 month ago

> Vaguely similar to CD's gitops-engine, but in this case, isolating out the non-k8s-specific part entirely. In a similar sense, this could be something that Tekton, Argo, and other "competing" workflow engines could collaborate on too.

I am not sure if other workflow engines would contribute to this. GitOps Engine is probably only used by Argo CD (I remember Flux and GitLab stepped away from it at some point, but correct me if I am wrong).

agilgur5 commented 1 month ago

I know the Argo CD folks have been looking to re-merge gitops-engine, but I'm not sure about contributions.

In our case, I think it is worthwhile even without that, as the primary benefits are for maintenance purposes. That's also why I suggested a separate package still within this repo, so as not to have a dep to constantly upgrade. But if it's stable enough, it might make sense to split it out into its own repo too.