Closed: @jlaneve closed this issue 1 year ago.
Exactly this, thanks @jlaneve
Hi @jlaneve @RafaelCartenet
I think this functionality already exists within the current feature set :).
Ultimately, you seem to want the abstraction of Airflow task groups combined with the "single job run" behavior of a DatabricksWorkflowTaskGroup. This should be doable by placing your Airflow task groups inside a DatabricksWorkflowTaskGroup.
So if you have

```python
dg = DatabricksWorkflowTaskGroup()
with dg:
    tg1 = TaskGroup()
    with tg1:  # note: must reference tg1, the group defined above
        ...
```

that should accomplish what you want: a single `launch` task for multiple task groups.
The downside of this is that your entire DAG ends up inside a single task group. One solution could be to create a DatabricksWorkflowDAG that treats the entire DAG as a single Databricks workflow (cc @jlaneve @tatiana for whether we want to prioritize that), but the nesting above should at least unblock the functionality, even if it is a bit clunky in the UI.
@dimberman Very true! Thanks a lot! Will try it out! Have a good one
Consider the case where you have 4 workflows you want to run in a row, each with a few notebooks. Right now, you could use this package to define each workflow separately, which would look like this:
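The original example here appears to have been elided. A minimal sketch of what "4 separate workflows" could look like, assuming the astro-provider-databricks API (`DatabricksWorkflowTaskGroup`, `DatabricksNotebookOperator`); the connection id, cluster spec, and notebook paths are illustrative, not from the original:

```python
# Sketch only: requires apache-airflow and astro-provider-databricks.
# Cluster spec and notebook paths are placeholders.
from datetime import datetime

from airflow.models import DAG
from astro_databricks import DatabricksNotebookOperator, DatabricksWorkflowTaskGroup

job_cluster_spec = [
    {"job_cluster_key": "shared_cluster", "new_cluster": {"spark_version": "12.2.x-scala2.12", "num_workers": 1}},
]

with DAG("four_workflows", schedule=None, start_date=datetime(2023, 1, 1)) as dag:
    previous = None
    for i in range(1, 5):
        # Each DatabricksWorkflowTaskGroup creates its own `launch` task,
        # so 4 clusters get spun up even though the spec is identical.
        workflow = DatabricksWorkflowTaskGroup(
            group_id=f"workflow_{i}",
            databricks_conn_id="databricks_default",
            job_clusters=job_cluster_spec,
        )
        with workflow:
            DatabricksNotebookOperator(
                task_id="notebook",
                databricks_conn_id="databricks_default",
                notebook_path=f"/Shared/notebook_{i}",
                source="WORKSPACE",
            )
        if previous:
            previous >> workflow
        previous = workflow
```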
However, the downside of this is that there are 4 separate `launch` tasks, each of which spins up a cluster, even when the clusters across workflows are identical. It'd be neat to do something more like this:

This would clutter the Databricks Workflows UI quite a bit, but a user shouldn't need to use that UI much, and there would be time and cost savings to doing this.
Couple thoughts/notes: