astronomer / astro-provider-databricks

Orchestrate your Databricks notebooks in Airflow and execute them as Databricks Workflows
Apache License 2.0

Add support for TaskGroups from within the Databricks Workflow #27

Closed jlaneve closed 1 year ago

jlaneve commented 1 year ago

Consider the case where you have 4 workflows you want to run in a row, each with a few notebooks. Right now, you could use this package to define each workflow separately, which would look like this:

[Image: Screen Shot 2023-03-30 at 9 49 59 PM]

However, the downside of this is there are 4 separate launch tasks that launch clusters, even if the clusters across workflows are the same. It'd be neat to do something more like this:

[Image: Screen Shot 2023-03-30 at 9 54 18 PM]

This would clutter the Databricks Workflow UI quite a bit, but a user shouldn't need to use that UI much. There would be time and cost savings from doing this.

Couple thoughts/notes:

RafaelCartenet commented 1 year ago

Exactly this, thanks @jlaneve

dimberman commented 1 year ago

Hi @jlaneve @RafaelCartenet

I think this functionality already exists within the current feature set :).

Ultimately, you seem to want both the abstraction of Airflow task groups and the "single job run" of a DatabricksWorkflowTaskGroup. This should be doable by placing your Airflow task groups inside a DatabricksWorkflowTaskGroup.

So if you have

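from airflow.utils.task_group import TaskGroup
from astro_databricks import DatabricksWorkflowTaskGroup

# Outer group: everything inside runs as one Databricks job (a single launch)
dg = DatabricksWorkflowTaskGroup()  # constructor arguments omitted for brevity
with dg:
    tg1 = TaskGroup()  # regular Airflow task group nested inside the workflow
    with tg1:
        ...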

That should accomplish what you want, where there is a single launch for multiple task groups.

The downside of this is that your entire DAG will be inside a single task group. One solution could be to create a DatabricksWorkflowDAG that assumes the entire DAG is a single Databricks workflow (cc: @jlaneve @tatiana for whether we want to prioritize that or not), but this should at least unblock the functionality (even if it is a bit clunky in the UI).

RafaelCartenet commented 1 year ago

@dimberman Very true! Thanks a lot! Will try it out! Have a good one