apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Databricks Provider - Task within a Workflow to handle different "run if dependencies" configuration (currently only supports default ALL_SUCCEEDED) #42822

Open RafaelCartenet opened 1 week ago

RafaelCartenet commented 1 week ago

Description

Concerns airflow.providers.databricks.operators.databricks

When creating a task inside a Workflow in Databricks, you can choose a "Run if dependencies" option; see the screenshot below.

https://docs.databricks.com/en/jobs/run-if.html

[Screenshot: the "Run if dependencies" selector in the Databricks task UI]

The workflow JSON contains this information at the task level, for example:

{
  "task_key": "C",
  "depends_on": [
    {
      "task_key": "A"
    },
    {
      "task_key": "B"
    }
  ],
  "run_if": "ALL_SUCCEEDED",
  ...
}

This is not supported by the Databricks provider for now: in the API call that creates the workflow, the field is ignored and the default value "ALL_SUCCEEDED" is used. It would be awesome to be able to feed that information at the task level so that we can handle more dependency types.

I think the best approach would be to leverage the generic Airflow operator trigger_rule, but I'm not too sure how to implement that, or whether it's doable.
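For reference, a rough sketch of the mapping that could back such a trigger_rule integration. Only "ALL_SUCCEEDED" appears in this issue's JSON; the other run_if strings are assumed spellings that would need to be checked against the Databricks run-if docs linked above.

# Hedged sketch: a possible Airflow TriggerRule -> Databricks run_if mapping.
# TriggerRule is Airflow's real enum; the run_if strings other than
# ALL_SUCCEEDED (taken from the workflow JSON above) are assumptions.
from airflow.utils.trigger_rule import TriggerRule

TRIGGER_RULE_TO_RUN_IF = {
    TriggerRule.ALL_SUCCESS: "ALL_SUCCEEDED",         # from the JSON above
    TriggerRule.ALL_DONE: "ALL_DONE",                 # assumed
    TriggerRule.ONE_SUCCESS: "AT_LEAST_ONE_SUCCESS",  # assumed
    TriggerRule.ONE_FAILED: "AT_LEAST_ONE_FAILED",    # assumed
    TriggerRule.ALL_FAILED: "ALL_FAILED",             # assumed
    TriggerRule.NONE_FAILED: "NONE_FAILED",           # assumed
}

def run_if_for(trigger_rule: TriggerRule) -> str:
    # Fall back to the current provider behaviour (the API default) when the
    # trigger rule has no Databricks equivalent.
    return TRIGGER_RULE_TO_RUN_IF.get(trigger_rule, "ALL_SUCCEEDED")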

I think the easiest approach would be to add a parameter to the DatabricksNotebookOperator that overrides the run_if field in the job JSON object.
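To illustrate, a minimal sketch of what that could look like in a DAG. The run_if argument on DatabricksNotebookOperator is hypothetical (it is exactly what this issue proposes); the surrounding DatabricksWorkflowTaskGroup usage follows the provider's existing API.

# Sketch of the proposed usage; run_if is a hypothetical new parameter.
from datetime import datetime

from airflow import DAG
from airflow.providers.databricks.operators.databricks import DatabricksNotebookOperator
from airflow.providers.databricks.operators.databricks_workflow import DatabricksWorkflowTaskGroup

with DAG(dag_id="databricks_run_if_demo", start_date=datetime(2024, 1, 1), schedule=None):
    with DatabricksWorkflowTaskGroup(group_id="workflow", databricks_conn_id="databricks_default"):
        a = DatabricksNotebookOperator(
            task_id="A", notebook_path="/Shared/a", source="WORKSPACE",
            databricks_conn_id="databricks_default",
        )
        b = DatabricksNotebookOperator(
            task_id="B", notebook_path="/Shared/b", source="WORKSPACE",
            databricks_conn_id="databricks_default",
        )
        c = DatabricksNotebookOperator(
            task_id="C", notebook_path="/Shared/c", source="WORKSPACE",
            databricks_conn_id="databricks_default",
            # Hypothetical: would be written into the task's "run_if" field
            # in the job JSON instead of the API default being used.
            run_if="AT_LEAST_ONE_SUCCESS",
        )
        [a, b] >> c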

I'm happy to help with a PR.

Use case/motivation

I have a complex job in Databricks that I am trying to migrate to code, and I'm blocked because I can't reproduce the dependency configuration mentioned above.

Related issues

No response

Are you willing to submit a PR?

Yes I am willing to submit a PR!

Code of Conduct

I agree to follow this project's Code of Conduct

boring-cyborg[bot] commented 1 week ago

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; no need to wait for approval.