apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

[databricks] Refactor how Databricks workflows repair / repair all is implemented #40587

Open tatiana opened 1 month ago

tatiana commented 1 month ago

Apache Airflow Provider(s)

databricks

Versions of Apache Airflow Providers

6.7.0

Apache Airflow version

2.9

Operating System

all

Explain the improvement

To expose the repair and repair-all tasks, the Databricks provider 6.7.0 relies on Airflow 2.x plugins, which are soon to be deprecated. This approach was introduced in #40153, which completes the migration of one of the most used features of the original https://github.com/astronomer/astro-provider-databricks. I've interacted with at least five Astronomer customers who use this feature, and the project has received over 115k monthly downloads on PyPI.

Using plugins is suboptimal, but as of Airflow 2.9, core Airflow doesn't offer a better way to implement this feature, as discussed in this thread: https://github.com/apache/airflow/pull/40153#discussion_r1663852033

As part of Airflow 3.x, we want to find a better way for providers to implement this type of feature. @potiuk is going to help us log this in the Airflow 3 roadmap (he already added it to https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+3+Workstreams#Airflow3Workstreams-Othercandidates.1), and Astronomer commits to migrating the repair and job links to the Airflow 3.x strategy. I also aligned with @cmarteepants on this topic, and she agrees that we'll reimplement this once there is an alternative approach.

potiuk commented 1 month ago

This is logged now in https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+3+Workstreams#Airflow3Workstreams-Othercandidates.1