astronomer / astro-provider-databricks

Orchestrate your Databricks notebooks in Airflow and execute them as Databricks Workflows
Apache License 2.0

Improve `DatabricksNotebookOperator` monitoring job behaviour #80

Open tatiana opened 1 month ago

tatiana commented 1 month ago

A customer reported that, from time to time, instances of `DatabricksNotebookOperator` remain stuck in a running state in Airflow even though the corresponding run has already completed on Databricks.

The logs should explain what the Databricks job is trying to do, but they are currently empty.

While checking our code, I noticed that the monitoring implementation could be improved: https://github.com/astronomer/astro-provider-databricks/blob/3e1ca039a024a98f9079d178478aa24702e15453/src/astro_databricks/operators/notebook.py#L235C1-L238C64

The implementation seems to have been improved in our contribution to Airflow https://github.com/apache/airflow/pull/39178

In: https://github.com/astronomer/airflow/blob/20dacc7cec64d0055fad79943fd6afa453dbe775/airflow/providers/databricks/operators/databricks.py#L1038-L1063
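To illustrate the kind of behaviour being asked for, here is a minimal sketch of a monitoring loop that logs the run state on every poll, so the task logs are never empty. It is not the code from either link above: `wait_for_run`, `fetch_state`, the `sleep` parameter, and the timeout are hypothetical names and additions for illustration, and the set of terminal life-cycle states is an assumption about the Databricks Jobs API.

```python
import logging
import time
from typing import Callable

log = logging.getLogger(__name__)

# Assumed set of Databricks life-cycle states after which a run no longer changes.
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_run(
    fetch_state: Callable[[], str],
    polling_period_seconds: float = 5.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the run reaches a terminal state, logging every observation.

    `fetch_state` is a hypothetical callable that returns the current
    life-cycle state of the run (for example, by asking the Databricks API).
    """
    deadline = time.monotonic() + timeout_seconds
    state = fetch_state()
    log.info("Current state of the run: %s", state)
    while state not in TERMINAL_STATES:
        # Fail loudly instead of leaving the task running forever.
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Run still in state {state} after {timeout_seconds}s"
            )
        sleep(polling_period_seconds)
        state = fetch_state()
        log.info("Current state of the run: %s", state)
    return state
```

Re-fetching the state after every sleep and logging it each time means a task that appears stuck still shows the last state it observed. The timeout is an extra safeguard assumed here, not something either linked implementation is known to have.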

Since this affects an Astronomer customer and we have not completed the migration yet, my suggestion is that: