A customer reported that, from time to time, instances of `DatabricksNotebookOperator` get stuck in a running state in Airflow even though the corresponding job has completed on Databricks.
The logs should show what the Databricks job is doing, but they are empty.
Since this affects an Astronomer customer and we have not completed the migration yet, my suggestion is that:
- We give visibility into what is happening on the Airflow worker node by logging something like "Waiting for the job to complete, current status: PENDING".
- We make the implementation of polling the job status consistent with what we have contributed to Airflow.
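The two suggestions above can be sketched as a single polling loop that logs the current status on every iteration. This is a minimal illustration, not the operator's actual code: `get_run_state` is a hypothetical callable standing in for however the operator fetches the run's life-cycle state from the Databricks API, and the set of terminal states is assumed.

```python
import logging
import time

logger = logging.getLogger(__name__)

# Assumed terminal life-cycle states for a Databricks run (illustrative).
TERMINAL_STATES = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}


def wait_for_job_completion(get_run_state, poll_interval: float = 30.0) -> str:
    """Poll until the run reaches a terminal state, logging each status.

    `get_run_state` is a hypothetical zero-argument callable returning the
    run's current life-cycle state as a string.
    """
    while True:
        state = get_run_state()
        if state in TERMINAL_STATES:
            logger.info("Job finished with state: %s", state)
            return state
        # This log line is what gives visibility on the worker node.
        logger.info("Waiting for the job to complete, current status: %s", state)
        time.sleep(poll_interval)
```

The key point is that every polling iteration emits a log line, so a task that is stuck waiting is visible in the worker logs instead of appearing silent.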
While reviewing our code, I noticed that the implementation could be improved:
https://github.com/astronomer/astro-provider-databricks/blob/3e1ca039a024a98f9079d178478aa24702e15453/src/astro_databricks/operators/notebook.py#L235C1-L238C64

The polling logic has already been improved in our contribution to Airflow (https://github.com/apache/airflow/pull/39178), specifically in:
https://github.com/astronomer/airflow/blob/20dacc7cec64d0055fad79943fd6afa453dbe775/airflow/providers/databricks/operators/databricks.py#L1038-L1063