astronomer / astro-provider-databricks

Orchestrate your Databricks notebooks in Airflow and execute them as Databricks Workflows
Apache License 2.0

Duplicate Job Creation in Databricks During Airflow DAG Runs #75

Closed Hang1225 closed 2 months ago

Hang1225 commented 3 months ago

Issue

Our teams at HealthPartners are hitting a recurring issue where every run of an Airflow DAG creates a new Databricks job, even though a job with the same name already exists in the workspace.

This is most likely caused by the Databricks Jobs REST API returning at most 20 jobs per request by default. When the workspace contains more than 20 jobs, additional requests using the `next_page_token` from the previous response are required to fetch the complete job list; without them, any job beyond the first page is never found, so the operator creates a duplicate.
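To make the failure mode concrete, here is a minimal simulation. Everything here is illustrative, not the provider's actual code: `fake_jobs_list` stands in for `GET /api/2.1/jobs/list` (20 jobs per page plus a `next_page_token`), and `first_page_lookup` mimics a lookup that only inspects the first response.

```python
# Simulated workspace with 25 jobs, served 20 per page as the real API does.
JOBS = [{"job_id": i, "settings": {"name": f"job_{i}"}} for i in range(25)]

def fake_jobs_list(payload):
    """Stand-in for GET /api/2.1/jobs/list: 20 jobs per page + next_page_token."""
    start = int(payload.get("page_token", 0))
    end = min(start + 20, len(JOBS))
    resp = {"jobs": JOBS[start:end]}
    if end < len(JOBS):
        resp["next_page_token"] = str(end)
    return resp

def first_page_lookup(name):
    # Only inspects the first page of results, so any job past the
    # 20-job default limit is invisible to the caller.
    for job in fake_jobs_list({}).get("jobs", []):
        if job["settings"]["name"] == name:
            return job
    return None

print(first_page_lookup("job_22"))  # None -> caller assumes the job is new
```

Because `job_22` sits on the second page, the lookup returns `None` and the DAG run proceeds to create a duplicate job.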

Proposed Solution

In the `_get_job_by_name` function in `operators/workflow.py`: