astronomer / astro-provider-databricks

Orchestrate your Databricks notebooks in Airflow and execute them as Databricks Workflows
Apache License 2.0

Prevent creation of duplicate jobs in Databricks #76

Closed Hang1225 closed 2 months ago

Hang1225 commented 3 months ago

Previously, an Airflow DAG run could create a duplicate job in the Databricks workspace even when the job already existed. The root cause is that workflow.py/_get_job_by_name checked whether a job exists by calling the list_jobs() method against the Databricks REST API. That API is paginated, so a single call returns only a limited set of jobs. Consequently, if the job name was not in the first page of results returned by list_jobs(), the lookup found nothing and a duplicate job could be created.

To address this issue, this PR uses the built-in job name filter of the Databricks REST API in the list_jobs() call. The API then returns only jobs with the given name, so an existing job is found no matter which page it would otherwise fall on, which prevents duplicate jobs from being created in the Databricks workspace.
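For illustration, here is a minimal sketch of the difference between the two lookups. It does not use the real Databricks client: `FakeJobsApi`, its `page_size` parameter, and both helper functions are hypothetical stand-ins, and the response shape (`jobs`, `has_more`, `settings.name`) is only loosely modeled on the Jobs API. The actual change in workflow.py may look different.

```python
from typing import Optional


class FakeJobsApi:
    """Hypothetical in-memory stand-in for a paginated jobs-listing client."""

    def __init__(self, job_names, page_size=2):
        self._jobs = [
            {"job_id": i + 1, "settings": {"name": name}}
            for i, name in enumerate(job_names)
        ]
        self._page_size = page_size

    def list_jobs(self, name: Optional[str] = None, offset: int = 0) -> dict:
        # Apply the optional server-side name filter first, then paginate.
        matches = [
            job for job in self._jobs
            if name is None or job["settings"]["name"] == name
        ]
        page = matches[offset:offset + self._page_size]
        has_more = offset + self._page_size < len(matches)
        return {"jobs": page, "has_more": has_more}


def get_job_unfiltered(api, job_name):
    """Old behaviour (assumed): scan only the first page of all jobs."""
    for job in api.list_jobs().get("jobs", []):
        if job["settings"]["name"] == job_name:
            return job
    return None


def get_job_filtered(api, job_name):
    """New behaviour (assumed): let the server filter by job name."""
    for job in api.list_jobs(name=job_name).get("jobs", []):
        if job["settings"]["name"] == job_name:
            return job
    return None


# Three jobs, but only two fit on a page, so "etl_c" is on page two.
api = FakeJobsApi(["etl_a", "etl_b", "etl_c"], page_size=2)

print(get_job_unfiltered(api, "etl_c"))  # None, so a duplicate would be created
print(get_job_filtered(api, "etl_c"))    # {'job_id': 3, 'settings': {'name': 'etl_c'}}
```

In this sketch the unfiltered lookup misses `etl_c` because it never leaves the first page, while the filtered lookup finds it in a single call. Paging through every result would also work, but the name filter avoids listing the whole workspace.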

closes: https://github.com/astronomer/astro-provider-databricks/issues/75

Hang1225 commented 2 months ago

Hi @pankajkoti, thanks for your review. I've thoroughly tested this change in our development environment, testing 5 different jobs both before and after implementing the updates. This included testing 2 existing jobs that previously couldn't return a job id via the original REST API call. I can confirm that the changes successfully resolve the issue. We're ready to proceed with the merge. Thanks again for your support.

pankajkoti commented 2 months ago

Thanks @Hang1225 for the contribution. The PR has been merged and will be included in the next release.

I edited the PR title and description a bit to explain the change in more detail 🙌🏽

Hang1225 commented 2 months ago

Thank you again @pankajkoti! Would you be able to share the timeline for the next release? I'd like to update our teams on when to expect the fix.

pankajkoti commented 2 months ago

Hi @Hang1225, thanks. I will work on the release soon. The ETA is before EOD tomorrow, 17 April 2024 (IST).

pankajkoti commented 2 months ago

Hi @Hang1225, we just released https://pypi.org/project/astro-provider-databricks/0.2.2/ which includes this PR. Please try it out and let us know how it works. Thanks again for contributing this fix!