astronomer / astro-provider-databricks

Orchestrate your Databricks notebooks in Airflow and execute them as Databricks Workflows
Apache License 2.0

Fix copying dependencies from task groups to notebooks (with inner task groups) #47

Closed tatiana closed 1 year ago

tatiana commented 1 year ago

Before this change, DatabricksWorkflowTaskGroup (astro-provider-databricks==0.1.3) did not pass Python dependencies to inner Databricks tasks that were nested inside intermediate Airflow Task Groups when running on Airflow 2.2.4. On newer Airflow versions, it duplicated the dependencies in proportion to the number of nested TaskGroups.
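The intended behaviour can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in `src/astro_databricks/operators/notebook.py`: the classes `TaskGroup` and `WorkflowTaskGroup` and the helpers `find_workflow_group` and `merge_packages` are hypothetical stand-ins, and the `{"pypi": {"package": ...}}` shape is assumed to follow the Databricks library specification format. The idea is to walk up the chain of parent groups until the workflow group is found, then merge its packages into the task's own packages exactly once, regardless of nesting depth.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskGroup:
    """Hypothetical stand-in for an Airflow TaskGroup (not the real class)."""

    group_id: str
    parent: Optional["TaskGroup"] = None


@dataclass
class WorkflowTaskGroup(TaskGroup):
    """Hypothetical stand-in for DatabricksWorkflowTaskGroup."""

    notebook_packages: list[dict] = field(default_factory=list)


def find_workflow_group(group: Optional[TaskGroup]) -> Optional[WorkflowTaskGroup]:
    """Walk up the parent chain until a workflow group is found."""
    while group is not None:
        if isinstance(group, WorkflowTaskGroup):
            return group
        group = group.parent
    return None


def merge_packages(task_packages: list[dict], group: Optional[TaskGroup]) -> list[dict]:
    """Combine task-level and workflow-level packages without duplicates."""
    workflow = find_workflow_group(group)
    inherited = workflow.notebook_packages if workflow else []
    merged: list[dict] = []
    for package in [*task_packages, *inherited]:
        if package not in merged:  # dicts are unhashable, so compare by value
            merged.append(package)
    return merged


# A notebook task two task groups below the workflow group.
workflow = WorkflowTaskGroup(
    "workflow", notebook_packages=[{"pypi": {"package": "simplejson"}}]
)
inner = TaskGroup("inner", parent=workflow)
deeper = TaskGroup("deeper", parent=inner)

packages_for_nested_task = merge_packages([], deeper)
packages_without_duplicates = merge_packages(
    [{"pypi": {"package": "simplejson"}}], deeper
)
```

In this sketch, both results contain a single `simplejson` entry: the nested task inherits the workflow-level package, and declaring the same package on the task does not add a second copy.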

In the example DAG introduced:

Screenshot 2023-06-14 at 16 52 59

We confirmed the Databricks cluster does not have simplejson as a dependency:

Screenshot 2023-06-14 at 16 45 10

And we also confirmed the notebooks failed since the cluster does not have this dependency:

Screenshot 2023-06-14 at 16 48 07

With this change, all the tasks from this Databricks Workflow have the simplejson dependency in Databricks, both the ones inside the inner task group and the ones outside. Examples:

Screenshot 2023-06-14 at 16 49 41
Screenshot 2023-06-14 at 16 49 55

codecov-commenter commented 1 year ago

Codecov Report

Patch coverage: 60.86% and project coverage change: -0.15 :warning:

Comparison is base (d3a633c) 78.47% compared to head (9fdeaa4) 78.33%.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##             main      #47      +/-   ##
==========================================
- Coverage   78.47%   78.33%   -0.15%
==========================================
  Files           5        5
  Lines         460      480      +20
  Branches       57       64       +7
==========================================
+ Hits          361      376      +15
- Misses         78       83       +5
  Partials       21       21
```

| [Impacted Files](https://app.codecov.io/gh/astronomer/astro-provider-databricks/pull/47?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer) | Coverage Δ | |
|---|---|---|
| [src/astro\_databricks/operators/notebook.py](https://app.codecov.io/gh/astronomer/astro-provider-databricks/pull/47?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=astronomer#diff-c3JjL2FzdHJvX2RhdGFicmlja3Mvb3BlcmF0b3JzL25vdGVib29rLnB5) | `83.82% <60.86%> (-1.53%)` | :arrow_down: |
