edanalytics / edu_edfi_airflow

Manages extract-load of Ed-Fi data in Airflow

Feature/earthbeam dynamic task mapping refactor taskflow #54

Closed jayckaiser closed 2 months ago

jayckaiser commented 2 months ago

Feature: Dynamic Task Mapping for EarthbeamDAG (using the Airflow TaskFlow API)

Description & motivation

This PR refactors EarthbeamDAG to add a new build_dynamic_tenant_year_taskgroup() method that lists files in raw_dir (or from the Python preprocess callable) and dynamically maps Earthbeam task groups across them. The same TaskFlow API approach is used for both dynamic and default task groups. Additionally, this method standardizes how input file arguments are passed into Earthmover.
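
For readers unfamiliar with the pattern, here is a minimal sketch of mapping a task group across a list of files with the TaskFlow API. It is not the code in this PR: `list_input_files`, `build_example_dag`, the DAG id, and the `transform`/`load` placeholder tasks are all hypothetical names chosen for illustration, and the sketch assumes Airflow 2.5 or later (where `@task_group` supports `.expand()`).

```python
import os
from datetime import datetime


def list_input_files(raw_dir: str) -> list[str]:
    """Return sorted full paths of the regular files directly under raw_dir."""
    return sorted(
        os.path.join(raw_dir, name)
        for name in os.listdir(raw_dir)
        if os.path.isfile(os.path.join(raw_dir, name))
    )


def build_example_dag(raw_dir: str):
    """Build a hypothetical DAG that maps one task group per file in raw_dir."""
    # Imported lazily so list_input_files() is usable without Airflow installed.
    from airflow.decorators import dag, task, task_group

    @dag(
        dag_id="example_dynamic_file_mapping",  # hypothetical DAG id
        schedule=None,
        start_date=datetime(2023, 1, 1),
        catchup=False,
    )
    def example_dynamic_file_mapping():
        @task
        def list_files() -> list[str]:
            # Runs at execution time, so the number of mapped groups
            # is decided per DAG run rather than at parse time.
            return list_input_files(raw_dir)

        @task_group
        def process_file(input_file: str):
            @task
            def transform(path: str) -> str:
                # Placeholder standing in for an Earthmover run on one file.
                return path + ".transformed"

            @task
            def load(path: str) -> None:
                # Placeholder standing in for a downstream load step.
                print(f"would load {path}")

            load(transform(input_file))

        # One copy of the task group is created for each listed file.
        process_file.expand(input_file=list_files())

    return example_dynamic_file_mapping()
```

In a real DAG file, `build_example_dag("/path/to/raw_dir")` would be called at module level so the scheduler can discover the resulting DAG; the path shown is a placeholder.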

This PR is a draft for two reasons:

PR Merge Priority:

I need to prioritize finishing this as soon as possible. It may be worth merging into the RC branch early and cleaning up rough edges afterwards.

Changes to existing files:

New files created:

Tests and QC done:

This has been successfully tested in SC dev and Texas (although more analysis of outputs and logs is needed to verify all pathing is working as expected).

jayckaiser commented 2 months ago

I believe our bug, where logs for dynamically mapped sideloading-to-Stadium tasks are inaccessible in Grid view but accessible in Graph view, is related to Issue 34535, which was resolved in PR 34587. The fix may have shipped in Airflow 2.7.2.