Closed by candu 2 years ago.
Assigning @peter-lyons to this: as mentioned, it's a good task for getting the bdit_move_etl repo set up, for seeing what data pipelines we have right now, and for diving into the replicator part of MOVE.
@peter-lyons did some work to update the DAG scheduling and dependencies, and to introduce DAG task groups for Airflow 2.x (and to make these backwards compatible, IIRC!). These changes are pending release, but I'm closing out this issue since it has been migrated to our Notion (internal link only).
Description

In this Notion page (internal-only), we document a recent review of our Airflow pipelines. As part of that review, we discovered that some DAGs are running out of order, and that our DAG scheduling could be improved to reduce data staleness.
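To make "running out of order" and "data staleness" concrete, here is a minimal sketch, assuming every job runs once a day, never crosses midnight, and has a known typical duration. The job names, start times, durations, and dependencies are all hypothetical; the real values live in the internal Notion page. For each dependency it computes the slack between when the upstream job is expected to finish and when the downstream job starts: negative slack means the downstream job runs before its input is ready, and large positive slack means fresh data sits idle.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# HYPOTHETICAL example data: job names, daily start times ("HH:MM"),
# typical durations (minutes), and dependencies are illustrative only.
START = {"extract": "01:00", "transfer": "00:30", "aggregate": "06:00"}
DURATION_MIN = {"extract": 60, "transfer": 30, "aggregate": 15}
UPSTREAM = {"transfer": ["extract"], "aggregate": ["transfer"]}


def to_minutes(hhmm: str) -> int:
    """Convert an 'HH:MM' daily start time to minutes after midnight."""
    t = datetime.strptime(hhmm, "%H:%M")
    return t.hour * 60 + t.minute


def schedule_slack(
    start: Dict[str, str],
    duration: Dict[str, int],
    upstream: Dict[str, List[str]],
) -> Dict[Tuple[str, str], int]:
    """Return {(job, dep): slack in minutes} for every dependency edge.

    Slack is the time between dep's expected finish and job's start.
    Negative slack: job runs before its input is ready (out of order).
    Large positive slack: fresh data waits idle (staleness).
    Assumes daily jobs that never cross midnight.
    """
    slack = {}
    for job, deps in upstream.items():
        for dep in deps:
            dep_done = to_minutes(start[dep]) + duration[dep]
            slack[(job, dep)] = to_minutes(start[job]) - dep_done
    return slack


if __name__ == "__main__":
    for (job, dep), minutes in schedule_slack(START, DURATION_MIN, UPSTREAM).items():
        status = "OUT OF ORDER" if minutes < 0 else "ok"
        print(f"{dep} -> {job}: slack {minutes} min ({status})")
```

With these made-up numbers, `transfer` starts 90 minutes before `extract` is expected to finish (out of order), while `aggregate` starts 300 minutes after `transfer` finishes (data that could be fresher).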
This task updates our schedules accordingly. In the process, you'll learn a bit about `replicator` and `etl`, as well as about all the data pipelines that we use to manage data in MOVE.

Acceptance Criteria
- `replicator` schedules for `replicator-local-CRASH`, `replicator-local-FLOW` as indicated:
- `replicator` machine over remote desktop and open VS Code;
- `replicator-unregister-jobs.ps1`;
- `replicator-register-jobs.ps1`;
- `replicator-register-jobs.ps1`;
- `etl`:

Additional Notes

See Job Dependencies and Scheduling (internal-only) for more details.
Note that `replicator-local-CRASH` and `replicator_transfer_crash` are two different things! The former runs in `replicator`; the latter runs on `etl` under Airflow. Make sure you're updating the right one!

Note also that `replicator` jobs use Windows Task Scheduler / PowerShell syntax for scheduling. See `replicator-register-jobs.ps1` for how that works.

Note that as you turn DAGs on, the new schedule should immediately trigger a run. As such, turn them on one at a time, in the order listed on that Notion page, and wait for each run to complete before continuing. (This may take a while; it's good to have small tasks that you can work on while waiting!)
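Because turning a DAG on kicks off a run, the order matters. Below is a minimal sketch, assuming the dependencies can be written as a map from each DAG to the DAGs it depends on. It uses Python's standard-library `graphlib.TopologicalSorter` (Python 3.9+) to derive an order in which every DAG comes after its upstream DAGs, then enables and waits on each in turn. The DAG names and the `enable` / `wait_for_run` callbacks are hypothetical placeholders (for example, unpausing a DAG and then polling until its run finishes); the authoritative order is the one on the Notion page.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Callable, Dict, Iterable, List


def enable_in_order(
    upstream: Dict[str, Iterable[str]],
    enable: Callable[[str], None],
    wait_for_run: Callable[[str], None],
) -> List[str]:
    """Enable DAGs one at a time, each only after its upstream DAGs.

    `upstream` maps a DAG id to the DAG ids it depends on. `enable` and
    `wait_for_run` are HYPOTHETICAL callbacks supplied by the caller.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    order = list(TopologicalSorter(upstream).static_order())
    for dag_id in order:
        enable(dag_id)        # turning a DAG on should trigger a run
        wait_for_run(dag_id)  # don't continue until that run completes
    return order


if __name__ == "__main__":
    # HYPOTHETICAL dependency graph; the real order is on the Notion page.
    deps = {
        "transfer": ["extract"],
        "aggregate": ["transfer"],
        "report": ["aggregate", "transfer"],
    }
    enable_in_order(
        deps,
        enable=lambda dag_id: print("enable", dag_id),
        wait_for_run=lambda dag_id: print("run finished:", dag_id),
    )
```

Passing the callbacks in keeps the ordering logic testable without touching a live Airflow instance, and a cycle in the dependencies raises an error instead of silently producing a bad order.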
Finally, note that this issue only covers the AWS dev `etl` upgrade. You'll also have to deploy those changes to QA, and to prod with the next release. (You can, however, mark this issue closed once the `dev` upgrade is complete.)