Overview
Following a pipeline error, the log and resource CSVs are missing for certain dates, which is affecting the Data Management team’s reporting. We should create a mechanism to regenerate the log and resource CSVs from the raw data so that these dates can be backfilled.
This functionality is needed for this particular use case, and will also be useful going forward for resolving other pipeline issues.
Pull Request (PR):
Tech Approach
Add an argument to rebuild the full logs
Sync the complete logs from the S3 bucket
Recreate log.csv and resource.csv from the complete logs
Open question: can we use something like DuckDB for this?
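The approach above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the actual implementation. It assumes the raw logs have already been synced locally (for example with `aws s3 sync`), that each raw log is a JSON file, and that each one carries `entry-date`, `endpoint` and `resource` keys. The flag name `--rebuild-full-logs`, the default paths, the column names and the `rebuild` helper are all hypothetical. DuckDB could replace the aggregation step if it turns out to be a good fit.

```python
import argparse
import csv
import json
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Rebuild log/resource CSVs (sketch)")
    # Hypothetical flag: opt in to rebuilding from the complete set of raw logs.
    parser.add_argument("--rebuild-full-logs", action="store_true")
    # Hypothetical defaults: where the synced raw logs live and where to write.
    parser.add_argument("--log-dir", type=Path, default=Path("collection/log"))
    parser.add_argument("--output-dir", type=Path, default=Path("collection"))
    return parser


def rebuild(log_dir: Path, output_dir: Path) -> tuple[int, int]:
    """Recreate log.csv and resource.csv from every raw JSON log under log_dir.

    Returns (number of log rows, number of distinct resources).
    """
    entries = []
    for path in sorted(log_dir.rglob("*.json")):
        with path.open() as f:
            entries.append(json.load(f))
    entries.sort(key=lambda e: e.get("entry-date") or "")

    output_dir.mkdir(parents=True, exist_ok=True)

    # Assumed log.csv columns; keys not listed here are ignored.
    log_fields = ["entry-date", "endpoint", "resource"]
    with (output_dir / "log.csv").open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=log_fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(entries)

    # One row per resource, with the first and last date it was seen.
    resources = {}
    for entry in entries:
        resource = entry.get("resource")
        if not resource:
            continue  # e.g. a fetch that did not produce a resource
        date = entry.get("entry-date") or ""
        first, last = resources.get(resource, (date, date))
        resources[resource] = (min(first, date), max(last, date))

    with (output_dir / "resource.csv").open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["resource", "start-date", "end-date"])
        for resource, (first, last) in sorted(resources.items()):
            writer.writerow([resource, first, last])

    return len(entries), len(resources)


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    if args.rebuild_full_logs:
        n_log, n_resource = rebuild(args.log_dir, args.output_dir)
        print(f"wrote {n_log} log rows and {n_resource} resources")
    return 0
```

With the logs synced into a local directory, calling `main(["--rebuild-full-logs", "--log-dir", "collection/log"])` would regenerate both files. Reading every raw log rather than only the missing dates keeps the rebuild independent of whatever state the existing CSVs are in.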
Acceptance Criteria/Tests
log.csv and resource.csv are regenerated and contain all the data from the raw logs, including the previously missing dates
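One way this criterion might be checked automatically is sketched below. It assumes log.csv has `entry-date` and `resource` columns and resource.csv has a `resource` column; those column names and the `check_rebuild` helper are hypothetical, and the list of expected dates would need to come from whoever knows which dates were affected.

```python
import csv
from pathlib import Path


def check_rebuild(log_csv: Path, resource_csv: Path, expected_dates: set[str]) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    with log_csv.open(newline="") as f:
        log_rows = list(csv.DictReader(f))
    with resource_csv.open(newline="") as f:
        known_resources = {row["resource"] for row in csv.DictReader(f)}

    problems = []

    # Every expected date (YYYY-MM-DD) should have at least one log row.
    seen_dates = {row["entry-date"][:10] for row in log_rows}
    for date in sorted(expected_dates - seen_dates):
        problems.append(f"no log rows for {date}")

    # Every resource referenced in log.csv should appear in resource.csv.
    logged_resources = {row["resource"] for row in log_rows if row["resource"]}
    for resource in sorted(logged_resources - known_resources):
        problems.append(f"resource {resource} missing from resource.csv")

    return problems
```

For example, `check_rebuild(Path("collection/log.csv"), Path("collection/resource.csv"), {"2024-01-01"})` would return an empty list if that date is covered and every logged resource is listed.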
Resourcing & Dependencies