HHS / simpler-grants-gov

https://simpler.grants.gov
Other
29 stars 9 forks source link

[Issue #1977] Adjust transformation logic to handle orphaned history records #1982

Closed chouinar closed 1 month ago

chouinar commented 1 month ago

Summary

Fixes #1977

Time to review: 10 mins

Changes proposed

Handle orphaned history records when the opportunity (and consequently) opportunity summary objects do not exist

Add a transformation_notes column for us to put whatever info regarding the transformation into - in this case marking that we skipped a record for a particular reason.

Context for reviewers

A record in the tsynopsis_hist or tforecast_hist table connects with an opportunity, however that opportunity may not be in the topportunity table as it was deleted, and its record only remains in the topportunity_hist table - a table we aren't yet importing. Because of this, we can't actually import these records. While we may want to import the historical opportunities, we're not going to do that at this time, instead we'll mark these records as processed

For the three sets of one-to-many lookup tables, we also need to add a check because if we couldn't import the historical synopsis/forecast into our opportunity_summary table - it'll also cause those to fail.

Important detail - historical data isn't technically required right now, and as far as I can tell this should only apply to deleted opportunities (something that wouldn't ever be visible in our next few features)

Additional information

Testing this with a snapshot of prod data, I was able to run through the full dataset locally and get the following metrics

total_records_processed=1484400
total_records_deleted=0 
total_records_inserted=1338467
total_records_updated=0
total_records_orphaned=2305
total_duplicate_records_skipped=977
total_historical_orphans_skipped=142650 
total_error_count=1
task_duration_sec=615.295
app.name=src.app

Note that the error is unrelated to this work and will be addressed separately (boolean with an invalid value)