HHS / simpler-grants-gov

https://simpler.grants.gov
Other
29 stars 9 forks source link

[Issue #1983] Adjust transformation deletes to handle cascading deletes #2000

Open chouinar opened 1 month ago

chouinar commented 1 month ago

Summary

Fixes #1983

Time to review: 5 mins

Changes proposed

Updates the transformation code to handle a case where a parent record (ie. opportunity or opportunity_summary) is deleted, AND the child records (everything else) is marked to be deleted as well.

Also added a new way to set metrics that handles adding more specific prefixed ones (eg. total_records_processed and opportunity.total_records_processed) - will expand more on this later.

Context for reviewers

Imagine a scenario an opportunity with a summary (synopsis) and a few applicant types gets deleted. The update process for loading from Oracle will mark all of our staging table records for those as is_deleted=True. When we go to process, we'll first process the opportunity, and delete it uneventfully, however we have cascade-deletes setup. This means that all of the children (the opportunity summary, and assistance listing tables among many others) also need to be deleted. SQLAlchemy handles this for us.

However, this means when we then start processing the synopsis record that was marked as deleted - we would error and say "I can't delete something that doesn't exist". To work around this, we're okay with these orphan deletes, and we just assume we already took care of it.

Additional information

To further test this, I loaded a subset of the prod data locally (~2500 opportunities, 35k records total). I then marked all of the data is is_deleted=True, transformed_at=null and ran it again. It went through the opportunities deleting them. When it got to the other tables, it didn't have to do very much as they all hit the new case. The metrics produced look like:

total_records_processed=37002
total_records_deleted=2453
total_delete_orphans_skipped=34549
total_error_count=0

opportunity.total_records_processed=2453
opportunity.total_records_deleted=2453

assistance_listing.total_records_processed=3814
assistance_listing.total_delete_orphans_skipped=3814

opportunity_summary.total_records_processed=3827
opportunity_summary.total_delete_orphans_skipped=3827

applicant_type.total_records_processed=17547
applicant_type.total_delete_orphans_skipped=17547

funding_category.total_records_processed=4947
funding_category.total_delete_orphans_skipped=4947

funding_instrument.total_records_processed=4414
funding_instrument.total_delete_orphans_skipped=4414

And as a sanity check, running again processes nothing.