Summary
Fixes #1983
Time to review: 5 mins
Changes proposed
Updates the transformation code to handle the case where a parent record (i.e. opportunity or opportunity_summary) is deleted AND the child records (everything else) are marked to be deleted as well.
Also added a new way to set metrics that handles adding more specific prefixed ones (e.g. `total_records_processed` and `opportunity.total_records_processed`) - will expand more on this later.

Context for reviewers
Imagine a scenario where an opportunity with a summary (synopsis) and a few applicant types gets deleted. The update process for loading from Oracle will mark all of our staging table records for those as `is_deleted=True`. When we go to process them, we'll first process the opportunity and delete it uneventfully; however, we have cascade deletes set up. This means that all of the children (the opportunity summary and assistance listing tables, among many others) also need to be deleted. SQLAlchemy handles this for us.

However, this means that when we then start processing the synopsis record that was marked as deleted, we would error and say "I can't delete something that doesn't exist". To work around this, we're okay with these orphan deletes, and we just assume we already took care of them.
Additional information
To further test this, I loaded a subset of the prod data locally (~2500 opportunities, 35k records total). I then marked all of the data as `is_deleted=True, transformed_at=null` and ran it again. It went through the opportunities, deleting them. When it got to the other tables, it didn't have to do very much, as they all hit the new case. The metrics produced look like:

And as a sanity check, running it again processes nothing.