Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

Placeholder data migration rewrite #965

Closed hepcat72 closed 4 months ago

hepcat72 commented 4 months ago

Summary Change Description

This is a rewrite of #953 that fixes issues #953 left unaddressed. #953 was written to handle only the current unique constraint violations, so multiple placeholder records were still allowed to persist in the database with fake ArchiveFile records even when they did not all need to persist. Based on what I learned while implementing the MSRunsLoader, numerous placeholder records cause no conflicts in the PeakGroup table: if you boiled them down to 1 placeholder, the merged PeakGroups would contain no duplicate compounds. So this rewrite identifies the precise conflicting peak groups. When there are no conflicts, it simply merges the placeholders. When there are conflicts, it creates the fake ArchiveFile records for the mzXML files and also includes in each fake record the duplicate peak group names that the researcher must resolve.
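For readers who want the merge-or-flag decision spelled out, here is a minimal sketch of the logic described above. It is not the migration code. It uses plain dataclasses instead of Django models, and everything in it is an assumption made for illustration: the `Placeholder` and `PeakGroup` classes, the `key` used to decide which placeholders are merge candidates, the `consolidate` function, and the fake file naming scheme.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class PeakGroup:
    name: str      # peak group name as the researcher sees it
    compound: str  # value that must not repeat within one placeholder


@dataclass
class Placeholder:
    """Hypothetical stand-in for a placeholder MSRunSample record."""
    key: str  # assumed grouping key, e.g. sample plus sequence
    peak_groups: List[PeakGroup] = field(default_factory=list)
    fake_mzxml: Optional[str] = None  # set only when a conflict exists
    duplicate_peak_groups: List[str] = field(default_factory=list)


def consolidate(placeholders):
    """Merge placeholders that share a key unless merging would put
    two peak groups for the same compound under one record."""
    groups = defaultdict(list)
    for ph in placeholders:
        groups[ph.key].append(ph)

    results = []
    for key, group in groups.items():
        # Map each compound to the placeholders (by index) that hold it.
        owners = defaultdict(set)
        for idx, ph in enumerate(group):
            for pg in ph.peak_groups:
                owners[pg.compound].add(idx)
        conflicting = {c for c, idxs in owners.items() if len(idxs) > 1}

        if not conflicting:
            # No duplicate compounds: collapse the group into one record.
            merged = Placeholder(
                key, [pg for ph in group for pg in ph.peak_groups]
            )
            results.append(merged)
            continue

        # Conflict: keep the records separate, give each a fake mzXML
        # name (hypothetical scheme), and note the duplicate peak groups.
        for idx, ph in enumerate(group):
            ph.fake_mzxml = f"{key}.placeholder{idx}.mzXML"
            ph.duplicate_peak_groups = sorted(
                {pg.name for pg in ph.peak_groups if pg.compound in conflicting}
            )
            results.append(ph)
    return results
```

In this sketch, every record in a conflicting group gets a fake file, including records that hold none of the duplicated compounds. That is a simplification chosen to keep the example short; the actual migration may be more selective.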

The migration prints a summary of what it changed. I ran it on a complete copy of production data in my sandbox, and these are the stats it reported (note: I changed the prints slightly after I ran the migration, so the output will differ a little when we run this on the actual database):

68 MSRunSample records merged down to 34 records.
60 MSRunSample records given fake mzXML files to resolve unique constraint because they have multiple peak groups with the same compound.

Affected Issues/Pull Requests

Review Notes

See comments in-line.

Checklist

This pull request will be merged once the following requirements are met. The author and/or reviewers should uncheck any unmet requirements:

hepcat72 commented 4 months ago

Can we get a report of the problematic records entered into an issue at https://github.com/PrincetonUniversity/tracebase-rabinowitz-data? Those issues can be sorted out while we move forward with the loader updates.

I previously generated that list via the shell and shared it over Slack. It will be exactly the same as the list here. Let me see if I can find it.

hepcat72 commented 4 months ago

OK, I created an issue: https://github.com/PrincetonUniversity/tracebase-rabinowitz-data/issues/112

hepcat72 commented 4 months ago

I'll hold off on merging this until the branch/PR it merges into (#962) is reviewed (I know it's a big one - sorry about that).

hepcat72 commented 4 months ago

Rebased.