Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

MSRunsLoader #962

Closed hepcat72 closed 1 month ago

hepcat72 commented 1 month ago

Summary Change Description

I recommend reading the commit log comments, at least for the first 2 commits. There's a lot there, so here is a high-level overview:

This PR basically implements a script to load either:

This code can differentiate between multiple mzXML files with the same name. When there are several to choose from and it cannot tell which one goes with which peakAnnotation file, it raises an error prompting the user to add the mzXML's full path to the corresponding entries in the infile. Those errors will eventually be attached to the actual Excel cells.
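To illustrate the same-name handling described above, here is a minimal sketch, not the loader's actual code. The names `index_by_basename`, `resolve_mzxml`, and `AmbiguousMzxmlError` are hypothetical, and the sketch assumes the infile entry supplies either a bare file name or a full path.

```python
import os
from collections import defaultdict


class AmbiguousMzxmlError(Exception):
    """Hypothetical error: several mzXML files share a name and none was selected."""


def index_by_basename(mzxml_paths):
    """Group the discovered mzXML paths by their file name."""
    index = defaultdict(list)
    for path in mzxml_paths:
        index[os.path.basename(path)].append(path)
    return index


def resolve_mzxml(name, index, explicit_path=None):
    """Return the single path for `name`, None if unknown, or raise if ambiguous."""
    candidates = index.get(name, [])
    if explicit_path is not None:
        # The user disambiguated by giving the full path in the infile.
        wanted = os.path.normpath(explicit_path)
        matches = [p for p in candidates if os.path.normpath(p) == wanted]
        if len(matches) == 1:
            return matches[0]
        raise AmbiguousMzxmlError(f"{explicit_path} is not among the files named {name}")
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # Several files share this name: ask the user for the full path.
    raise AmbiguousMzxmlError(
        f"{len(candidates)} files are named {name}; add the full path to the infile: "
        + ", ".join(sorted(candidates))
    )
```

Raising instead of guessing keeps the choice with the user, which matches the "prompt with an error" behavior described in this PR.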

It also re-links peak groups that were previously linked to placeholder records to the newly created non-placeholder records (i.e. those with mzXML files). It does this by comparing metadata parsed from the mzXML with the med_mz values in the PeakData. This isn't perfect: nothing in the peak annotation files indicates polarity, and the scan ranges of different mzXMLs for the same sample can overlap. But since we don't expect users to mix multiple mzXMLs from different polarities or scan ranges for the same sample in a single peak annotation file, I don't think this is a serious issue. And even if some peak groups in rare cases link to the wrong mzXML MSRunSample record, users are unlikely to ever want that single association.
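The matching idea can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `MzxmlMeta`, `candidate_runs`, and `relink` are hypothetical names, the scan range is assumed to be a simple min/max m/z window, and leaving the placeholder link in place when the match is ambiguous is a choice made for this sketch only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MzxmlMeta:
    """Hypothetical container for metadata parsed from one mzXML file."""
    path: str
    polarity: str
    mz_min: float
    mz_max: float


def candidate_runs(med_mz_values, runs):
    """Return the runs whose scan range contains every med_mz of the peak group."""
    return [
        run for run in runs
        if all(run.mz_min <= mz <= run.mz_max for mz in med_mz_values)
    ]


def relink(med_mz_values, runs):
    """Return the unique matching run, or None to keep the placeholder link."""
    matches = candidate_runs(med_mz_values, runs)
    # Overlapping scan ranges can yield several matches; this sketch does not guess.
    return matches[0] if len(matches) == 1 else None


runs = [
    MzxmlMeta("sample1_pos.mzXML", "positive", 70.0, 500.0),
    MzxmlMeta("sample1_neg.mzXML", "negative", 400.0, 1000.0),
]
print(relink([120.5, 250.1], runs))  # only the first run's range covers both values
print(relink([450.0], runs))         # falls in the overlap of both ranges
```

Because polarity is not available from the peak annotation file, only the m/z window is compared here, which is why a value in the overlap cannot be resolved.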

Affected Issues/Pull Requests

Review Notes

See comments in-line.

Checklist

This pull request will be merged once the following requirements are met. The author and/or reviewers should uncheck any unmet requirements:

hepcat72 commented 1 month ago

Based on the review, I added a RollbackException class for use in all of the new loaders, and I added a missing arg to a docstring.
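The comment above does not say how RollbackException is used, so the following is only a guess at the general pattern: a loader raises a dedicated exception inside a transaction so that all of its writes are undone. The sketch uses the standard-library `sqlite3` module as a stand-in for the project's database layer; the `load_rows` function, the `sample` table, and the `dry_run` flag are all hypothetical.

```python
import sqlite3


class RollbackException(Exception):
    """Stand-in for the class named in the comment; its real definition may differ."""


def load_rows(conn, names, dry_run=False):
    """Insert rows in one transaction; return True if committed, False if rolled back."""
    try:
        with conn:  # sqlite3 commits on success and rolls back if an exception escapes
            for name in names:
                conn.execute("INSERT INTO sample (name) VALUES (?)", (name,))
            if dry_run:
                # Assumed use case: validate the whole load, then discard the writes.
                raise RollbackException("dry run: discarding changes")
    except RollbackException:
        return False
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sample (name TEXT)")
load_rows(conn, ["s1", "s2"], dry_run=True)  # writes are rolled back
load_rows(conn, ["s1", "s2"])                # writes are committed
```

A dedicated exception type lets the caller tell an intentional rollback apart from an unexpected failure, which would still propagate.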