I recommend reading the commit log messages, at least for the first 2 commits. There's a lot there, so here is a high-level overview:
This PR implements a script to load either:
mzXML files on their own (with 4 command-line options that specify the sequence they all belong to), or
an Excel file (containing a, perhaps poorly named, "Peak Annotation Details" sheet) or a TSV file, optionally accompanied by a set of mzXML files, that minimally defines:
which sample each mzXML file belongs to;
which sequence each mzXML file came from, which lets you load all mzXML files associated with any number of sequences. (The sequence is specified as a comma-delimited string, which will be populated using a dropdown generated from the Sequences sheet added in PR #948.)
and, optionally, the peak annotation file that the mzXML was used in, when known.
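As a rough illustration of the infile's minimal content, here is a sketch that reads a TSV with one row per mzXML and keeps the rows belonging to the requested sequences. The column headers (`mzXML File`, `Sample`, `Sequence`, `Peak Annotation File`) and the helper `rows_for_sequences` are hypothetical, not the loader's actual headers or API.

```python
import csv
import io
from typing import Dict, Iterable, List

# Hypothetical column headers; the real "Peak Annotation Details" headers may differ.
MZXML_COL = "mzXML File"
SAMPLE_COL = "Sample"
SEQUENCE_COL = "Sequence"  # a comma-delimited sequence string
ANNOT_COL = "Peak Annotation File"  # optional column


def rows_for_sequences(tsv_text: str, sequences: Iterable[str]) -> List[Dict[str, str]]:
    """Return the infile rows whose sequence is one of the requested sequences."""
    wanted = set(sequences)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    selected = []
    for row in reader:
        # The peak annotation file is optional; normalize a missing value to "".
        row[ANNOT_COL] = (row.get(ANNOT_COL) or "").strip()
        if row[SEQUENCE_COL] in wanted:
            selected.append(row)
    return selected
```

Because the TSV is tab-delimited, the comma-delimited sequence string survives intact in a single cell and can be compared as a whole.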
The loader can distinguish between multiple mzXML files with the same name. When it detects more than one candidate and cannot tell which one goes with which peak annotation file, it raises an error prompting the user to add the mzXML's full path to the corresponding entries in the infile. These errors will eventually be attached to the relevant Excel cells.
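The same-name check can be sketched roughly as follows, assuming the loader has a list of all supplied mzXML paths (POSIX-style) and that an infile entry may hold either a bare file name or a full path. `resolve_mzxml`, `AmbiguousMzxmlError`, and `MissingMzxmlError` are hypothetical names, and this simplified version ignores the peak annotation file association that the PR also uses to tell same-named files apart.

```python
from pathlib import PurePosixPath
from typing import List


class AmbiguousMzxmlError(Exception):
    """Raised when a bare mzXML name matches more than one supplied file."""


class MissingMzxmlError(Exception):
    """Raised when an infile entry matches none of the supplied files."""


def resolve_mzxml(entry: str, supplied_paths: List[str]) -> str:
    """Map an infile entry (bare name or path) to exactly one supplied path."""
    entry_path = PurePosixPath(entry)
    if len(entry_path.parts) > 1:
        # A path was given: require an exact path match.
        matches = [p for p in supplied_paths if PurePosixPath(p) == entry_path]
    else:
        # A bare file name: match on the last path component only.
        matches = [p for p in supplied_paths if PurePosixPath(p).name == entry]
    if not matches:
        raise MissingMzxmlError(f"No supplied mzXML file matches '{entry}'.")
    if len(matches) > 1:
        raise AmbiguousMzxmlError(
            f"'{entry}' matches {len(matches)} files ({', '.join(sorted(matches))}). "
            "Add the mzXML's full path to this entry in the infile."
        )
    return matches[0]
```

Raising an exception (rather than silently picking one) mirrors the behavior described above: the user is told to supply the full path, and the message could later be attached to the offending cell.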
It also re-links peak groups that were previously linked to placeholder records to the newly created non-placeholder records (i.e. those with mzXML files), by parsing metadata from each mzXML and comparing it to the med_mz values in the PeakData. This matching isn't perfect: the peak annotation files contain no polarity information, and the scan ranges of different mzXMLs for the same sample can overlap. However, since we don't expect users to mix mzXMLs of different polarities or scan ranges for the same sample in a single peak annotation file, I don't think this is a serious issue. And even if, in rare cases, some peak groups link to the wrong mzXML MSRunSample record, users are unlikely to ever need that individual association.
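The scan-range comparison can be sketched as below, assuming each candidate record exposes the lower and upper m/z bounds parsed from its mzXML. `MzxmlRun` and `pick_run` are hypothetical stand-ins, and returning `None` on an ambiguous or empty match (leaving the peak group on its placeholder) is a choice made for this sketch; the PR may resolve ties differently, which is how a rare wrong link could arise.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class MzxmlRun:
    """Hypothetical stand-in for a non-placeholder MSRunSample record."""
    name: str
    mz_min: float  # lower bound of the mzXML's scan range
    mz_max: float  # upper bound of the mzXML's scan range


def pick_run(med_mzs: Sequence[float], runs: List[MzxmlRun]) -> Optional[MzxmlRun]:
    """Return the only run whose scan range covers every med_mz, else None.

    None means "no unique match", i.e. keep the peak group on its placeholder.
    Polarity is not considered because the peak annotation file lacks it.
    """
    candidates = [
        r for r in runs if all(r.mz_min <= mz <= r.mz_max for mz in med_mzs)
    ]
    return candidates[0] if len(candidates) == 1 else None
```

For example, with runs covering 70 to 1000 and 900 to 1500 m/z, a peak group at 950 falls in both ranges, which is exactly the overlap case described above.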
Affected Issues/Pull Requests
Partially addresses #825
Merges into PR #948
Review Notes
See comments in-line.
Checklist
This pull request will be merged once the following requirements are met. The author and/or reviewers should uncheck any unmet requirements:
Review requirements
Minimum approvals: 1
No changes requested
All blocking issues resolved by reviewers
Specific reviewers: @add_username_here
Review period: 2 days
Associated issue/pull request requirements:
[x] All requirements in affected issues marked "resolved" are satisfied
[x] All required pull requests are merged (or none)
changelog.md (or no change)