Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

Update loading code to populate `MSRunSequence` and `MSRunSample` directly #712

Closed lparsons closed 6 months ago

lparsons commented 1 year ago

FEATURE REQUEST

Inspiration

Migration to new model

Description

Update the loading code to populate MSRunSequence and MSRunSample. Typically, we will collect one set of data for each peak annotation file, but it would be ideal if when loading we could expand that to a table with:

Most of the time, we can use the submission input to generate a table, but this would give us the flexibility to manually generate/edit the table for more complex submissions where one peak annotation file has samples that use different LC methods, etc.

Remove all references to test tag broken_until_issue712.

Alternatives

Dependencies

This issue cannot be started until the completion of the following issue(s):

Comment

Branch: load_msrun_sample_sequence


ISSUE OWNER SECTION

Note, there is significant overlap between this issue and the already implemented issue #706, which implements the "table" described in the issue description. All that's needed is to populate the correct Models, remove the broken tags (broken_until_issue712), and:

Assumptions

None

Requirements

Limitations

Affected Components

A tentative list of anticipated repository items that will be changed, labeled with "add", "delete", or "change". One item per line. (Mostly, this will be a list of files.)

DESIGN

Interface Change description

No outward interface changes compared to what was already implemented in #774.

Code Change Description

The changes should be pretty simple, and similar to the type of changes implemented already in the DataRepo/migrations/0027_msrun_to_msrunsample_msrunsequence.py file in #804. It will do a get_or_create on the MSRunSequence and MSRunSample, except it will load the files as ArchiveFile records (if provided) and the instrument.
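As a rough illustration of the get_or_create pattern described above, here is a minimal in-memory sketch. It deliberately does not use the Django ORM, so it runs standalone; it only mimics the `(object, created)` return convention of Django's `get_or_create`. The `InMemoryTable` class, the `SequenceKey` fields, the `ms_data_file` key, and the `load_msrun_sample` helper are all hypothetical names chosen for the example, not the actual TraceBase models or loader code.

```python
from dataclasses import dataclass
from typing import Any, Dict, Hashable, Optional, Tuple


class InMemoryTable:
    """Toy stand-in for a model manager (hypothetical; NOT the Django ORM)."""

    def __init__(self) -> None:
        self._rows: Dict[Hashable, Dict[str, Any]] = {}

    def get_or_create(
        self, key: Hashable, defaults: Optional[Dict[str, Any]] = None
    ) -> Tuple[Dict[str, Any], bool]:
        # Return the existing row untouched, or create one using the defaults.
        if key in self._rows:
            return self._rows[key], False
        row = {"key": key, **(defaults or {})}
        self._rows[key] = row
        return row, True


@dataclass(frozen=True)
class SequenceKey:
    # Hypothetical identifying fields for a sequence; assumed for illustration.
    researcher: str
    date: str
    instrument: str
    lc_method: str


msrun_sequences = InMemoryTable()
msrun_samples = InMemoryTable()


def load_msrun_sample(
    seq_key: SequenceKey, sample_name: str, mzxml_file: Optional[str] = None
) -> Tuple[Dict[str, Any], bool]:
    """Get or create the sequence first, then the sample that belongs to it."""
    sequence, _ = msrun_sequences.get_or_create(seq_key)
    sample_key = (sequence["key"], sample_name)
    # The file is optional, mirroring "if provided" in the description above.
    return msrun_samples.get_or_create(
        sample_key, defaults={"ms_data_file": mzxml_file}
    )
```

Loading the same sample twice returns the record created by the first call, with the second element of the tuple set to `False`, so reloading does not create duplicates.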

Tests

A test for each requirement

hepcat72 commented 8 months ago

merged

hepcat72 commented 8 months ago

@lparsons - I filled in a design and added the design:needs-review tag. I'm going to proceed with this issue (branched off branch migrate_msrun from PR #804) because it should be pretty straightforward, as most of this was already done in #774. If you see any design issues, please indicate the specific design item and highlight the specific issue and add the design:changes-requested tag and a comment.

lparsons commented 8 months ago

@hepcat72 Looks good to me. I would suggest that you create an issue to update the validation interface, since we know that will be needed. Might be a useful place to stash things you notice as you work on this.

hepcat72 commented 8 months ago
hepcat72 commented 7 months ago

@lparsons - I added requirement 7. (and sub-items) based on the discussion about the polarity value in the meeting. I think I was true to what was decided in the meeting, but let me know if you see anything untoward.

lparsons commented 7 months ago

It still seems useful to have the command line option to supply polarity. We want options for the curators. We just want to simplify the requests of the researchers.

When loading data, the polarity can be determined as follows:

When asking the researchers, do not explicitly ask for polarity. Curators may decide to supply polarity values even if we don't have an mzXML file, but we shouldn't start by asking.

So, my attempt at putting that into your format above:

  1. Polarity can be supplied in three ways: parsed from mzXML, read from the LCMS metadata file, or supplied on the command line
     a. If multiple values are supplied for a given MSRun, ensure they match, error if not
     b. If no value is supplied, the default should be unknown
  2. The study submission process should not ask for the polarity explicitly
     a. Remove polarity from the form
     b. Keep the column in the LCMS data file, but we won't explicitly ask for it, and it should be optional
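
The rules in item 1 could be sketched roughly as follows. This is a minimal illustration and not the TraceBase loader code: the function name `resolve_polarity`, the exception `ConflictingPolarities`, the parameter names, and the case-insensitive comparison are all assumptions made for the example.

```python
from typing import Dict, Optional

UNKNOWN = "unknown"


class ConflictingPolarities(ValueError):
    """Raised when the supplied sources disagree (hypothetical exception)."""


def resolve_polarity(
    mzxml: Optional[str] = None,
    lcms_metadata: Optional[str] = None,
    command_line: Optional[str] = None,
) -> str:
    """Pick one polarity for an MSRun from up to three optional sources."""
    supplied: Dict[str, Optional[str]] = {
        "mzXML": mzxml,
        "LCMS metadata file": lcms_metadata,
        "command line": command_line,
    }
    # Keep only sources that gave a non-empty value. Lowercasing before the
    # comparison is an assumption, not something stated in the issue.
    present = {
        source: value.strip().lower()
        for source, value in supplied.items()
        if value is not None and value.strip()
    }
    distinct = set(present.values())
    if len(distinct) > 1:
        # Rule 1a: multiple values must match, otherwise it is an error.
        raise ConflictingPolarities(f"Polarity values disagree: {present}")
    if not distinct:
        # Rule 1b: nothing supplied, so fall back to the default.
        return UNKNOWN
    return distinct.pop()
```

For example, `resolve_polarity(mzxml="positive", command_line="positive")` returns `"positive"`, a call with no arguments returns `"unknown"`, and passing `"positive"` from one source and `"negative"` from another raises `ConflictingPolarities`.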