I assume "table" means an Excel spreadsheet? Does the previous issue (#705) assume this? I worked some on #705 today and did not infer this. I suspect that another sheet in the existing Excel template would be ideal.
Update the loading code to populate LCMethod. Typically, we will collect one set of data for each peak annotation file, but it would be ideal if, when loading, we could expand that to a table with:
- AccuCor file
- Sample (can do prefix matching)
- mzXML filename
- Researcher
- Date
- Instrument
- lc method
Most of the time, we can use the submission input to generate a table, but this would give us the flexibility to manually generate/edit the table for more complex submissions where one peak annotation file has samples that use different LC methods, etc.
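A minimal Python sketch of that expansion. It assumes one metadata set per peak annotation file and mzXML files named after their sample headers. Both of those are assumptions, the class, function, and field names are hypothetical, and the sample prefix matching mentioned above is not shown:

```python
import os
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LCMSRow:
    """One row of a hypothetical expanded LCMS metadata table."""

    accucor_file: str
    sample: str
    mzxml_filename: Optional[str]
    researcher: str
    date: str  # kept as text in this sketch; a real loader would parse it
    instrument: str
    lc_method: str


def expand_submission(
    accucor_file: str,
    sample_headers: List[str],
    per_file: Dict[str, str],
    mzxml_paths: List[str],
) -> List[LCMSRow]:
    """Copy one set of per-file metadata onto every sample header in that file."""
    # Index mzXML files by base name without extension: "data/s1.mzXML" -> "s1".
    by_stem = {os.path.splitext(os.path.basename(p))[0]: p for p in mzxml_paths}
    return [
        LCMSRow(
            accucor_file=accucor_file,
            sample=header,
            mzxml_filename=by_stem.get(header),  # None when no file matches
            researcher=per_file["researcher"],
            date=per_file["date"],
            instrument=per_file["instrument"],
            lc_method=per_file["lc_method"],
        )
        for header in sample_headers
    ]
```

The generated rows are the part a curator would then edit by hand when one peak annotation file mixes LC methods.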
I think this makes sense @hepcat72, but could you flesh out the proposal by mocking up the proposed new option(s) to the load_accucor_msruns command, as well as the columns in the proposed new file? I think that would help clarify this idea for me, since it's still a bit vague atm.
Sure.
OK @lparsons, I added examples in the Interface Change description.
Thanks, that helps a lot. Here are a few questions to consider:
- What does the ms-protocol-name refer to? I don't think there is any place this will be stored in the database.
--ms-protocol-name is a rename of --protocol, and, as you may recall, when I started this design I was confused about the exclusion of the MS mode (e.g. negative/positive ion mode). I did not update this design after we had the opportunity to discuss it on Slack, and the result of that discussion was that we would wait and see what the search usage would be. [Incidentally, I remain unconvinced that the effort saved by not retaining that data is worth the loss of its searchability, but be that as it may, I am aware that this option is on the outs. I just haven't removed it yet.]
- Are the lc-protocol-name, instrument, and mzxml-files parameters optional? I'm guessing those would be used when all of the samples share the same value, correct?
All of the options (which I will henceforth refer to as "defaults") that correspond to the columns in the LCMS metadata file (including lc-protocol-name, instrument, and mzxml-files) are conditionally required (/optional). Either the user provides an LCMS metadata file or they set those options; one or both are required. The defaults will be required if the LCMS metadata file only has a subset of samples (/sample data headers). If the LCMS metadata file has every sample in it, the "defaults" are not required.
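The requiredness rule above can be sketched as follows. This is a hypothetical illustration, not the loader's code: the field names and both functions are assumptions, and a blank cell is treated the same as a header missing from the file. A field is reported only when some sample header gets no value from either source, so omitting both the file and the defaults flags every field.

```python
from typing import Dict, Optional, Sequence, Set

# Hypothetical field names standing in for the "default" options discussed above.
DEFAULT_FIELDS = ("lc_protocol_name", "instrument")


def effective_metadata(
    header: str,
    lcms_metadata: Dict[str, Dict[str, Optional[str]]],
    defaults: Dict[str, Optional[str]],
    fields: Sequence[str] = DEFAULT_FIELDS,
) -> Dict[str, Optional[str]]:
    """Values from the LCMS metadata file win; blank or absent values fall back to defaults."""
    row = lcms_metadata.get(header, {})
    return {f: row.get(f) or defaults.get(f) for f in fields}


def missing_required_defaults(
    headers: Sequence[str],
    lcms_metadata: Dict[str, Dict[str, Optional[str]]],
    defaults: Dict[str, Optional[str]],
    fields: Sequence[str] = DEFAULT_FIELDS,
) -> Set[str]:
    """Return the fields that still have no value for at least one sample header."""
    missing: Set[str] = set()
    for header in headers:
        values = effective_metadata(header, lcms_metadata, defaults, fields)
        missing.update(f for f, v in values.items() if not v)
    return missing
```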
I wanted the LCMS metadata file to be required only in order to map multiple different sample data headers to a single sample record in TraceBase. You only need to include the headers whose names differ from the sample names. If all headers are the same as in the sample table file, the LCMS metadata file can be omitted and everything will work as it already does.
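A sketch of that header-to-sample mapping, assuming the LCMS metadata file has already been reduced to a dict from header to sample name (the function and argument names are hypothetical). Headers that are not in the dict are taken to be sample names themselves, which mirrors the current behavior:

```python
from typing import Dict, List, Set


def resolve_sample_names(
    headers: List[str],
    header_to_sample: Dict[str, str],
    known_samples: Set[str],
) -> Dict[str, str]:
    """Map each peak annotation sample header to a sample name."""
    resolved: Dict[str, str] = {}
    unmatched: List[str] = []
    for header in headers:
        # Headers absent from the LCMS metadata file are assumed to be sample names.
        sample = header_to_sample.get(header, header)
        if sample in known_samples:
            resolved[header] = sample
        else:
            unmatched.append(header)
    if unmatched:
        # Report every problem header at once rather than failing on the first.
        raise ValueError("No sample found for headers: " + ", ".join(unmatched))
    return resolved
```

For example, headers "s1_pos" and "s1_neg" listed in the file both resolve to sample "s1", while an unlisted header "s2" resolves to "s2".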
- That would make the lcms-file optional?
Yes. See my explanation above. The LCMS file is conditionally required, together with the "default" options.
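To detect when the LCMS file becomes necessary because one sample appears under several headers across the peak annotation files of a load, a check along these lines could work. It is a hypothetical sketch: metadata is keyed by header alone, and the function and argument names are illustrative.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def samples_needing_lcms_rows(
    headers_by_file: Dict[str, List[str]],
    header_to_sample: Dict[str, str],
    lcms_headers: Set[str],
) -> Dict[str, List[Tuple[str, str]]]:
    """Return samples seen under several (file, header) pairs where some pair has no LCMS row."""
    occurrences: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for peak_file, headers in headers_by_file.items():
        for header in headers:
            # Unlisted headers are assumed to equal the sample name.
            sample = header_to_sample.get(header, header)
            occurrences[sample].append((peak_file, header))
    return {
        sample: pairs
        for sample, pairs in occurrences.items()
        if len(pairs) > 1 and any(h not in lcms_headers for _, h in pairs)
    }
```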
- Did you intend to require an xlsx file for lcms-file, or a tsv file, or both?
It can be xlsx or csv, same as the sample/accucor files.
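A sketch of parsing the csv variant with the standard library. It assumes a column named sample_data_header (hypothetical) that identifies each row. An xlsx file would need a different reader (for example pandas.read_excel), which is not shown:

```python
import csv
import io
from typing import Dict, TextIO


def read_lcms_metadata_csv(handle: TextIO) -> Dict[str, Dict[str, str]]:
    """Read CSV LCMS metadata into {sample_data_header: {column: value}}."""
    metadata: Dict[str, Dict[str, str]] = {}
    for row in csv.DictReader(handle):
        header = (row.pop("sample_data_header", None) or "").strip()
        if not header:
            continue  # ignore rows that do not name a sample data header
        if header in metadata:
            # Each header maps to a single mzXML file, so duplicates are an error.
            raise ValueError(f"Duplicate sample data header: {header}")
        # Blank cells become "" so callers can fall back to the defaults;
        # extra unnamed cells (key None) are dropped.
        metadata[header] = {k: (v or "").strip() for k, v in row.items() if k is not None}
    return metadata


example = "sample_data_header,sample,lc_protocol_name\ns1_pos,s1,HILIC\ns1_neg,s1,\n"
print(read_lcms_metadata_csv(io.StringIO(example)))
```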
OK, that sounds great, thanks for the clarification.
TODO:
- test_get_lcms_metadata_dict_from_file
- test_check_peak_annotation_files
- initialize_sample_names
- get_missing_required_lcms_defaults
- lcms_defaults as a separate member variable
- validate_mzxmls
FEATURE REQUEST
Inspiration
Migration to new model
Description
Update the loading code to populate LCMethod. Typically, we will collect one set of data for each peak annotation file, but it would be ideal if, when loading, we could expand that to a table with:
- AccuCor file
- Sample (can do prefix matching)
- mzXML filename
- Researcher
- Date
- Instrument
- lc method
Most of the time, we can use the submission input to generate a table, but this would give us the flexibility to manually generate/edit the table for more complex submissions where one peak annotation file has samples that use different LC methods, etc.
Alternatives
Dependencies
This issue cannot be started until the completion of the following issue(s):
703
704
705
Comment
ISSUE OWNER SECTION
Proposal 1 (Rob)
The following section delineates my proposal for handling loading of the LCMethod data. It is based on the following observations. A lab member can import any variety of mzXML files into EL-Maven and run accucor/isocorr on peaks picked from that process. Those mzXML files can be the product of different chromatography methods and different mass spec modes, e.g. neg/pos ion modes. (I'm not sure if the same sample can be included from different modes, but to be safe, I will assume that as well, and that the names of those samples will have suffixes appended, like "_pos", to make their names unique.) Hence, the LCMethod and mass spec modes are specific to the individual header representations of each sample. I.e. there's not one mode per accucor/isocorr file, nor is there one mode per "sample". There is one mode per "header representation of each sample", because each one is related to a single mzXML file.
Assumptions
Requirements
TODO
comment)
Items 1.1 and 1.2 must be conditionally required (only necessary if not every metadata value is specified for every sample).
Item 1.2 must be required if the same sample has multiple sample data headers in the peak annotation files included with a sample table / single study load.
msrun_protocol Protocol records must be retained by the end of this implementation (until the MS modes have migrated into a database field).
Limitations
Affected Components
ms_run.py
load_accucor_msruns.py
accucor_data_loader.py
DESIGN
Interface Change description
New options will be added to the loader. Example of new options:
A new LCMS metadata file can be submitted. Example file:
New error types will be presented in the validation interface.
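As a rough illustration of how the options discussed in the comments might be declared, here is a hypothetical argparse sketch (Django management commands receive an argparse-compatible parser in add_arguments). The flag names come from the discussion; the help text, the nargs choice for mzXML files, and the defaults are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical option set for load_accucor_msruns, based on this discussion."""
    parser = argparse.ArgumentParser(prog="load_accucor_msruns")
    parser.add_argument("--lcms-file", help="LCMS metadata file (.xlsx or .csv)")
    # "Default" options: used for any sample header the LCMS file does not cover.
    parser.add_argument("--lc-protocol-name", help="Default LC method name")
    parser.add_argument("--instrument", help="Default MS instrument")
    parser.add_argument("--mzxml-files", nargs="+", default=[], help="One or more mzXML files")
    # Renamed from --protocol; slated for removal per the discussion above.
    parser.add_argument("--ms-protocol-name", help="MS protocol (e.g. ion mode)")
    return parser
```

Taking several values for --mzxml-files matches requirements test 2 (an option that accepts multiple mzXML files).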
Code Change Description
New code to process the LCMS metadata csv/xls file will be added to load_accucor_msruns.py and passed to the accucor data loader in a manner similar to the processing of the accucor/isocorr files themselves. The metadata will be tracked during processing of the accucor/isocorr file, and errors about missing or unused metadata will be buffered and raised en masse. If all LCMethod metadata is supplied, the name will be constructed and records will be created using get_or_create, supplying all data. If only the name is available or no description is provided, the method record will only be retrieved, and an error will be buffered/raised if it is not found. In order to continue processing, the unknown LCMethod record will be used until the full load completes (consistent with the existing loading mechanisms).
Tests
Test that these methods in lc_method do what they're supposed to:
Test that these methods in exceptions do what they're supposed to:
Test that these methods in the accucor data loader do what they're supposed to:
Test that these methods in the sample table loader do what they're supposed to:
Test that these methods in the lcms metadata parser do what they're supposed to:
Requirements tests
1. Entry methods of LCMS Metadata
1.1. Tests
1.2. Test that values missing in the LCMS metadata fall back to the defaults from 1.1.
1.3. LCMS options' requiredness
1.3.1. Tests
1.3.2. Test that the LCMS sample column must correspond to a unique sample in the sample table loader
2. Test that an option/arg exists for multiple mzXML files
3. Test that LCMethod records are created
4. Test that MSRun records link to LCMethod records
5. Test that msrun_protocol Protocol records are created
6. Tests
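Tests 3 and 4 hinge on the LCMethod resolution rules from the Code Change Description. Below is a minimal, hypothetical sketch of those rules: an in-memory dict stands in for the database and a plain list serves as the error buffer. Name construction is not shown, so the sketch takes the name as given, and all class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

UNKNOWN_LC_METHOD = "unknown"  # placeholder record used while errors are buffered


@dataclass
class LCMethodRegistry:
    """In-memory stand-in for the LCMethod table (a real loader would use the ORM)."""

    methods: Dict[str, str] = field(default_factory=lambda: {UNKNOWN_LC_METHOD: ""})

    def get_or_create(self, name: str, description: str) -> str:
        self.methods.setdefault(name, description)
        return name

    def get(self, name: str) -> Optional[str]:
        return name if name in self.methods else None


class LCMethodErrors(Exception):
    """Raised once, at the end of a load, carrying every buffered problem."""

    def __init__(self, errors: List[str]):
        super().__init__("\n".join(errors))
        self.errors = errors


def resolve_lc_method(name: str, description: Optional[str], registry: LCMethodRegistry, errors: List[str]) -> str:
    if description:
        # Full metadata supplied: create the record if it does not exist yet.
        return registry.get_or_create(name, description)
    found = registry.get(name)
    if found is None:
        # Buffer the error and keep processing with the placeholder record.
        errors.append(f"LCMethod not found: {name}")
        return UNKNOWN_LC_METHOD
    return found


def load_lc_methods(rows: List[Tuple[str, Optional[str]]], registry: LCMethodRegistry) -> List[str]:
    errors: List[str] = []
    resolved = [resolve_lc_method(name, desc, registry, errors) for name, desc in rows]
    if errors:
        raise LCMethodErrors(errors)  # raise everything en masse
    return resolved
```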