Closed lparsons closed 4 months ago
Adding documentation is fair, though I would say that the loading process really hasn't changed that much. In fact, I would argue that the only additional complexity is "How do I supply the mzXML files?" And that has a really straightforward answer: the --mzxml-files option or the mzxml_files yaml item.
And there is documentation in the form of the yaml schema and the commands' help output, which, honestly, I think should largely suffice, so a perceived lack of documentation shouldn't really hamper any progress. I have mentioned on a few occasions that none of the changes implemented (with the exception of optionally supplying mzXML files) require any different usage of the commands unless you get an error, and the errors pretty much tell you what you need to do. Each command's usage contains sufficient information to add context to the yaml settings, which can inform a user how to supply the files.
Script by script, we have:
load_study_set
- The CLI has not changed.
load_study
- The CLI has not changed.
load_animals_and_samples
- The CLI has a single new optional option, though it is unnecessary unless you get an error. Setting the lcms_file in the yaml will automatically provide it to both the animal/sample script and the accucor script, and again, you will not need to provide that file in the most common cases (unless you get an error).
$ python manage.py load_animals_and_samples -h
...
--lcms-file LCMS_FILE
Excel or tab-delimited file containing metadata associated with the liquid chromatography and mass spec instrument run, (e.g. DataRepo/data/tests/small_obob_lcms_metadata/glucose.xlsx). If an excel file is used, it will use the sheet named 'LCMS Metadata' or
the first sheet.
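The sheet-selection rule in that help text can be sketched as below. This is only an illustration, assuming the rule is "use the sheet named 'LCMS Metadata' if it exists, otherwise the first sheet"; `pick_lcms_sheet` is a hypothetical helper, not a function from the TraceBase code.

```python
def pick_lcms_sheet(sheet_names, preferred="LCMS Metadata"):
    """Return the preferred sheet name if present, otherwise the first sheet.

    `sheet_names` is the ordered list of sheet names in the workbook.
    """
    if not sheet_names:
        raise ValueError("workbook has no sheets")
    # Prefer the named sheet; fall back to the first one in workbook order.
    return preferred if preferred in sheet_names else sheet_names[0]
```

For example, a workbook with sheets "Animals" and "LCMS Metadata" would yield "LCMS Metadata", while one with only "Sheet1" and "Sheet2" would yield "Sheet1".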
load_accucor_msruns
- Has 6 new options that are fairly straightforward, but only the --mzxml-files option is necessary to add mzXML files. Each option has a representation in the yaml schema. The mzXML files will get automatically matched to the sample data headers in the accucor file.
$ python manage.py load_accucor_msruns -h
...
--lcms-file LCMS_FILE
Filepath of either an xlsx or csv file containing metadata associated with the liquid chromatography and mass spec instrument run.
--mzxml-files [MZXML_FILES ...]
Filepaths of mzXML files containing instrument run data.
--lc-protocol-name LC_PROTOCOL_NAME
Default LCMethod.name of the liquid chromatography protocol used. Used if --lcms-file is not supplied, or specifies no LC info for a sample.
--instrument INSTRUMENT
Default name of the LCMS instrument that analyzed the samples. Used if --lcms-file is not supplied, or specifies no instrument for a sample.
--polarity POLARITY Default ion mode of the LCMS instrument that analyzed the samples. Used if --lcms-file is not supplied, or specifies no polarity for a sample.
--mz-min MZ_MIN Default unsigned minimum charge of the MSRun scan range. Only required if a study contains multiple MSRuns with the same polarity. Automatically parsed from mzXML. If unavailable, the minimum medMz value from the accucor/isocorr file is acceptable.
--mz-max MZ_MAX Default unsigned maximum charge of the MSRun scan range. Only required if a study contains multiple MSRuns with the same polarity. Automatically parsed from mzXML. If unavailable, the maximum medMz value from the accucor/isocorr file is acceptable.
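To give a rough idea of what matching mzXML files to sample headers could look like, here is a minimal sketch. It assumes a file matches a sample when its name, minus the extension, equals the sample header. The loader's actual matching logic is not shown in this thread, and `match_mzxml_to_headers` is a hypothetical name.

```python
from pathlib import PurePath


def match_mzxml_to_headers(mzxml_paths, sample_headers):
    """Pair each sample header with the one mzXML file whose stem equals it.

    Assumption (not confirmed by the loader code): a file matches a sample
    when its file name without the extension is identical to the header.
    Returns (matched, unmatched).
    """
    # Group the supplied paths by file stem, e.g. "runs/s1.mzXML" -> "s1".
    by_stem = {}
    for path in mzxml_paths:
        by_stem.setdefault(PurePath(path).stem, []).append(path)

    matched, unmatched = {}, []
    for header in sample_headers:
        candidates = by_stem.get(header, [])
        if len(candidates) == 1:
            matched[header] = candidates[0]
        else:
            # No file, or more than one file with the same stem.
            unmatched.append(header)
    return matched, unmatched
```

In this sketch, `unmatched` collects headers that have no file or more than one candidate file, so ambiguous cases are reported rather than guessed.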
The other loaders simply had their input file option names changed to --infile.
Let me know what additional information I can provide.
Thanks for the info on the options. That will be helpful when putting together the documentation. To close this issue, we need to document a process for handling incoming submissions, staging the data, communicating with the researcher, and finally loading into production. Basically, updating our internal docs here: https://nplcadmindocs.princeton.edu/index.php/TraceBase#Processing_TraceBase_Study_Submissions
Putting some notes here:
The updated process is ready for review/testing at https://nplcadmindocs.princeton.edu/index.php/TraceBase#Processing_TraceBase_Study_Submissions
FEATURE REQUEST
Inspiration
mzXML files are too large to place into a GitHub repository, as we have done with previous study data. They are also cumbersome to move around, and loading a study has become more complex.
Description
We need a documented process that multiple people understand and can follow to process and eventually load newly submitted studies that include mzXML files.