Princeton-LSI-ResearchComputing / tracebase

Mouse Metabolite Tracing Data Repository for the Rabinowitz Lab
MIT License

Document process to load studies with mzXML files #859

Closed lparsons closed 4 months ago

lparsons commented 5 months ago

FEATURE REQUEST

Inspiration

mzXML files are too large to place into a GitHub repository as we have done with previous study data. They are also cumbersome to move around, and loading a study has become more complex.

Description

We need a documented process that multiple people understand and can follow to process and eventually load newly submitted studies that include mzXML files.

hepcat72 commented 5 months ago

Adding documentation is fair, though I would say that the loading process really hasn't changed that much. In fact, I would argue that the only additional complexity is "How do I supply the mzXML files?". And that's got a really straightforward answer: the --mzxml-files option or the mzxml_files yaml item.

And there is documentation in the form of the yaml schema and the commands' help output, which honestly, I think should largely suffice, so a perceived lack of documentation shouldn't really hamper any progress. I have mentioned on a few occasions that, with the exception of optionally supplying mzXML files, none of the implemented changes require using any of the commands differently unless you get an error, and the errors pretty much tell you what you need to do. Each command's usage contains enough information to add context to the yaml settings, which can inform a user how to supply the files.

Script by script, we have:

The other loaders simply had their input file option names changed to --infile.

Let me know what additional information I can provide.

lparsons commented 5 months ago

Thanks for the info on the options. That will be helpful when putting together the documentation. To close this issue, we need to document a process for handling incoming submissions, staging the data, communicating with the researcher, and finally loading into production. Basically, updating our internal docs here: https://nplcadmindocs.princeton.edu/index.php/TraceBase#Processing_TraceBase_Study_Submissions

lparsons commented 5 months ago

Putting some notes here:

  1. Ideal to ask for sample sheet, a single accucor file, and all mzXML files in one directory.
  2. Download files to staging area on tracebase-dev
  3. Load sample sheet, fix errors, reload until success
  4. Use dry-run to load accucor file and mzXML files (should not be copied during dry run), fix errors, repeat until all issues resolved
  5. Compile YAML file with necessary options, test loading in dev (can this be done after it's already loaded?), mark as ready to load
  6. Use YAML file to load into production from staging area, mark as loaded
  7. Script to clean up marked directories and notify about old ones (over 1 week?)
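
A rough sketch of what the cleanup script in step 7 might look like. Everything here is an assumption for illustration, not existing TraceBase code: the `.loaded` marker file name, the one-subdirectory-per-study layout of the staging area, the 7-day cutoff, and the function names are all hypothetical. It removes study directories that have been marked as loaded and reports unmarked ones that are older than the cutoff.

```python
import shutil
import time
from pathlib import Path

LOADED_MARKER = ".loaded"  # hypothetical marker file meaning "loaded into production"
MAX_AGE_DAYS = 7           # assumed cutoff, per the "over 1 week?" note


def scan_staging(staging_root, max_age_days=MAX_AGE_DAYS, now=None):
    """Return (loaded, stale) lists of study directories under staging_root.

    loaded: directories containing the marker file.
    stale:  unmarked directories last modified before the cutoff.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    loaded, stale = [], []
    for study_dir in sorted(Path(staging_root).iterdir()):
        if not study_dir.is_dir():
            continue
        if (study_dir / LOADED_MARKER).exists():
            loaded.append(study_dir)
        elif study_dir.stat().st_mtime < cutoff:
            stale.append(study_dir)
    return loaded, stale


def cleanup_staging(staging_root, dry_run=True, **scan_kwargs):
    """Remove loaded study directories and report stale ones.

    With dry_run=True (the default) nothing is deleted; actions are only printed.
    """
    loaded, stale = scan_staging(staging_root, **scan_kwargs)
    for study_dir in loaded:
        if dry_run:
            print(f"Would remove {study_dir}")
        else:
            shutil.rmtree(study_dir)
            print(f"Removed {study_dir}")
    for study_dir in stale:
        print(f"Not marked as loaded and older than cutoff: {study_dir}")
    return loaded, stale
```

Defaulting to a dry run mirrors step 4: the script can be run first to see what it would delete, then rerun with `dry_run=False`. The notification part is just a `print` here; how to actually notify people (email, issue comment, etc.) is left open.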

lparsons commented 5 months ago

Updated process ready for review/testing at https://nplcadmindocs.princeton.edu/index.php/TraceBase#Processing_TraceBase_Study_Submissions