FAIRplus / FAIRPlus_squad2

an internal issue tracker (=todo list) for Squad team 2

Convert Proteomics Thermo Fisher Raw data to mzML open format #23

Open proccaserra opened 5 years ago

proccaserra commented 5 years ago

@ulo please see: https://github.com/compomics/ThermoRawFileParser

the tool is described in the following manuscript (preprint) https://www.biorxiv.org/content/10.1101/622852v1.full

It would help with the public release of this dataset and of future ones.

ulo commented 5 years ago

Thanks for the input. Yes, the current community-accepted open format for proteomics is mzML. In the past, I have used msconvert (http://proteowizard.sourceforge.net/tools/msconvert.html) for this conversion step. I think we should publish the open-format files in addition to the proprietary raw format.
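A minimal sketch of how that conversion step could be scripted, assuming msconvert is installed and on the PATH (reading Thermo raw files generally needs a ProteoWizard build with vendor support). The `--mzML` and `-o` options are msconvert's own; the function names and directory layout are hypothetical and only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_msconvert_command(raw_file, out_dir):
    """Return the msconvert argument list that converts one raw file to mzML."""
    return ["msconvert", str(raw_file), "--mzML", "-o", str(out_dir)]


def convert_all(raw_dir, out_dir):
    """Convert every *.raw file in raw_dir and return the commands that were run."""
    if shutil.which("msconvert") is None:
        raise RuntimeError("msconvert not found on PATH")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    commands = []
    for raw_file in sorted(Path(raw_dir).glob("*.raw")):
        cmd = build_msconvert_command(raw_file, out_dir)
        # check=True stops at the first failed conversion
        subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands
```

Calling `convert_all("raw", "mzml")` would write one mzML file per raw file into the `mzml` directory; both directory names are placeholders.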

mcourtot commented 4 years ago

@proccaserra sent info on tools. @sedlyarov will do the conversion and submit to ProteomeXchange. @ulo is looking into the accompanying metadata for the submission.

ulo commented 4 years ago

I submitted the proteomics data to the public ProteomeXchange repository, and made some interesting observations regarding requirements on file formats and metadata:

  1. This repository requires the result files, not the raw files, in an open format (for the raw files that would be mzML, as @proccaserra also stated). As we used the Proteome Discoverer software, the proprietary result file format is *.pdResult. Fortunately, the software can also export the required mzIdentML (mzID) format.

  2. In addition to generic metadata on the whole data set (project title & description, keywords, sample & data processing protocol), they also require a number of more specific annotations on the sample and method:

    • species: NCBITAXON ontology
    • tissue: BTO / EFO ontology
    • instrument: MS ontology
    • cell type: CL / EFO ontology
    • disease: EFO / DOID ontology
    • quantification method: PRIDE ontology
  3. Interestingly, the actual experimental factors are entered in a free text field, not requiring any structure or ontology.
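To make the annotation requirements above easier to check before a submission, here is a rough sketch that encodes the field-to-ontology list as a dictionary and validates a metadata record against it. The record layout, the function name, and the example terms are assumptions for illustration, not part of the ProteomeXchange submission tooling.

```python
# Annotation fields and the ontologies accepted for each, as listed above.
REQUIRED_ANNOTATIONS = {
    "species": ["NCBITAXON"],
    "tissue": ["BTO", "EFO"],
    "instrument": ["MS"],
    "cell type": ["CL", "EFO"],
    "disease": ["EFO", "DOID"],
    "quantification method": ["PRIDE"],
}


def check_annotations(record):
    """Return a list of problems found in a metadata record.

    `record` maps a field name to an (ontology, term label) pair.
    Experimental factors are free text and therefore not checked here.
    """
    problems = []
    for field, ontologies in REQUIRED_ANNOTATIONS.items():
        if field not in record:
            problems.append(f"missing: {field}")
            continue
        ontology, _label = record[field]
        if ontology not in ontologies:
            allowed = " / ".join(ontologies)
            problems.append(f"{field}: {ontology} is not one of {allowed}")
    return problems


# Made-up example record; the term labels are placeholders.
example = {
    "species": ("NCBITAXON", "Homo sapiens"),
    "tissue": ("DOID", "placeholder term"),
}
for problem in check_annotations(example):
    print(problem)
```

An empty list means every listed field is present and uses one of its accepted ontologies; it says nothing about whether the chosen term itself is valid.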