HUPO-PSI / mzTab

mzTab Reporting MS-based Proteomics and Metabolomics Results
https://hupo-psi.github.io/mzTab

CV Terms for mzTab-m 2.0.0 #108

Closed nilshoffmann closed 5 years ago

nilshoffmann commented 6 years ago

SEP, MS sample_processing

MS instrument_name

MS instrument_source

MS instrument_analyzer

MS instrument_detector

MS software

MS quantification_method

Any CV assay-custom

MS or other CV? study_variable_function

MS ms_run-format

MS ms_run-id_format

MS or other? ms_run-fragmentation_method

MS ms_run-hash_method

Any CV custom -> arbitrary, these should not be validated

NEWT sample-species

BTO sample-tissue

CL sample-cell_type

DOID sample-disease

Any CV sample-custom => custom should not be validated

MIRIAM or other CV? database

MS or chem-mod CV derivatization_agent

MS small_molecule-quantification_unit

MS small_molecule_feature-quantification_unit

PRIDE or other CV small_molecule-identification_reliability

MS id_confidence_measure

opt_ columns will not be part of the validation.

MS best_id_confidence_measure

MS identification_method

MS ms_level

MS id_confidence_measure
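The validation scope implied by this list can be captured as a small rule table. The following is a minimal Python sketch, not the jmzTab-m implementation: the dictionary covers only a few of the columns above, and the names `COLUMN_CVS`, `allowed_cvs` and `is_valid` are hypothetical. `opt_` columns are skipped, and custom columns accept any CV.

```python
from typing import Optional

# Hypothetical excerpt of the list above: column -> allowed CV labels.
# None means any CV is accepted (custom parameters are not validated).
COLUMN_CVS: dict[str, Optional[set[str]]] = {
    "sample_processing": {"SEP", "MS"},
    "instrument_name": {"MS"},
    "sample-species": {"NEWT"},
    "sample-tissue": {"BTO"},
    "sample-cell_type": {"CL"},
    "sample-disease": {"DOID"},
    "assay-custom": None,
    "custom": None,
    "sample-custom": None,
}


def allowed_cvs(column: str) -> Optional[set[str]]:
    """CV labels a column's terms must come from, or None if any CV is accepted."""
    if column.startswith("opt_"):
        return None  # opt_ columns are not part of the validation
    if column not in COLUMN_CVS:
        raise KeyError(f"no CV rule for column {column!r}")
    return COLUMN_CVS[column]


def is_valid(column: str, cv_label: str) -> bool:
    """True if a term from cv_label is acceptable in the given column."""
    cvs = allowed_cvs(column)
    return cvs is None or cv_label in cvs
```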

nilshoffmann commented 6 years ago

@nilshoffmann check https://raw.githubusercontent.com/HUPO-PSI/mzML/master/validator/src/main/resources/ms-mapping.xml as a reference

Paths in the object model need to be translated to xpaths in the mapping file and vice versa.

https://github.com/json-path/JsonPath might be a good starting point.
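One detail of that translation is that JSONPath array indexes are 0-based while XPath positional predicates are 1-based. Below is a minimal Python sketch for a restricted path subset (dotted names, `[n]` and `[*]` only), assuming object-model paths look like `$.metadata.sample[0].species`; that path shape and both function names are hypothetical. Note that `[*]` maps to an unpredicated XPath step, so that case does not round-trip.

```python
import re

# One path step: a name optionally followed by [index] or [*]
_STEP = re.compile(r"^([A-Za-z_][\w-]*)(?:\[(\*|\d+)\])?$")


def jsonpath_to_xpath(path: str) -> str:
    """Translate e.g. $.metadata.sample[0].species into /metadata/sample[1]/species."""
    if not path.startswith("$."):
        raise ValueError(f"expected a JSONPath starting with '$.': {path!r}")
    steps = []
    for segment in path[2:].split("."):
        match = _STEP.match(segment)
        if match is None:
            raise ValueError(f"unsupported JSONPath step {segment!r}")
        name, index = match.groups()
        if index is None or index == "*":
            steps.append(name)  # [*] selects every element; XPath needs no predicate
        else:
            steps.append(f"{name}[{int(index) + 1}]")  # 0-based -> 1-based
    return "/" + "/".join(steps)


def xpath_to_jsonpath(path: str) -> str:
    """Inverse translation for the same subset; unpredicated steps stay unindexed."""
    if not path.startswith("/"):
        raise ValueError(f"expected an absolute XPath: {path!r}")
    steps = []
    for segment in path[1:].split("/"):
        match = _STEP.match(segment)
        if match is None:
            raise ValueError(f"unsupported XPath step {segment!r}")
        name, index = match.groups()
        if index is None:
            steps.append(name)
        elif index != "*" and int(index) >= 1:
            steps.append(f"{name}[{int(index) - 1}]")  # 1-based -> 0-based
        else:
            raise ValueError(f"unsupported XPath predicate in {segment!r}")
    return "$." + ".".join(steps)
```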

germa commented 6 years ago

@nilshoffmann: Please check whether the following would fit:

SEP sample_processing: child of MS:1000831 ! sample preparation

MS instrument_name: child of MS:1000031 ! instrument model

MS instrument_source: child of MS:1000458 ! source

MS instrument_analyzer: child of MS:1000451 ! mass analyzer

MS instrument_detector: child of MS:1000026 ! detector type

MS software: child of MS:1002878 ! small molecule analysis software

PRIDE quantification_method: child of PRIDE:0000307 ! Quantification method

Any CV assay-custom: Maybe only UserParams ???

MS or other CV study_variable_function: child of MS:1002882 ! study variable average function or child of MS:1002884 ! study variable variation function

MS ms_run-format: child of MS:1000560 ! mass spectrometer file format

MS ms_run-id_format: child of MS:1000767 ! native spectrum identifier format

MS ms_run-fragmentation_method: child of MS:1000044 ! dissociation method

MS ms_run-hash_method: new terms required, e.g. for SPLASH (SPectraL hASH) ???

Any CV custom -> arbitrary, these should not be validated (UserParam)

NEWT sample-species: child of PRIDE:0000033 ! NEWT

BTO sample-tissue: child of BTO:0000000 ! tissues, cell types and enzyme sources or child of PRIDE:0000442 ! Tissue not applicable to dataset

CL sample-cell_type: child of CL:0000000 ! cell

DOID sample-disease: child of DOID:4 ! disease or child of PRIDE:0000018 ! Disease free

Any CV sample-custom => custom should not be validated (UserParam)

MS database: child of MS:1001013 ! database name; child of MS:1001347 ! database file formats; child of MS:1001011 ! search database details

XLMOD derivatization_agent: TODO: derivatization agents must be added to XLMOD

PRIDE small_molecule-quantification_unit: child of PRIDE:0000392 ! Quantification unit

MS small_molecule_feature-quantification_unit

PRIDE or other CV small_molecule-identification_reliability

MS id_confidence_measure: child of MS:1002888 ! small molecule confidence measure

opt_ columns will not be part of the validation.

MS best_id_confidence_measure

MS identification_method: child of MS:1001080 ! search type

MS ms_level: child of MS:1000511 ! ms level

MS id_confidence_measure: child of MS:1002888 ! small molecule confidence measure
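Each "child of X" rule above amounts to a transitive is_a check, and rules such as sample-disease accept children of either of two roots. Below is a minimal sketch, assuming the is_a edges have already been loaded into a dictionary (for example from the OBO file or from OLS). The functions and the toy IDs are hypothetical, and the sketch does not count the root term itself as a match.

```python
from collections import deque


def ancestors(term: str, is_a: dict[str, list[str]]) -> set[str]:
    """Collect all transitive is_a parents of a term (breadth-first walk)."""
    seen: set[str] = set()
    queue = deque(is_a.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(is_a.get(parent, []))
    return seen


def is_child_of_any(term: str, roots: set[str], is_a: dict[str, list[str]]) -> bool:
    """'child of A or child of B': true if any allowed root is an ancestor of the term."""
    return bool(ancestors(term, is_a) & roots)


# Toy hierarchy with invented DOID-style IDs (not real terms)
toy_is_a = {"DOID:EX2": ["DOID:EX1"], "DOID:EX1": ["DOID:4"]}
print(is_child_of_any("DOID:EX2", {"DOID:4", "PRIDE:0000018"}, toy_is_a))
```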

MS MSI levels (Schymanski levels) ???

What are the Schymanski levels? Is there a publication describing them?

And how should the isotopomers be encoded? Maybe by using a generic term with a value, e.g.:

[Term]
id: MS:1002xyz
name: isotopomer
def: "An isotopomer." [PSI:PI]
xref: value-type:xsd\:string "The allowed value-type for this CV term."
is_a: MS:1002xyz ! ...

where the value would be something like "13C peak"
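In an mzTab file such a term would appear as a parameter in the `[CV label, accession, name, value]` form. Below is a minimal sketch of rendering one. `MS:1002xyz` is the placeholder accession from the proposal above, not a real term, `format_param` is a hypothetical helper, and quoting fields that contain commas is our assumption about the escaping rule.

```python
def format_param(cv_label: str, accession: str, name: str, value: str = "") -> str:
    """Render an mzTab parameter as [CV label, accession, name, value]."""

    def quote(text: str) -> str:
        # Assumption: a field containing a comma is wrapped in double quotes
        return f'"{text}"' if "," in text else text

    return f"[{cv_label}, {accession}, {quote(name)}, {quote(value)}]"


# MS:1002xyz is the placeholder accession from the proposal, not a real term
print(format_param("MS", "MS:1002xyz", "isotopomer", "13C peak"))
```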

nilshoffmann commented 6 years ago

Done: MS ms_run-format: child of MS:1000560 ! mass spectrometer file format

MS ms_run-id_format: child of MS:1000767 ! native spectrum identifier format

MS ms_run-fragmentation_method: child of MS:1000044 ! dissociation method

MS ms_run-hash_method: child of MS:1000561 ! data file checksum type; new terms may still be required, e.g. for SPLASH (SPectraL hASH)
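For ms_run-hash_method, the hash itself is computed over the raw file. Below is a minimal Python sketch using SHA-1, one of the checksum types PSI-MS lists under data file checksum type. The function name is hypothetical, and SPLASH, being a hash of a spectrum rather than a file checksum, is not covered here.

```python
import hashlib


def file_checksum(path: str, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Return the hex digest of a file, reading it in chunks so large raw files need little memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps reading until read() returns b"" at end of file
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```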

Added: MS ms_run-scan_polarity: MS:1000129 (negative scan) and/or MS:1000130 (positive scan)
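Below is a minimal sketch of checking the scan_polarity field against the two accessions listed above. Requiring at least one term and rejecting duplicates are assumptions made for illustration, and `check_scan_polarity` is a hypothetical name.

```python
NEGATIVE_SCAN = "MS:1000129"
POSITIVE_SCAN = "MS:1000130"
ALLOWED_POLARITIES = {NEGATIVE_SCAN, POSITIVE_SCAN}


def check_scan_polarity(accessions: list[str]) -> list[str]:
    """Return problems found in an ms_run scan_polarity list (empty list means valid)."""
    problems = []
    if not accessions:
        problems.append("at least one scan polarity term is required")  # assumption
    for acc in accessions:
        if acc not in ALLOWED_POLARITIES:
            problems.append(f"{acc} is neither negative scan ({NEGATIVE_SCAN}) nor positive scan ({POSITIVE_SCAN})")
    if len(set(accessions)) != len(accessions):
        problems.append("duplicate scan polarity terms")  # assumption
    return problems
```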

nilshoffmann commented 6 years ago

A draft mapping file is available here: https://github.com/nilshoffmann/jmzTab-m/blob/master/validation/src/main/resources/mappings/mzTab-M-mapping.xml
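Rules in a mapping file of this kind can be read with the Python standard library. The snippet below is modeled on the `CvMappingRule`/`CvTerm` elements of the mzML ms-mapping.xml, simplified by dropping the XML namespace and the CV reference list. The rule id and element path are invented, so check the attribute names against the actual draft file.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free rule modeled on mzML's ms-mapping.xml; id and path are invented
SAMPLE_RULES = """
<CvMappingRuleList>
  <CvMappingRule id="instrument_name_must" cvElementPath="/metadata/instrument/name"
                 requirementLevel="MUST" cvTermsCombinationLogic="OR">
    <CvTerm termAccession="MS:1000031" termName="instrument model" useTerm="false"
            allowChildren="true" isRepeatable="false" cvIdentifierRef="MS"/>
  </CvMappingRule>
</CvMappingRuleList>
"""


def read_rules(xml_text: str) -> dict[str, dict]:
    """Map each rule id to its element path, requirement level, logic and term roots."""
    root = ET.fromstring(xml_text.strip())
    rules = {}
    for rule in root.iter("CvMappingRule"):
        rules[rule.get("id")] = {
            "path": rule.get("cvElementPath"),
            "level": rule.get("requirementLevel"),
            "logic": rule.get("cvTermsCombinationLogic"),
            # (accession, whether children of the term are allowed)
            "roots": [(term.get("termAccession"), term.get("allowChildren") == "true")
                      for term in rule.iter("CvTerm")],
        }
    return rules
```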

proccaserra commented 6 years ago

@nilshoffmann thx for following up and sending the link.

Following up on my question about which vocabularies are recommended, I noticed that study_variable_variation_function_may points to MS_1002884 with one child (standard error)

study_variable_average_function_may points to MS_1002882 with one child (median)

If using STATO, one could point to:

  1. 'measure of variation' (https://www.ebi.ac.uk/ols/ontologies/stato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FSTATO_0000028), with 5 terms

  2. 'measure of central tendency' (https://www.ebi.ac.uk/ols/ontologies/stato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FSTATO_0000029) with 11 terms

This would provide more options without having to resubmit terms.

Other STATO classes of interest could be 'effect size estimate' (https://www.ebi.ac.uk/ols/ontologies/stato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FSTATO_0000085) with 16 subtypes/terms.

There are, of course, many more terms that could be useful. STATO is available from EBI OLS, so it would work with the service you described during our call.

We can distribute STATO in OBO format or create modules: https://github.com/ISA-tools/stato
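The OLS links in this comment follow a regular pattern: the term's OBO PURL is URL-encoded into the `iri` query parameter. Below is a minimal sketch that rebuilds such links from a CURIE. The function names are hypothetical, and the OLS URL layout may have changed since this thread.

```python
from urllib.parse import urlencode

OLS_TERM_PAGE = "https://www.ebi.ac.uk/ols/ontologies/{ontology}/terms?{query}"


def obo_iri(curie: str) -> str:
    """OBO PURL for a CURIE, e.g. STATO:0000028 -> http://purl.obolibrary.org/obo/STATO_0000028."""
    prefix, local_id = curie.split(":", 1)
    return f"http://purl.obolibrary.org/obo/{prefix}_{local_id}"


def ols_term_page(curie: str) -> str:
    """OLS term page URL in the form used by the links in this thread."""
    prefix = curie.split(":", 1)[0]
    query = urlencode({"iri": obo_iri(curie)})  # encodes ':' as %3A and '/' as %2F
    return OLS_TERM_PAGE.format(ontology=prefix.lower(), query=query)
```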

nilshoffmann commented 6 years ago

@proccaserra We can add those as alternative term roots in the mapping file and combine them with XOR, so that a term must be either a child of the MS terms or a child of the STATO terms.

I will have to check how the STATO terms are returned by OLS.
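Whatever form OLS returns the STATO hierarchy in, one reading of the XOR combination can be evaluated on a term's set of ancestors. Below is a minimal sketch, assuming those ancestor accessions have already been fetched. `matches_exactly_one` and the group names are hypothetical; the roots are taken from this thread (MS:1002884 study variable variation function, STATO:0000028 measure of variation).

```python
def matches_exactly_one(ancestors: set[str], root_groups: dict[str, set[str]]) -> bool:
    """XOR over root groups: the term must sit under roots of exactly one group."""
    hits = [group for group, roots in root_groups.items() if ancestors & roots]
    return len(hits) == 1


# Alternative roots for study_variable_variation_function, taken from this thread
VARIATION_ROOTS = {"MS": {"MS:1002884"}, "STATO": {"STATO:0000028"}}
```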

proccaserra commented 6 years ago

@nilshoffmann oh I see, nice one. That would indeed be a good way to go about it. thx.

nilshoffmann commented 6 years ago

Preliminary mapping file is available: https://github.com/HUPO-PSI/mzTab/blob/master/specification_document-developments/2_0-Metabolomics-Draft/mzTab_2_0-M_mapping.xml

Currently, XLMOD and STATO are missing from OLS. Once they are added, we can use them in the mapping file.

proccaserra commented 6 years ago

@nilshoffmann STATO is in OLS: https://www.ebi.ac.uk/ols/ontologies/stato

Only XLMOD is missing.

nilshoffmann commented 6 years ago

@proccaserra Sorry, I missed that! Thanks for pointing it out!

nilshoffmann commented 5 years ago

All ontologies are now available via OLS.