nilshoffmann closed this issue 5 years ago.
@nilshoffmann Check https://raw.githubusercontent.com/HUPO-PSI/mzML/master/validator/src/main/resources/ms-mapping.xml as a reference.
Paths in the object model need to be translated to XPaths in the mapping file, and vice versa.
https://github.com/json-path/JsonPath might be a good starting point.
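A minimal sketch of that two-way translation. It assumes object-model paths use a simplified JSONPath subset (`$.a.b[0]`, `$.a.b[*]`) and that every path segment maps one-to-one to an XPath element name. The real mzTab-M mapping may need element renaming on top of this, and the `metadata.sample_processing` path in the usage note is hypothetical.

```python
import re

# One simplified path segment: a name, optionally followed by [*] or [n].
_SEGMENT = re.compile(r"^([A-Za-z_][\w-]*)(?:\[(\*|\d+)\])?$")


def jsonpath_to_xpath(path: str) -> str:
    """Translate e.g. $.a.b[0] into /a/b[1]."""
    if not path.startswith("$."):
        raise ValueError(f"expected a path starting with '$.': {path!r}")
    steps = []
    for segment in path[2:].split("."):
        match = _SEGMENT.match(segment)
        if match is None:
            raise ValueError(f"unsupported segment: {segment!r}")
        name, index = match.groups()
        if index is None or index == "*":
            # A plain XPath step already selects every matching element.
            steps.append(name)
        else:
            # JSONPath indices are 0-based; XPath positions are 1-based.
            steps.append(f"{name}[{int(index) + 1}]")
    return "/" + "/".join(steps)


def xpath_to_jsonpath(xpath: str) -> str:
    """Inverse translation for the same subset, e.g. /a/b[1] into $.a.b[0]."""
    if not xpath.startswith("/"):
        raise ValueError(f"expected an absolute XPath: {xpath!r}")
    segments = []
    for step in xpath[1:].split("/"):
        match = _SEGMENT.match(step)
        if match is None or match.group(2) == "*":
            raise ValueError(f"unsupported step: {step!r}")
        name, position = match.groups()
        if position is None:
            segments.append(name)
        elif int(position) < 1:
            raise ValueError(f"XPath positions start at 1: {step!r}")
        else:
            segments.append(f"{name}[{int(position) - 1}]")
    return "$." + ".".join(segments)
```

For example, `jsonpath_to_xpath("$.metadata.sample_processing[0]")` gives `/metadata/sample_processing[1]`. A `[*]` wildcard becomes a plain step, so it does not survive a round trip. That is harmless for validation, because a plain XPath step already matches every element.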
@nilshoffmann Please check whether the following would fit:
SEP sample_processing: child of MS:1000831 ! sample preparation
MS instrument_name: child of MS:1000031 ! instrument model
MS instrument_source: child of MS:1000458 ! source
MS instrument_analyzer: child of MS:1000451 ! mass analyzer
MS instrument_detector: child of MS:1000026 ! detector type
MS software: child of MS:1002878 ! small molecule analysis software
PRIDE quantification_method: child of PRIDE:0000307 ! Quantification method
Any CV assay-custom: maybe only UserParams ???
MS or other CV study_variable_function: child of MS:1002882 ! study variable average function or child of MS:1002884 ! study variable variation function
MS ms_run-format: child of MS:1000560 ! mass spectrometer file format
MS ms_run-id_format: child of MS:1000767 ! native spectrum identifier format
MS ms_run-fragmentation_method: child of MS:1000044 ! dissociation method
MS ms_run-hash_method: new terms required, e.g. for SPLASH (SPectraL hASH) ???
Any CV custom -> arbitrary, these should not be validated (UserParam)
NEWT sample-species: child of PRIDE:0000033 ! NEWT
BTO sample-tissue: child of BTO:0000000 ! tissues, cell types and enzyme sources or child of PRIDE:0000442 ! Tissue not applicable to dataset
CL sample-cell_type: child of CL:0000000 ! cell
DOID sample-disease: child of DOID:4 ! disease or child of PRIDE:0000018 ! Disease free
Any CV sample-custom => custom should not be validated (UserParam)
MS database: child of MS:1001013 ! database name or child of MS:1001347 ! database file formats or child of MS:1001011 ! search database details
XLMOD derivatization_agent: TODO: derivatization agents must be added to XLMOD
PRIDE small_molecule-quantification_unit: child of PRIDE:0000392 ! Quantification unit
MS small_molecule_feature-quantification_unit
PRIDE or other CV small_molecule-identification_reliability
MS id_confidence_measure: child of MS:1002888 ! small molecule confidence measure
opt_ columns will not be part of the validation.
MS best_id_confidence_measure
MS identification_method: child of MS:1001080 ! search type
MS ms_level: child of MS:1000511 ! ms level
MS id_confidence_measure: child of MS:1002888 ! small molecule confidence measure
MS MSI levels (Schymanski levels) ???
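The list above amounts to a table from field to allowed parent terms. Several parents for one field (e.g. sample-tissue, sample-disease) are alternatives, and `opt_` columns and custom fields are skipped. Below is a minimal sketch using only a few of the listed rules. It assumes the full set of ancestors of each term has already been resolved elsewhere (for instance via OLS). The function names are hypothetical.

```python
from typing import Dict, FrozenSet

# A few of the proposed rules: field -> parent terms, any one of which is acceptable.
RULES: Dict[str, FrozenSet[str]] = {
    "instrument_name": frozenset({"MS:1000031"}),             # instrument model
    "ms_run-format": frozenset({"MS:1000560"}),               # mass spectrometer file format
    "sample-tissue": frozenset({"BTO:0000000", "PRIDE:0000442"}),
    "sample-disease": frozenset({"DOID:4", "PRIDE:0000018"}),
}


def is_exempt(field: str) -> bool:
    """opt_ columns and custom (UserParam) fields are not validated."""
    return field.startswith("opt_") or field.endswith("custom")


def check_term(field: str, ancestors: FrozenSet[str]) -> bool:
    """Return True if a term with the given ancestors may be used for the field.

    `ancestors` holds all transitive parents of the term (not the term itself),
    so a root term on its own is not accepted. Unknown, non-exempt fields fail.
    """
    if is_exempt(field):
        return True
    allowed = RULES.get(field)
    if allowed is None:
        return False
    # OR semantics: being a child of any one listed parent is enough.
    return not allowed.isdisjoint(ancestors)
```

For example, `check_term("sample-disease", frozenset({"DOID:4"}))` is accepted, while an `MS:1000560` descendant is rejected for `instrument_name`.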
What are the Schymanski levels? Is there a publication describing them?
And how should the isotopomers be encoded? Maybe by using a generic term with a value, e.g.:
    [Term]
    id: MS:1002xyz
    name: isotopomer
    def: "An isotopomer." [PSI:PI]
    xref: value-type:xsd\:string "The allowed value-type for this CV term."
    is_a: MS:1002xyz ! ...
where the value would be something like "13C peak".
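Assuming the usual mzTab parameter notation `[CV label, accession, name, value]`, a generic term like the one proposed above would carry the isotopomer description in its value slot. A small sketch follows. `MS:1002xyz` is the placeholder accession from the proposal, not a real term. Quoting fields that contain commas is our assumption about the escaping rule, and `format_param` is a hypothetical helper.

```python
def format_param(cv_label: str, accession: str, name: str, value: str = "") -> str:
    """Render an mzTab-style parameter: [CV label, accession, name, value]."""

    def quote(field: str) -> str:
        # Assumption: fields containing commas are wrapped in double quotes.
        return f'"{field}"' if "," in field else field

    return "[" + ", ".join(quote(f) for f in (cv_label, accession, name, value)) + "]"


# Placeholder accession from the proposal; not an existing MS term.
print(format_param("MS", "MS:1002xyz", "isotopomer", "13C peak"))
# [MS, MS:1002xyz, isotopomer, 13C peak]
```

Because the term's value-type is `xsd:string`, a validator would only check that the value is present. The allowed strings ("13C peak" and so on) would have to be agreed on separately.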
Done:
MS ms_run-format: child of MS:1000560 ! mass spectrometer file format
MS ms_run-id_format: child of MS:1000767 ! native spectrum identifier format
MS ms_run-fragmentation_method: child of MS:1000044 ! dissociation method
MS ms_run-hash_method: child of MS:1000561 ! data file checksum type; new terms required, e.g. for SPLASH (SPectraL hASH)
Added: MS ms_run-scan_polarity: MS:1000129 (negative scan) and/or MS:1000130 (positive scan)
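"And/or" for ms_run-scan_polarity reads as: one or both of the two polarity terms, and no other term. A sketch of that check follows. It assumes the values arrive as a list of accessions and that each term should appear at most once; the duplicate rule is our interpretation, not stated above.

```python
from typing import Iterable

# The two polarity terms named above.
NEGATIVE_SCAN = "MS:1000129"
POSITIVE_SCAN = "MS:1000130"
ALLOWED_POLARITIES = {NEGATIVE_SCAN, POSITIVE_SCAN}


def check_scan_polarity(accessions: Iterable[str]) -> bool:
    """Accept one or both polarity terms, each at most once, and nothing else."""
    values = list(accessions)
    return (
        1 <= len(values) <= 2
        and len(set(values)) == len(values)
        and set(values) <= ALLOWED_POLARITIES
    )
```

A polarity-switching run would list both accessions; a run with any other accession fails.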
A draft mapping file is available here: https://github.com/nilshoffmann/jmzTab-m/blob/master/validation/src/main/resources/mappings/mzTab-M-mapping.xml
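A sketch of reading such a mapping file with Python's standard library. The inline XML is an illustrative fragment. Its element and attribute names (CvMappingRule, cvElementPath, requirementLevel, cvTermsCombinationLogic, CvTerm, termAccession) follow the PSI ms-mapping.xml as we understand it. The rule ids, element paths and requirement levels are made up and are not copied from the draft mzTab-M file. Defaulting to OR when no combination logic is given is also an assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List

# Illustrative fragment; ids, paths and requirement levels are hypothetical.
SAMPLE = """
<CvMapping>
  <CvMappingRuleList>
    <CvMappingRule id="instrument_name_rule" cvElementPath="/mzTab/metadata/instrument/name"
                   requirementLevel="MUST" cvTermsCombinationLogic="OR">
      <CvTerm termAccession="MS:1000031" termName="instrument model"
              useTerm="false" allowChildren="true" cvIdentifierRef="MS"/>
    </CvMappingRule>
    <CvMappingRule id="sample_tissue_rule" cvElementPath="/mzTab/metadata/sample/tissue"
                   requirementLevel="MAY" cvTermsCombinationLogic="OR">
      <CvTerm termAccession="BTO:0000000" termName="tissues, cell types and enzyme sources"
              useTerm="false" allowChildren="true" cvIdentifierRef="BTO"/>
      <CvTerm termAccession="PRIDE:0000442" termName="Tissue not applicable to dataset"
              useTerm="true" allowChildren="false" cvIdentifierRef="PRIDE"/>
    </CvMappingRule>
  </CvMappingRuleList>
</CvMapping>
"""


@dataclass
class Rule:
    rule_id: str
    element_path: str
    requirement: str
    logic: str
    roots: List[str] = field(default_factory=list)


def parse_rules(xml_text: str) -> List[Rule]:
    """Collect each CvMappingRule with its combination logic and root accessions."""
    root = ET.fromstring(xml_text.strip())
    rules = []
    for element in root.iter("CvMappingRule"):
        rules.append(
            Rule(
                rule_id=element.get("id", ""),
                element_path=element.get("cvElementPath", ""),
                requirement=element.get("requirementLevel", ""),
                # Assumption: treat a missing combination logic as OR.
                logic=element.get("cvTermsCombinationLogic", "OR"),
                roots=[t.get("termAccession", "") for t in element.iter("CvTerm")],
            )
        )
    return rules
```

Each `Rule.element_path` is then the XPath that the object-model path has to be translated to.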
@nilshoffmann thx for following up and sending the link.
Following up on my question about which vocabularies are recommended, I noticed that:
study_variable_variation_function_may points to MS:1002884 with one child (standard error)
study_variable_average_function_may points to MS:1002882 with one child (median)
If using STATO, one could point to:
'measure of variation' (https://www.ebi.ac.uk/ols/ontologies/stato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FSTATO_0000028), with 5 terms
'measure of central tendency' (https://www.ebi.ac.uk/ols/ontologies/stato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FSTATO_0000029), with 11 terms
This would give more options without having to submit new terms.
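The OLS links above embed the term's OBO PURL, percent-encoded, in the `iri` query parameter. A small sketch that rebuilds such links and converts the PURL into the `STATO:0000028` accession form used elsewhere in this thread. The URL pattern is copied from the links above and may differ in newer OLS versions; the helper names are hypothetical.

```python
from urllib.parse import quote

OBO_PURL_PREFIX = "http://purl.obolibrary.org/obo/"
# Pattern copied from the links in this thread.
OLS_TERM_PAGE = "https://www.ebi.ac.uk/ols/ontologies/{ontology}/terms?iri={iri}"


def purl_to_curie(iri: str) -> str:
    """http://purl.obolibrary.org/obo/STATO_0000028 -> STATO:0000028"""
    if not iri.startswith(OBO_PURL_PREFIX):
        raise ValueError(f"not an OBO PURL: {iri!r}")
    prefix, sep, local_id = iri[len(OBO_PURL_PREFIX):].partition("_")
    if not sep or not prefix or not local_id:
        raise ValueError(f"expected PREFIX_ID after the PURL base: {iri!r}")
    return f"{prefix}:{local_id}"


def ols_term_page(iri: str) -> str:
    """Build an OLS term page link; quote(..., safe="") also encodes ':' and '/'."""
    ontology = purl_to_curie(iri).split(":")[0].lower()
    return OLS_TERM_PAGE.format(ontology=ontology, iri=quote(iri, safe=""))
```

Applied to `http://purl.obolibrary.org/obo/STATO_0000028`, this reproduces the 'measure of variation' link above.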
Other STATO classes of interest could be 'effect size estimate' (https://www.ebi.ac.uk/ols/ontologies/stato/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FSTATO_0000085), with 16 subtypes/terms.
There are, of course, many more terms that could be useful. STATO is available from EBI OLS, so it would work with the service you described during our call.
We can distribute STATO in OBO format or create modules: https://github.com/ISA-tools/stato
@proccaserra We can add those as alternative term roots in the mapping file, combined with XOR, so a term is either a child of the MS terms or a child of the STATO terms.
I will have to check how the STATO terms are returned by OLS.
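A sketch of the XOR combination: a term is accepted only if it descends from exactly one of the alternative roots. The `is_a` edges below are a tiny hypothetical stand-in for what would be fetched from OLS. Only the root accessions (MS:1002884, STATO:0000028) come from this thread; the child IDs are invented placeholders.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical is_a edges (child -> parents); in practice these would come from OLS.
IS_A: Dict[str, List[str]] = {
    "MS:child-1": ["MS:1002884"],          # under study variable variation function
    "STATO:child-1": ["STATO:0000028"],    # under measure of variation
    "STATO:child-2": ["STATO:child-1"],    # grandchild, reached transitively
}


def ancestors(term: str, is_a: Dict[str, List[str]]) -> Set[str]:
    """All transitive parents of a term (the term itself is not included)."""
    seen: Set[str] = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def matches_xor(term: str, roots: Iterable[str], is_a: Dict[str, List[str]]) -> bool:
    """True if the term is a descendant of exactly one of the alternative roots."""
    found = ancestors(term, is_a)
    return sum(1 for root in roots if root in found) == 1
```

With roots `["MS:1002884", "STATO:0000028"]`, `MS:child-1` and `STATO:child-2` both pass. A term that somehow sat under both roots would fail, which is what distinguishes XOR from OR.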
@nilshoffmann Oh I see, nice one. That would indeed be a good way to go about it. Thx.
Preliminary mapping file is available: https://github.com/HUPO-PSI/mzTab/blob/master/specification_document-developments/2_0-Metabolomics-Draft/mzTab_2_0-M_mapping.xml
Currently, XLMOD and STATO are missing from OLS. Once they are added, we can use them in the mapping file.
@nilshoffmann STATO is in OLS: https://www.ebi.ac.uk/ols/ontologies/stato
Only XLMOD is missing.
@proccaserra Sorry, missed that! Thanks for pointing it out!
All ontologies are now available via OLS.
SEP, MS sample_processing
MS instrument_name
MS instrument_source
MS instrument_analyzer
MS instrument_detector
MS software
MS quantification_method
Any CV assay-custom
MS or other CV? study_variable_function
MS ms_run-format
MS ms_run-id_format
MS or other CV? ms_run-fragmentation_method
MS ms_run-hash_method
Any CV custom -> arbitrary, these should not be validated
NEWT sample-species
BTO sample-tissue
CL sample-cell_type
DOID sample-disease
Any CV sample-custom => custom should not be validated
MIRIAM or other CV? database
MS or chem-mod CV derivatization_agent
MS small_molecule-quantification_unit
MS small_molecule_feature-quantification_unit
PRIDE or other CV small_molecule-identification_reliability
MS id_confidence_measure
opt_ columns will not be part of the validation.
MS best_id_confidence_measure
MS identification_method
MS ms_level
MS id_confidence_measure