MassBank / MassBank-web

The web server application and directly connected components for a MassBank web server

CV terms in record spec #361

Open meier-rene opened 1 year ago

meier-rene commented 1 year ago

Hi all, we have a request from @michaelwitting to support EAD as a value for AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE. I would like to use this opportunity to push the use of controlled vocabulary in MassBank. We brainstormed about the best way to get this in, and here is our proposal for an extension of the format spec: ontology terms will be specified in the same way as in the mzTab format. A quote from the mzTab spec:

Parameters are always reported as [CV label, accession, name, value]. Any field that is not available MUST be left empty.

[MS, MS:1001477, SpectraST,]

Should the name of the param contain commas, quotes MUST be added to avoid problems with the parsing: [label, accession, “first part of the param name, second part of the name”, value].

[MOD, MOD:00648, "N,O-diacetylated L-serine",]

It's most probably backward compatible with other software packages.
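A minimal sketch of how such a bracketed parameter could be parsed, assuming the four fields behave like one comma-separated row with optional double quotes. The names `CVParam` and `parse_cv_param` are hypothetical and not part of MassBank or mzTab:

```python
import csv
from typing import NamedTuple


class CVParam(NamedTuple):
    """One [CV label, accession, name, value] parameter (hypothetical type)."""
    label: str
    accession: str
    name: str
    value: str


def parse_cv_param(text: str) -> CVParam:
    """Parse a single bracketed parameter; missing fields become ''."""
    text = text.strip()
    if not (text.startswith("[") and text.endswith("]")):
        raise ValueError(f"not a bracketed CV parameter: {text!r}")
    inner = text[1:-1]
    # csv handles quoted names that contain commas; skipinitialspace
    # drops the blank that follows each comma.
    fields = next(csv.reader([inner], skipinitialspace=True))
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {text!r}")
    return CVParam(*(field.strip() for field in fields))


print(parse_cv_param("[MS, MS:1001477, SpectraST,]"))
print(parse_cv_param('[MOD, MOD:00648, "N,O-diacetylated L-serine",]'))
```

Reusing a CSV reader keeps the quoting rule from the mzTab quote in one place. Note that the sketch only accepts straight double quotes, not typographic ones.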

In the case of Michael's request #359, AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE EAD would become AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE [MS, MS:1003294, electron activated dissociation,] or, if we also allow the official synonym from the ontology, AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE [MS, MS:1003294, EAD,]

This does not mean that we need to have this "not so human friendly" representation on the final web page. For HTML, it can easily be transformed into something like: AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE EAD (please note that EAD is new in the ontology and not yet distributed to all the search engines; that's why I use a different term as the link).
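One way the HTML transformation could look, as a rough sketch. The helper `cv_link` is hypothetical, and the OBO PURL base used as link target is an assumption, since the thread does not say which URL the web page would use:

```python
from html import escape

# Assumption: link to the OBO PURL of the term. The actual MassBank
# web page may link somewhere else.
OBO_PURL = "http://purl.obolibrary.org/obo/"


def cv_link(accession: str, name: str, base: str = OBO_PURL) -> str:
    """Render a CV term as an HTML link that shows only the readable name."""
    # OBO PURLs use an underscore where the accession has a colon.
    href = base + accession.replace(":", "_", 1)
    return f'<a href="{escape(href, quote=True)}">{escape(name)}</a>'


line = "AC$MASS_SPECTROMETRY: FRAGMENTATION_MODE " + cv_link("MS:1003294", "EAD")
print(line)
```

The record file keeps the full bracketed term, while the page shows only the short name with the accession carried in the link.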

Benefits of CV terms are unified use of terms, clear meaning and, if properly implemented, automatic extension of allowed terms by all CV terms.

Any objections or comments? If not, I will start with the implementation pretty soon. I expect it will be a smooth change without any breaks, because it's just an addition, not a real change. Hopefully I can convince Michael to be our guinea pig for this addition with his contribution :wink:

Regards, Rene

sneumann commented 1 year ago

For the validation, there is also some previous work that maps which "corners" of the ontology are valid for specific fields. Such a mapping file is https://github.com/HUPO-PSI/mzML/blob/master/validator/src/main/resources/ms-mapping.xml and a more human-friendly version can be obtained with OpenMS tools, which produce an HTML file like https://msbi.ipb-halle.de/~sneumann/mzML_mapping_and_cv.html (which goes further and also specifies whether a term can or has to be present).

sneumann commented 1 year ago

Not clearly mentioned above is that the mzTab people also put values in: [MS, MS:1001582, XCMS, 2.99.6] (<- version number). While in mzML such CV parameters can also include a unit, as in <"MS", "MS:1000927", "ion injection time" value="200.0", "UO", "UO:0000028", "millisecond">, units are not supported in the square bracket [] notation. Instead, mzTab-M specifies units elsewhere, like [UO, UO:0000010, second, ].

meier-rene commented 1 year ago

I think a possible solution could be [MS, MS:1000927, ion injection time, 200.0],[UO, UO:0000028, millisecond,]

meier-rene commented 1 year ago

This is now in the record spec. https://github.com/MassBank/MassBank-web/blob/dev/Documentation/MassBankRecordFormat.md#controled-vocabulary-terms