enpkg / enpkg_graph_builder

ENPKG graph builder
GNU General Public License v3.0

Canonicalise notation of adducts #5

Closed mpagni12 closed 1 year ago

mpagni12 commented 1 year ago

Currently, adducts from different sources cannot be compared because each source uses a different syntax, e.g.

[] enpkg:has_sirius_adduct "[M + H]+" .
[] enpkg:has_adduct "pos_1_1proton1ammonium" .
[] enpkg:has_adduct "NaN"^^xsd:double   # certainly not a double!

I am also aware that GNPS uses yet another syntax.

I would either canonicalise adduct descriptions as string literals or create instances of an Adduct class. Comparing adducts will certainly be helpful for evaluating the consistency of different predictions. In addition, at some point "manually curated" adducts could also be introduced to tag the ones that were confirmed. I would suggest something like this for the predicates:

enpkg:has_adduct a rdf:Property .
enpkg:has_sirius_adduct rdfs:subPropertyOf enpkg:has_adduct .
enpkg:has_gnps_adduct rdfs:subPropertyOf enpkg:has_adduct .
enpkg:has_curated_adduct rdfs:subPropertyOf enpkg:has_adduct .
...

and a canonical representation of the adduct.
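
As a rough illustration of the string-literal option, here is a minimal Python sketch. It assumes the canonical form is the bracket notation with all whitespace removed; the function name `canonicalise_adduct` and the regular expression are hypothetical and are not taken from the ENPKG code base. Anything that does not look like a bracketed adduct (such as "NaN" or "pos_1_1proton1ammonium") is returned as `None` rather than guessed, since mapping those names would need a curated lookup table.

```python
import re
from typing import Optional

# Hypothetical pattern for a bracketed adduct such as "[M+H]+" or "[2M+Na]+".
# This is an assumption made for illustration, not an official grammar.
_BRACKETED = re.compile(r"^\[\d*M(?:[+-]\d*[A-Za-z0-9]+)*\]\d*[+-]$")


def canonicalise_adduct(raw: Optional[str]) -> Optional[str]:
    """Return a whitespace-free bracket form, or None if the input is not recognised."""
    if raw is None:
        return None
    # Drop every whitespace character so "[M + H]+" and "[M+H]+" compare equal.
    compact = re.sub(r"\s+", "", raw)
    if _BRACKETED.match(compact):
        return compact
    # Unrecognised syntaxes are left unmapped instead of being guessed.
    return None


if __name__ == "__main__":
    for value in ["[M + H]+", "pos_1_1proton1ammonium", "NaN"]:
        print(repr(value), "->", repr(canonicalise_adduct(value)))
```

Returning `None` for unknown inputs keeps unmapped values visible, so they can be reviewed before a mapping is added.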

oolonek commented 1 year ago

Have a look at https://www.ebi.ac.uk/ols/ontologies/ms/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FGO_0070405

Marco comments that the observation of two features with different adducts implies that one annotation is false.
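
A consistency check along these lines could be sketched as follows, assuming the remark is about one feature receiving different adducts from different tools, and assuming the adduct strings have already been brought to a common notation. The function name `conflicting_adducts`, the feature identifiers, and the example annotations are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def conflicting_adducts(
    annotations: Iterable[Tuple[str, str, str]]
) -> Dict[str, List[str]]:
    """Map each feature annotated with more than one distinct adduct to those adducts.

    Each annotation is a (feature, source, adduct) tuple; the source is kept
    in the input for readability but is not needed to detect a conflict.
    """
    seen = defaultdict(set)
    for feature, _source, adduct in annotations:
        seen[feature].add(adduct)
    # Keep only the features where the tools disagree.
    return {f: sorted(a) for f, a in seen.items() if len(a) > 1}


if __name__ == "__main__":
    # Made-up example data, for illustration only.
    example = [
        ("feature_1", "sirius", "[M+H]+"),
        ("feature_1", "gnps", "[M+Na]+"),
        ("feature_2", "sirius", "[M+H]+"),
        ("feature_2", "gnps", "[M+H]+"),
    ]
    print(conflicting_adducts(example))
```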

Adafede commented 1 year ago

Possibly also look at section 6.3.10 (adduct_ions) of https://hupo-psi.github.io/mzTab/2_0-metabolomics-release/mzTab_format_specification_2_0-M_release.html#small-molecule-section

mpagni12 commented 1 year ago

Yes, following an existing specification is an excellent idea!

Your link also refers to https://www.degruyter.com/document/doi/10.1351/PAC-REC-06-04-06/html

ArnaudGaudry commented 1 year ago

I added a canonical representation of adducts in 91261aa!