Knowledge-Graph-Hub / kg-microbe

https://knowledge-graph-hub.github.io/kg-microbe/index.html
BSD 3-Clause "New" or "Revised" License

map metabolism / use of oxygen (aerobic, etc.) column in big trait table #3

Closed: cmungall closed this issue 3 years ago

cmungall commented 3 years ago

Split from #2

count  value
    1  strictly anaerobic
  342  obligate anaerobic
  544  microaerophilic
 1035  obligate aerobic
 2328  anaerobic
 2655  facultative
 3108  aerobic
 4250  NA
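A tally like the one above can be reproduced with a few lines of Python. This is a minimal sketch, assuming the column has already been read into a list of strings. The sample list and the tally helper are hypothetical, since the issue does not say which file or column name holds these values.

```python
from collections import Counter

# Hypothetical sample: in practice these strings would come from the
# metabolism / oxygen-use column of the trait table (file and column
# names are not given in this issue).
values = ["aerobic", "anaerobic", "NA", "aerobic", "facultative", "obligate aerobic"]

def tally(column_values):
    """Count each distinct value, least frequent first (like `sort | uniq -c | sort -n`)."""
    counts = Counter(v.strip() for v in column_values)
    # Sort by count, then alphabetically to make ties deterministic.
    return sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))

for value, n in tally(values):
    print(f"{n:>5}  {value}")
```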

I think these should map to ECOCORE terms. cc @diatomsRcool

hrshdhgd commented 3 years ago

@cmungall: Would the metabolism column need to be run through OGER to get tagged with ECOCORE CUIs?

cmungall commented 3 years ago

I would not use OGER here. There are 8 distinct values, so it should take only a few minutes to manually curate a mapping table. You may need to request any terms that are missing from ECOCORE.
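A minimal sketch of the curation check this suggests, assuming the mapping is kept as a plain dictionary: it lists the observed values that still lack a term, which are the candidates for a term request. The CURATED dictionary and the find_unmapped helper are hypothetical; only two entries are filled in, using IDs that appear later in this thread.

```python
# Hypothetical hand-curated mapping from data strings to ontology CURIEs.
# Only two entries are filled in here for illustration.
CURATED = {
    "aerobic": "ECOCORE:00000173",
    "anaerobic": "ECOCORE:00000172",
}

def find_unmapped(distinct_values, mapping, ignore=("NA",)):
    """Return data values with no curated term (candidates for a term request)."""
    return sorted(v for v in set(distinct_values) if v not in mapping and v not in ignore)

# The 8 distinct values observed in the trait table column.
observed = ["strictly anaerobic", "obligate anaerobic", "microaerophilic",
            "obligate aerobic", "anaerobic", "facultative", "aerobic", "NA"]
print(find_unmapped(observed, CURATED))
```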

cmungall commented 3 years ago

@realmarcin noted that there is an ontology in BioPortal that has the terms we need:

https://bioportal.bioontology.org/ontologies/MIXSCV?p=classes&conceptid=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMVC_0000445

This ontology is an (abandoned?) translation of MIxS to OWL.

Note that in NMDC @wdduncan is working on a semantic translation of MIxS. We use biolinkML as the representation. See https://github.com/microbiomedata/nmdc-metadata/blob/master/schema/mixs.yaml

E.g.

https://github.com/microbiomedata/nmdc-metadata/blob/d27b4f6784af5f4508833e6562f316c25fb37a81/schema/mixs.yaml#L400-L412

However, in MIxS the list of possible values is just an enumeration of strings.

With @kaiam, we were originally going to map each value in that enum; I would still like to prioritize this for key fields.

We rely on this feature to bring it into our YAML (along with Kai's mapping): https://github.com/biolink/biolinkml/issues/170

However, for kg-microbe we can proceed independently of MIxS, but we still need the mapping of strings to ECOCORE terms.

hrshdhgd commented 3 years ago

This is what I have found thus far:

anaerobe: http://purl.obolibrary.org/obo/ECOCORE_00000172
anaerobic respiration: http://purl.obolibrary.org/obo/GO_0009061
facultative anaerobe: http://purl.obolibrary.org/obo/OMP_0000087
obligately anaerobic: http://purl.obolibrary.org/obo/MICRO_0000504

aerobe: http://purl.obolibrary.org/obo/ECOCORE_00000173
aerobic respiration: http://purl.obolibrary.org/obo/GO_0009060
obligately aerobic: http://purl.obolibrary.org/obo/MICRO_0000516

I could not find 'microaerophilic'. Should 'strictly anaerobic' and 'anaerobic' be grouped together?
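The IRIs above follow the OBO PURL pattern http://purl.obolibrary.org/obo/PREFIX_ID, which corresponds to the CURIE PREFIX:ID. Below is a minimal sketch of converting between the two forms. The helper names are hypothetical, and the code assumes the prefix itself contains no underscore. Grouping the candidates by prefix also shows how many ontologies they span.

```python
# OBO PURL namespace: http://purl.obolibrary.org/obo/PREFIX_ID
OBO_PURL = "http://purl.obolibrary.org/obo/"

def iri_to_curie(iri: str) -> str:
    """Convert an OBO PURL (e.g. .../obo/GO_0009061) to a CURIE (GO:0009061)."""
    if not iri.startswith(OBO_PURL):
        raise ValueError(f"not an OBO PURL: {iri}")
    # Split on the first underscore; assumes the prefix has none.
    prefix, sep, local_id = iri[len(OBO_PURL):].partition("_")
    if not sep:
        raise ValueError(f"missing PREFIX_ID part: {iri}")
    return f"{prefix}:{local_id}"

def curie_to_iri(curie: str) -> str:
    """Convert a CURIE (e.g. MICRO:0000515) back to its OBO PURL."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        raise ValueError(f"not a CURIE: {curie}")
    return f"{OBO_PURL}{prefix}_{local_id}"

found = [
    "http://purl.obolibrary.org/obo/ECOCORE_00000172",
    "http://purl.obolibrary.org/obo/GO_0009061",
    "http://purl.obolibrary.org/obo/OMP_0000087",
    "http://purl.obolibrary.org/obo/MICRO_0000504",
    "http://purl.obolibrary.org/obo/ECOCORE_00000173",
    "http://purl.obolibrary.org/obo/GO_0009060",
    "http://purl.obolibrary.org/obo/MICRO_0000516",
]
# Which ontologies do the candidate terms come from?
print(sorted({iri_to_curie(iri).split(":")[0] for iri in found}))
```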

cmungall commented 3 years ago

This is a really good example of why we need more harmonization in OBO! Ideally there would be one ontology to use here, not 4!

hrshdhgd commented 3 years ago

Here's what the mapping table looks like:

ID                ActualTerm          PreferredTerm
ECOCORE:00000172  anaerobic           anaerobe
MICRO:0000504     obligate anaerobic  obligately anaerobic
OMP:0000087       facultative         facultative anaerobe
MICRO:0000516     obligate aerobic    obligately aerobic
ECOCORE:00000173  aerobic             aerobe
MICRO:0000515     microaerophilic     microaerophilic

'ActualTerm' is the string as it appears in the data, and 'PreferredTerm' is the label of the matching term in the corresponding ontology.
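Here is a minimal sketch of applying the curated table to trait values. It assumes the table is stored as tab-separated text with the same three columns, because a space-aligned layout is ambiguous when cells contain spaces. IDs are written as colon-style CURIEs throughout. The storage format and helper names are assumptions, not part of kg-microbe. Values with no row in the table, such as 'strictly anaerobic' and 'NA', come back as None so they can be reviewed rather than silently dropped.

```python
import csv
import io

# The curated table from this thread as tab-separated text
# (assumption: the real file's format and location are not specified).
TABLE_TSV = (
    "ID\tActualTerm\tPreferredTerm\n"
    "ECOCORE:00000172\tanaerobic\tanaerobe\n"
    "MICRO:0000504\tobligate anaerobic\tobligately anaerobic\n"
    "OMP:0000087\tfacultative\tfacultative anaerobe\n"
    "MICRO:0000516\tobligate aerobic\tobligately aerobic\n"
    "ECOCORE:00000173\taerobic\taerobe\n"
    "MICRO:0000515\tmicroaerophilic\tmicroaerophilic\n"
)

def load_mapping(tsv_text):
    """Build a lookup from the data string (ActualTerm) to (ID, PreferredTerm)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["ActualTerm"]: (row["ID"], row["PreferredTerm"]) for row in reader}

def annotate(values, mapping):
    """Pair each trait value with its ontology ID, or None if the table has no entry."""
    return [(v, mapping.get(v, (None, None))[0]) for v in values]

mapping = load_mapping(TABLE_TSV)
print(annotate(["aerobic", "facultative", "strictly anaerobic", "NA"], mapping))
```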

wdduncan commented 3 years ago

I found microaerophilic here: https://www.ebi.ac.uk/ols/ontologies/micro/terms?iri=http%3A%2F%2Fpurl.obolibrary.org%2Fobo%2FMICRO_0000515

microaerophilic -> http://purl.obolibrary.org/obo/MICRO_0000515

hrshdhgd commented 3 years ago

Updated the table. Thanks @wdduncan !