Open sierra-moxon opened 2 years ago
This would be particularly useful when processing ontologies like EFO that use many OBO components but also mix in their own classes. For example, if you currently run EFO through the generic OBO graph loader, it mostly works, but you end up with a bunch of nodes categorized as OntologyClass, as you describe. For this ontology you wouldn't need BioPortal, because EFO already holds most (maybe all?) of the needed mappings as internal xrefs. For example, EFO:0006505 'chronic bronchitis' is a child of 'disease' EFO:0000408, which has an xref to MONDO:0000001.
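As a rough illustration of the xref idea, here is a minimal sketch that walks up a class's ancestors and takes the category of the first xref whose prefix has a known Biolink category. The `PREFIX_TO_CATEGORY` table, the `infer_category` helper and the tiny hand-built graph are hypothetical and for illustration only. The graph encodes just the EFO relationships mentioned above, and none of this is existing KGX code.

```python
from collections import deque

# Hypothetical prefix -> Biolink category table; only MONDO is listed,
# because it is the xref target in the EFO example.
PREFIX_TO_CATEGORY = {
    "MONDO": "biolink:Disease",
}

# Hand-built EFO fragment (illustrative): child -> parents, node -> xrefs.
PARENTS = {
    "EFO:0006505": ["EFO:0000408"],  # 'chronic bronchitis' -> 'disease'
    "EFO:0000408": [],
}
XREFS = {
    "EFO:0000408": ["MONDO:0000001"],
}


def infer_category(node, parents, xrefs, default="biolink:NamedThing"):
    """Breadth-first walk from `node` (checked first) up through its ancestors.

    Returns the category of the first xref whose prefix is in
    PREFIX_TO_CATEGORY, or `default` if no ancestor has a usable xref.
    """
    seen = {node}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for xref in xrefs.get(current, []):
            prefix = xref.split(":", 1)[0]
            if prefix in PREFIX_TO_CATEGORY:
                return PREFIX_TO_CATEGORY[prefix]
        for parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return default


print(infer_category("EFO:0006505", PARENTS, XREFS))  # biolink:Disease
```

Breadth-first search makes the nearest ancestor with a usable xref win. With multiple inheritance, two ancestors at the same depth could point to different categories, so a real implementation would need an explicit tie-breaking policy.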
Notably, EFO, like many ontologies, doesn't maintain a precise mapping from namespace to upper-level semantic type, so the current namespace-based categorization method wouldn't work.
Leveraging the hard-fought internal semantics and ontology mapping efforts to auto-categorize KG nodes more accurately seems like a very good idea. It would help expand the power of KGX as well as the overall value of the OBO effort.
Is this issue now related to https://github.com/biolink/kgx/issues/416?
At the moment, we can properly type some ontologies with their corresponding Biolink Model (BM) categories (e.g., MONDO classes get typed with biolink:Disease; see https://github.com/biolink/kgx/blob/b4e1e5ec3299885d1834d9a04b9f07f38690feae/kgx/source/obograph_source.py#L262).
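Per-ontology typing boils down to a lookup from a CURIE prefix to a category. The following minimal sketch is hypothetical: the table and the `category_from_prefix` helper are illustrative and are not the code at the linked line in `obograph_source.py`. It shows why classes from an ontology with no registered prefix fall back to a generic category.

```python
# Hypothetical prefix -> category table, in the spirit of per-ontology typing;
# the actual KGX mapping lives in obograph_source.py and may differ.
PREFIX_TO_CATEGORY = {
    "MONDO": "biolink:Disease",
}


def category_from_prefix(curie, default="biolink:NamedThing"):
    """Return the category registered for the CURIE's prefix, else `default`."""
    prefix = curie.split(":", 1)[0]
    return PREFIX_TO_CATEGORY.get(prefix, default)


print(category_from_prefix("MONDO:0000001"))  # biolink:Disease
print(category_from_prefix("EFO:0006505"))    # biolink:NamedThing
```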
When a new ontology acts as a source, its nodes often get very high-level Biolink classes (biolink:NamedThing, etc.). One idea is to use the mappings in BioPortal to help KGX assign the correct Biolink Model category.
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2815474/
https://data.bioontology.org/mappings?ontologies=SNOMEDCT,NCIT&display_context=false&display_links=false
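The BioPortal idea could be sketched in two steps: build a mappings query like the one linked above, then assign each source class the category of a mapped target whose prefix has a known category. This sketch rests on several assumptions. Passing the API key as an `apikey` query parameter should be checked against the BioPortal documentation. Parsing the JSON response into CURIE pairs is omitted because the exact response shape is not confirmed here. `mappings_url`, `categories_from_mappings` and `EXAMPLE:0001` are hypothetical names.

```python
from urllib.parse import urlencode

BIOPORTAL_MAPPINGS = "https://data.bioontology.org/mappings"

# Hypothetical prefix -> Biolink category table.
PREFIX_TO_CATEGORY = {"MONDO": "biolink:Disease"}


def mappings_url(ontologies, api_key):
    """Build a mappings query mirroring the linked URL.

    The `apikey` parameter name is an assumption to verify.
    """
    params = {
        "ontologies": ",".join(ontologies),
        "display_context": "false",
        "display_links": "false",
        "apikey": api_key,
    }
    # safe="," keeps the ontology list readable instead of encoding it as %2C.
    return f"{BIOPORTAL_MAPPINGS}?{urlencode(params, safe=',')}"


def categories_from_mappings(pairs):
    """Map each source CURIE to the category of a mapped target with a known prefix.

    `pairs` are (source, target) CURIEs assumed to be already extracted from a
    mappings response. The first usable mapping wins. Sources with no usable
    mapping are omitted, so the caller keeps its fallback category.
    """
    categories = {}
    for source, target in pairs:
        category = PREFIX_TO_CATEGORY.get(target.split(":", 1)[0])
        if category is not None:
            categories.setdefault(source, category)
    return categories


print(mappings_url(["SNOMEDCT", "NCIT"], "YOUR_KEY"))
print(categories_from_mappings([("EXAMPLE:0001", "MONDO:0000001")]))
# {'EXAMPLE:0001': 'biolink:Disease'}
```

Letting the first usable mapping win keeps the sketch simple. In practice, conflicting mappings (a class mapped to targets with different categories) would need to be detected and resolved rather than silently resolved by order.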