Knowledge-Graph-Hub / kg-obo

A package to transform all OBO ontologies into KGX TSV format and OBO JSON, and put the transformed graph in KG-Hub
https://knowledge-graph-hub.github.io/kg-obo/getting_started.html
GNU General Public License v3.0

Standardize node identifiers and prefixes #171

Closed: caufieldjh closed this issue 2 years ago

caufieldjh commented 2 years ago

Describe the desired behavior

@LucaCappelletti94 has found that 714,403 nodes across all of KG-OBO have prefixes that aren't among the Bioregistry prefixes. This set contains 695,792 unique node IDs across 167 ontologies.

Use a prefix remapping to normalize these prefixes where possible. The curies package may help.
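A prefix remapping can be as simple as a lookup from known prefix variants to one canonical prefix. The sketch below uses only the standard library. The synonym table is hypothetical and illustrative only; a real one would be built from the Bioregistry (the curies package offers a fuller treatment). Unknown prefixes and IDs without a colon are left unchanged.

```python
# Hypothetical synonym table: lowercase prefix variants -> canonical prefix.
# Entries are illustrative; a real table would be derived from the Bioregistry.
PREFIX_SYNONYMS = {
    "ncbitaxon": "NCBITaxon",
    "ncbi_taxon": "NCBITaxon",
    "go": "GO",
    "chebi": "CHEBI",
}


def normalize_curie(curie: str, synonyms: dict = PREFIX_SYNONYMS) -> str:
    """Rewrite a CURIE's prefix to its canonical form if the prefix is known.

    IDs with no colon, or whose prefix is not in the table, are returned as-is.
    """
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        return curie
    canonical = synonyms.get(prefix.lower())
    if canonical is None:
        return curie
    return f"{canonical}:{local_id}"


print(normalize_curie("ncbi_taxon:9606"))  # NCBITaxon:9606
print(normalize_curie("FOO:123"))          # FOO:123
```

Leaving unmatched IDs untouched (rather than dropping them) keeps the remapping safe to run repeatedly over the whole graph.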

Additional context

caufieldjh commented 2 years ago

If we drop the obvious IRIs (as per Bioregistry), 476,687 nodes still have non-normalized IDs.
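The counts above can be reproduced in spirit with a filter like the one below. This is a minimal sketch and a simplification: it treats any ID starting with `http://` or `https://` as an "obvious IRI" and skips it, and it assumes `known_prefixes` is a set of Bioregistry prefixes supplied by the caller. Both the function name and that IRI test are hypothetical, not the method used to produce the numbers in this issue.

```python
def count_non_normalized(node_ids, known_prefixes):
    """Return (total, unique) counts of node IDs whose prefix is not known.

    IDs starting with http:// or https:// are skipped (a simplified stand-in
    for "dropping the obvious IRIs").
    """
    flagged = []
    for node_id in node_ids:
        if node_id.startswith(("http://", "https://")):
            continue
        prefix = node_id.partition(":")[0]
        if prefix not in known_prefixes:
            flagged.append(node_id)
    return len(flagged), len(set(flagged))


ids = ["GO:0008150", "GO:0008150", "foo:1", "foo:1", "bar:2", "http://example.org/x"]
print(count_non_normalized(ids, {"GO"}))  # (3, 2)
```

Returning both the total and the unique count mirrors the distinction in the issue between nodes and unique node IDs.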

Fortunately, many of these have prefixes we can map, e.g., from GNO:

https://gnome.glyomics.org/StructureBrowser.html?focus=G21394OM
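An IRI like the GNO example can be compressed into a CURIE by matching it against a map of URI prefixes and choosing the longest match. The sketch below assumes a hypothetical map. In particular, treating the GNOme StructureBrowser URL as a GNO URI prefix is an assumption inferred from the example, not a confirmed Bioregistry record.

```python
# Hypothetical URI-prefix map. The StructureBrowser entry is an assumption
# inferred from the example IRI, not a confirmed Bioregistry mapping.
URI_PREFIX_MAP = {
    "http://purl.obolibrary.org/obo/GNO_": "GNO",
    "https://gnome.glyomics.org/StructureBrowser.html?focus=": "GNO",
}


def compress_iri(iri, uri_prefix_map=URI_PREFIX_MAP):
    """Convert an IRI to a CURIE using the longest matching URI prefix.

    Returns None when no URI prefix matches.
    """
    best = None
    for uri_prefix, prefix in uri_prefix_map.items():
        if iri.startswith(uri_prefix) and (best is None or len(uri_prefix) > len(best[0])):
            best = (uri_prefix, prefix)
    if best is None:
        return None
    uri_prefix, prefix = best
    return f"{prefix}:{iri[len(uri_prefix):]}"


print(compress_iri("https://gnome.glyomics.org/StructureBrowser.html?focus=G21394OM"))
# GNO:G21394OM
```

Preferring the longest match avoids a short, generic URI prefix capturing IRIs that belong to a more specific one.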