caufieldjh closed this issue 2 years ago
If we drop the obvious IRIs (as per Bioregistry), 476,687 nodes still have non-normalized IDs.
Luckily, many of them have prefixes we can map, e.g. from GNO:
https://gnome.glyomics.org/StructureBrowser.html?focus=G21394OM
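As a rough illustration of what mapping such an ID could look like, here is a minimal stdlib-only sketch. The `compress` helper and the `URI_PREFIX_TO_CURIE_PREFIX` table are hypothetical names, and only the GNO entry is taken from the example above. The sketch is not the API of any particular package.

```python
# Hypothetical remapping table: URI prefix -> CURIE prefix.
# Only the GNO entry comes from the example in this issue; extend as needed.
URI_PREFIX_TO_CURIE_PREFIX = {
    "https://gnome.glyomics.org/StructureBrowser.html?focus=": "GNO",
}


def compress(iri, prefix_map=URI_PREFIX_TO_CURIE_PREFIX):
    """Return a CURIE for `iri`, or None if no known URI prefix matches."""
    # Try longer URI prefixes first so the most specific one wins.
    for uri_prefix in sorted(prefix_map, key=len, reverse=True):
        # Require a non-empty local ID after the prefix.
        if iri.startswith(uri_prefix) and len(iri) > len(uri_prefix):
            local_id = iri[len(uri_prefix):]
            return f"{prefix_map[uri_prefix]}:{local_id}"
    return None


print(compress("https://gnome.glyomics.org/StructureBrowser.html?focus=G21394OM"))
# GNO:G21394OM
```

IDs that return `None` are the ones that still need a new mapping or manual review.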
Describe the desired behavior
@LucaCappelletti94 has found that 714,403 nodes across all of KG-OBO have prefixes that aren't among the Bioregistry prefixes. There are 695,792 unique node IDs in this set, across 167 ontologies.
Use a prefix remapping to normalize these prefixes where possible. The curies package may help.
Additional context
Many of these IDs are from bnodes. That's a different issue.
Many of these IDs are URLs. Some are resolvable to known prefixes:
Some are clearly still IRIs but are more challenging to have a consistent prefix for:
And still others are just everyday URLs:
Some of these are linkml classes:
Some are predicate types and may be present in multiple ontologies, so some more in-depth pattern parsing may be needed.
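To get a feel for how the categories above break down, a first-pass triage could look something like the sketch below. It is only a sketch under stated assumptions: bnode IDs are assumed to use the `_:` label form, the `triage` function and `KNOWN_URI_PREFIXES` tuple are hypothetical, and the sample IDs other than the GNO URL are made up for illustration. It does not try to separate linkml classes or predicate types, which would need the more in-depth pattern parsing mentioned above.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical list of URI prefixes we already know how to remap.
KNOWN_URI_PREFIXES = (
    "https://gnome.glyomics.org/StructureBrowser.html?focus=",
)


def triage(node_id):
    """Assign a node ID to a coarse bucket for later handling."""
    # Assumption: blank nodes are serialized with the "_:" label form.
    if node_id.startswith("_:"):
        return "bnode"
    if node_id.startswith(KNOWN_URI_PREFIXES):
        return "mappable"
    parsed = urlparse(node_id)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "unmapped_url"
    return "other"


# Illustrative IDs only; not taken from KG-OBO.
sample_ids = [
    "_:b0",
    "https://gnome.glyomics.org/StructureBrowser.html?focus=G21394OM",
    "https://example.org/some/page",
    "not-a-url",
]
print(Counter(triage(i) for i in sample_ids))
```

Counting buckets per ontology this way would show where a remapping table pays off and where manual review is still needed.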