Knowledge-Graph-Hub / universalizer

The KG-Hub Universalizer provides functions for knowledge graph cleanup and identifier normalization.
BSD 3-Clause "New" or "Revised" License

Implement prefix normalization #2

Closed caufieldjh closed 1 year ago

caufieldjh commented 1 year ago

We're going to have to do prefix normalization again, so perhaps it should:

So why not just use the functions in curies and prefixmaps?

The IRIs/CURIEs used in KG-OBO and other graph collections (see KG-Bioportal) are noisy; we anticipate that any ID will need to undergo the following checks:

  1. Is this an IRI or a CURIE?
  2. If it's an IRI, do we have a corresponding CURIE prefix?
  3. If it's a CURIE, is it using our preferred prefix?
  4. If we can't find a corresponding prefix, is this an alternative form?

This requires one-to-many maps from IRI forms to CURIE prefixes plus lists of alternative prefixes. Alternatively, some cases (e.g., http vs. https; invalid prefixes like file:) may be handled outside the maps.
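The four checks above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the Universalizer's actual implementation and not the curies or prefixmaps API. The function name `normalize_id`, the three maps, and the `example.org` IRI base are all hypothetical placeholders; real maps would come from curated sources.

```python
from typing import Optional

# Hypothetical maps for illustration only.
# Several IRI bases may point at the same preferred CURIE prefix.
IRI_TO_PREFIX = {
    "http://purl.obolibrary.org/obo/GO_": "GO",
    "http://purl.obolibrary.org/obo/CHEBI_": "CHEBI",
    "http://example.org/go/": "GO",  # made-up alternative IRI form
}
PREFERRED_PREFIXES = {"GO", "CHEBI"}
# Alternative prefix spellings mapped to the preferred prefix.
ALTERNATIVE_PREFIXES = {"go": "GO", "chebi": "CHEBI"}


def normalize_id(identifier: str) -> Optional[str]:
    """Return a CURIE with a preferred prefix, or None if nothing matches."""
    # Check 1: is this an IRI or a CURIE?
    if identifier.startswith(("http://", "https://")):
        # http vs. https is handled here, outside the maps.
        iri = "http://" + identifier.split("://", 1)[1]
        # Check 2: look for a known IRI base, longest match first.
        for base in sorted(IRI_TO_PREFIX, key=len, reverse=True):
            if iri.startswith(base):
                return f"{IRI_TO_PREFIX[base]}:{iri[len(base):]}"
        return None

    prefix, sep, local_id = identifier.partition(":")
    if not sep:
        return None  # neither an IRI nor a CURIE
    # Check 3: is the CURIE already using a preferred prefix?
    if prefix in PREFERRED_PREFIXES:
        return identifier
    # Check 4: is the prefix a known alternative form?
    preferred = ALTERNATIVE_PREFIXES.get(prefix)
    if preferred is not None:
        return f"{preferred}:{local_id}"
    # Invalid prefixes such as file: end up here.
    return None
```

With these placeholder maps, `normalize_id("https://purl.obolibrary.org/obo/GO_0008150")` and `normalize_id("go:0008150")` both yield `GO:0008150`, while an ID such as `file:///tmp/x` yields `None` and can be flagged for review.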
justaddcoffee commented 1 year ago

I'm very curious about @cmungall's thoughts here - we discussed this in the KG construction meeting, but I don't think we reached a conclusion.