Knowledge-Graph-Hub / universalizer

The KG-Hub Universalizer provides functions for knowledge graph cleanup and identifier normalization.
BSD 3-Clause "New" or "Revised" License

Some OBO IDs are parsed incorrectly #13

Closed. caufieldjh closed this issue 2 years ago.

caufieldjh commented 2 years ago

The current behavior mangles OBO IDs. See this selection of ID remappings it attempted to make for OGSF:

Old ID                                New ID
OBO:OGI.owl#CpG_Island                CPG:ISLAND
OBO:OGI.owl#Biological_Macromolecule  BIOLOGICAL:MACROMOLECULE
OBO:OGI.owl#isLocatedAfter            ISLOCATEDAFTER
OBO:so_variant_of                     SO:VARIANT:OF
OBO:OGI.owl#Genomic_DNA               GENOMIC:DNA
OBO:OGI.owl#Polymer                   POLYMER
OBO:OGI.owl#Polypeptide               POLYPEPTIDE
OBO:OGI.owl#Primary_Transcript        PRIMARY:TRANSCRIPT
OBO:OGI.owl#isAdjacentBefore          ISADJACENTBEFORE
OBO:Contig3                           CONTIG3
OBO:Contig4                           CONTIG4
OBO:OGI.owl#isAdjacentAfter           ISADJACENTAFTER
OBO:Gap4                              GAP4
OBO:Gap2                              GAP2
OBO:Gap3                              GAP3
OBO:OGI.owl#Transcript                TRANSCRIPT
OBO:REO_0000826                       REO:0000826
OBO:so_has_part                       SO:HAS:PART

OBO:REO_0000826 is parsed as expected.

OBO:Contig3 shouldn't be here, so there are a few options, including leaving it unchanged.

OBO:OGI.owl#CpG_Island refers to OGI, so it could become OGI:CpG_Island, but that won't resolve back to anything: its ID in OGI.owl is still http://purl.obolibrary.org/obo/OGI.owl#CpG_Island. So we can either leave it unchanged or assign it a new project-specific ID.
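
To make the resolution problem concrete, here is a small sketch assuming the usual OBO PURL convention, where PREFIX:local expands to http://purl.obolibrary.org/obo/PREFIX_local and the generic OBO: prefix maps straight onto the PURL base. The `expand` helper and `OBO_BASE` constant are hypothetical and not part of the universalizer.

```python
# Hypothetical helper (not the universalizer's API): expand a CURIE to an
# IRI using the usual OBO PURL convention, PREFIX:local -> base + PREFIX_local.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def expand(curie: str) -> str:
    """Return the IRI for a CURIE; inputs without a colon are returned as-is."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        return curie
    if prefix == "OBO":
        # The generic OBO: prefix maps directly onto the PURL base.
        return OBO_BASE + local
    return f"{OBO_BASE}{prefix}_{local}"


# The original ID round-trips to the IRI used in OGI.owl...
print(expand("OBO:OGI.owl#CpG_Island"))  # http://purl.obolibrary.org/obo/OGI.owl#CpG_Island
# ...but the shortened form points at an IRI that OGI.owl does not define.
print(expand("OGI:CpG_Island"))  # http://purl.obolibrary.org/obo/OGI_CpG_Island
```

By contrast, OBO:REO_0000826 and REO:0000826 expand to the same IRI under this convention, which is why that remapping is safe.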

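Underscores are converted to colons, which is normally fine as long as the input is already a CURIE (e.g. OBO:REO_0000826 becomes REO:0000826). Applied to labels like so_variant_of, though, it produces SO:VARIANT:OF.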
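
One possible stricter rule, sketched here under the assumption that an OBO: ID should only be rewritten when its local part is a prefix, a single underscore, and a numeric identifier. The pattern and the `remap_obo_id` name are hypothetical; the fix that closed this issue may take a different approach.

```python
import re

# Hypothetical rule: only rewrite IDs of the form OBO:PREFIX_DIGITS.
# Anything else (OGI.owl# fragments, so_has_part, Contig3) is left unchanged.
OBO_CURIE_PATTERN = re.compile(r"OBO:([A-Za-z][A-Za-z0-9]*)_(\d+)")


def remap_obo_id(old_id: str) -> str:
    """Return PREFIX:DIGITS for OBO:PREFIX_DIGITS, otherwise the input unchanged."""
    match = OBO_CURIE_PATTERN.fullmatch(old_id)
    if match is None:
        return old_id
    prefix, local_id = match.groups()
    return f"{prefix}:{local_id}"


for old_id in ["OBO:REO_0000826", "OBO:so_has_part", "OBO:Contig3",
               "OBO:OGI.owl#CpG_Island"]:
    print(old_id, "->", remap_obo_id(old_id))
```

Leaving unmatched IDs untouched corresponds to the "leave it unchanged" option discussed above; assigning project-specific IDs would need a separate, explicit mapping.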

caufieldjh commented 2 years ago

See also this incorrect predicate remapping:

OBO:chebi#is_conjugate_base_of  CHEBI#IS:CONJUGATE:BASE:OF
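
A complementary safeguard, sketched here as an assumption rather than anything the universalizer is known to do, is to sanity-check each proposed new ID before accepting it. The check below treats a result as plausible only if it has one prefix, one colon, and a local part with no further colons, '#', or whitespace. The names `looks_like_curie` and `flag_suspicious` are hypothetical. The check is only partial: it flags CHEBI#IS:CONJUGATE:BASE:OF, SO:HAS:PART, and CONTIG3, but it would accept CPG:ISLAND, which is well-formed yet still wrong.

```python
import re

# Hypothetical check: one prefix, one colon, and a local ID containing
# no further colons, '#', or whitespace.
PLAUSIBLE_CURIE = re.compile(r"[A-Za-z][A-Za-z0-9_.]*:[^:#\s]+")


def looks_like_curie(candidate: str) -> bool:
    return PLAUSIBLE_CURIE.fullmatch(candidate) is not None


def flag_suspicious(remappings: dict[str, str]) -> list[str]:
    """Return the old IDs whose proposed new ID does not look like a CURIE."""
    return [old for old, new in remappings.items() if not looks_like_curie(new)]


proposed = {
    "OBO:chebi#is_conjugate_base_of": "CHEBI#IS:CONJUGATE:BASE:OF",
    "OBO:so_has_part": "SO:HAS:PART",
    "OBO:REO_0000826": "REO:0000826",
    "OBO:Contig3": "CONTIG3",
}
print(flag_suspicious(proposed))
```

This requires Python 3.9 or later for the built-in generic annotations.
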

caufieldjh commented 2 years ago

Fixed by #16