Open matentzn opened 10 months ago
3 - this is the way
What does (3) entail?
Currently, you can pass a prefix map (prefixes.csv; only a non-EPM bimap is supported at the moment) when creating a SemSQL DB. Are we saying that this prefix map can have additional entries not already in the OAK context, so long as there is no conflict (i.e. a URI prefix that is assigned to a different prefix than the one OAK has assigned)?
Couldn't we just interpret such conflicts as prefix synonyms and maybe throw a warning to the user?
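To make the "treat conflicts as synonyms and warn" idea concrete, here is a minimal sketch in plain Python. It does not use any OAK or SemSQL code; `merge_prefix_maps`, the dictionaries and the example entries are hypothetical names made up for illustration.

```python
import warnings


def merge_prefix_maps(default, user):
    """Merge a user prefix map into a default one (hypothetical helper).

    Both maps are {prefix: uri_prefix}. Returns (merged, synonyms), where
    synonyms maps a user prefix to the default prefix that already owns
    the same URI prefix.
    """
    merged = dict(default)
    owner_by_uri = {uri: prefix for prefix, uri in default.items()}
    synonyms = {}
    for prefix, uri in user.items():
        owner = owner_by_uri.get(uri)
        if owner == prefix:
            continue  # identical entry, nothing to do
        if owner is not None:
            # Same URI prefix under a different name: record it as a synonym.
            warnings.warn(f"'{prefix}' treated as a synonym of '{owner}'")
            synonyms[prefix] = owner
        elif prefix in merged:
            # Same prefix pointing at a different URI prefix: keep the default.
            warnings.warn(f"'{prefix}' already maps to {merged[prefix]}; ignoring {uri}")
        else:
            merged[prefix] = uri  # genuinely new entry
            owner_by_uri[uri] = prefix
    return merged, synonyms


# Assumed example entries, not taken from any real prefixes.csv.
default_map = {"oio": "http://www.geneontology.org/formats/oboInOwl#"}
user_map = {
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
    "EX": "http://example.org/",
}
merged, synonyms = merge_prefix_maps(default_map, user_map)
```

With these inputs, `EX` is added as a new entry and `oboInOwl` is recorded as a synonym of `oio` instead of replacing it.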
prefixes.csv is actually a "one-way EPM in disguise": the same prefix can be mapped to multiple URL prefixes. See my comments in #699 for what I think the best solution would be. The key issue here is not the EPM; it is that prefix assumptions are hardcoded in the code. All entities in the code should be cycled through a standard EPM before being used (say, curies.Converter.standardise("oio:hasDbXref") or something similar). Ideally, --epm can always be passed to all oak commands to replace the default EPM, which re-serialises the built-in CURIEs prior to usage.
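As a rough illustration of cycling hardcoded CURIEs through a standardisation step, the following sketch uses only plain Python. `PREFIX_SYNONYMS` and `standardise_curie` are hypothetical names and do not reflect the actual API of curies or OAK.

```python
# Hypothetical default synonym table: non-preferred prefix -> preferred prefix.
PREFIX_SYNONYMS = {"oboInOwl": "oio", "OIO": "oio"}


def standardise_curie(curie, synonyms=PREFIX_SYNONYMS):
    """Rewrite the prefix of a CURIE to its preferred form."""
    prefix, sep, local = curie.partition(":")
    if not sep:
        return curie  # no colon, so not a CURIE; leave untouched
    return f"{synonyms.get(prefix, prefix)}:{local}"


# A built-in constant is standardised once instead of being compared raw.
HAS_EXACT_SYNONYM = standardise_curie("oio:hasExactSynonym")

# A CURIE read from an ontology goes through the same step before comparison.
from_ontology = standardise_curie("oboInOwl:hasExactSynonym")

# A user-supplied table (the role --epm would play) can flip the preference,
# which re-serialises the built-in CURIE.
user_synonyms = {"oio": "oboInOwl"}
reserialised = standardise_curie("oio:hasDbXref", user_synonyms)
```

Because both the built-in constant and the value read from the ontology pass through the same function, they compare equal regardless of which spelling of the prefix was used.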
Having pieces of code like https://github.com/INCATools/ontology-access-kit/blob/d139e99fe7faa109e0b71840e20140852a8267d9/src/oaklib/utilities/lexical/lexical_indexer.py#L52 seems dangerous to me, and from a quick search I think there are a number of places in OAK where this occurs. @joeflack4 just uncovered a case where we passed an oboInOwl prefix to semsql, which resulted in lexmatch no longer being able to understand that oboInOwl:hasExactSynonym (which was used in the ontology) is, in fact, the same as oio:hasExactSynonym.
There are various ways to solve this problem. None of this is particularly easy - (3) is probably easiest, but we would have to give some tool support, like runoak normalise-prefixes -i ont.db.
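To give a feel for what such tool support might do, here is a minimal sketch that rewrites prefixes in a SQLite database. The `normalise-prefixes` command is only a proposal in this thread; the function name `normalise_prefixes`, the simplified `statements` table and the sample rows are assumptions for illustration, not the real SemSQL schema or OAK code.

```python
import sqlite3


def normalise_prefixes(conn, synonyms, columns=("subject", "predicate", "object")):
    """Rewrite CURIE prefixes in place; returns the number of values rewritten."""
    rewritten = 0
    for column in columns:
        for old, new in synonyms.items():
            # Match on the exact "old:" prefix (case-sensitive, no wildcards)
            # and splice the new prefix onto the remainder of the value.
            cur = conn.execute(
                f"UPDATE statements SET {column} = ? || substr({column}, ?) "
                f"WHERE substr({column}, 1, ?) = ?",
                (new + ":", len(old) + 2, len(old) + 1, old + ":"),
            )
            rewritten += cur.rowcount
    conn.commit()
    return rewritten


# In-memory stand-in for ont.db with a simplified, assumed table layout.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE statements (subject TEXT, predicate TEXT, object TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO statements VALUES (?, ?, ?, ?)",
    [
        ("EX:1", "oboInOwl:hasExactSynonym", None, "foo"),
        ("EX:1", "rdfs:label", None, "bar"),
    ],
)
count = normalise_prefixes(conn, {"oboInOwl": "oio"})
predicates = [row[0] for row in conn.execute("SELECT predicate FROM statements ORDER BY rowid")]
```

After the call, the first row's predicate reads `oio:hasExactSynonym`, while the `rdfs:label` row and the NULL objects are left alone.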