Closed matentzn closed 1 year ago
Spoke to @hrshdhgd about how to do this: StreamingSssomWriter can use self.ontology_interface.prefix_map to get the map.
But remember, many ontologies are not well-behaved and do not contain prefix expansions for all xrefs. Here, though, the user can choose to pass this in as a global option,
e.g.
runoak --named-prefix-map bioregistry -i sqlite:obo:uberon mappings -O sssom > tmp/anatomy-mapping-uberon.tsv
however we still don't expect this to work for many prefixes
✗ runoak --named-prefix-map bioregistry prefixes FMA MBA BIRNLEX SCTID Wikipedia neuronames
key val
FMA http://purl.obolibrary.org/obo/FMA_
SCTID http://snomed.info/id/
I think the default should still be to emit a mapping and have an optional strict mode
As of now, clean_prefix_map() is strict and errors out if prefixes used in the dataframe do not exist in the curie_map. Should I add a flag strict: bool [default=True] for clean_prefix_map() in sssom-py?
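To make the proposal concrete, here is a minimal sketch of what a strict flag could mean. The function names `prefixes_in_use` and `check_prefix_map` are hypothetical and are not the sssom-py API; the sketch only assumes that strict mode raises on undeclared prefixes while non-strict mode carries on with the declared ones.

```python
from typing import Dict, Iterable, Set


def prefixes_in_use(curies: Iterable[str]) -> Set[str]:
    """Return the prefix (text before the first colon) of every CURIE."""
    return {curie.split(":", 1)[0] for curie in curies if ":" in curie}


def check_prefix_map(
    curies: Iterable[str], prefix_map: Dict[str, str], strict: bool = True
) -> Dict[str, str]:
    """Hypothetical stand-in illustrating a strict flag; not sssom-py code."""
    used = prefixes_in_use(curies)
    missing = sorted(used - set(prefix_map))
    if missing and strict:
        raise ValueError(f"Prefixes missing from the prefix map: {missing}")
    # Non-strict mode: keep the used prefixes that are declared, skip the rest.
    return {p: prefix_map[p] for p in sorted(used) if p in prefix_map}
```

With strict=True, a single undeclared prefix such as BIRNLEX aborts the whole export; with strict=False the mappings can still be emitted, at the cost of an incomplete prefix map.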
Created a PR: https://github.com/mapping-commons/sssom-py/pull/353
Check with @matentzn, but I think we need to allow this: we need to be able to extract messy unregistered xrefs from ontologies into sssom to analyze them in order to make progress on registering them. Alternatively, you can just inject them into some kind of global unknown namespace.
The general thought "allow messy prefixes to be parsed" is practical, I guess, but I am not sure this is a SSSOM problem. Shouldn't the parser that registers the prefix guess, or randomly generate, a prefix map entry?
You could:
- Take the prefixes from self.ontology_interface.prefix_map where they exist
- Where they don't, check the sssom default context, or bioregistry directly
- Where neither works, just generate a randomised prefix entry BIRNLEX: http://w3id.org/oak/unknown_prefixes/birnlex
The ontology parsing process basically builds up a prefix map which can then be used to finalise the SSSOM data frame.
I do not think we should permit missing prefix declarations in the curie_map: that is too much wild west and will just result in underspecified mapping files. I hope this makes sense; if not, let's clarify!
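The three fallback steps suggested in this comment could be sketched roughly as follows. This is an illustration only: `resolve_prefix` and `build_prefix_map` are hypothetical helpers, the fallback map stands in for whatever the sssom default context or bioregistry would provide, and the placeholder IRI is derived from the lowercased prefix (as in the BIRNLEX example) rather than being truly random.

```python
from typing import Dict, Iterable, Mapping, Optional

# Base IRI taken from the BIRNLEX example in the thread.
UNKNOWN_PREFIX_BASE = "http://w3id.org/oak/unknown_prefixes/"


def resolve_prefix(
    prefix: str,
    ontology_prefix_map: Mapping[str, str],
    fallback_prefix_map: Optional[Mapping[str, str]] = None,
) -> str:
    """Resolve one prefix using the three fallback steps (hypothetical helper)."""
    # Step 1: prefer what the ontology itself declares.
    if prefix in ontology_prefix_map:
        return ontology_prefix_map[prefix]
    # Step 2: consult a user-supplied or registry-derived map, if any.
    if fallback_prefix_map and prefix in fallback_prefix_map:
        return fallback_prefix_map[prefix]
    # Step 3: generate a placeholder expansion for the unknown prefix.
    return f"{UNKNOWN_PREFIX_BASE}{prefix.lower()}"


def build_prefix_map(
    prefixes: Iterable[str],
    ontology_prefix_map: Mapping[str, str],
    fallback_prefix_map: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Build a complete prefix map so that no used prefix is left undeclared."""
    return {
        p: resolve_prefix(p, ontology_prefix_map, fallback_prefix_map)
        for p in prefixes
    }
```

Keeping step 2 as an explicit argument leaves the choice of registry under user control, which matches the preference voiced elsewhere in the thread.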
Do we do this in oaklib or sssom-py?
I would say oak. We already do something like that in SSSOM but I think the burden should be on the parser to try to interpret the ontology
On Wed, Mar 15, 2023 at 3:34 AM Nico Matentzoglu @.***> wrote:
> - Take the prefixes from self.ontology_interface.prefix_map where they exist
Oh, of course we should always do this.
> - Where they don't, check the sssom default context, or bioregistry directly
I'd prefer this to be more explicitly under user control, and to only use the semantic subset in prefix maps. Generating an unstable prefix map with cgi-bin .pl expansions is worse than option 3 below.
> - Where neither works, just generate a randomised prefix entry BIRNLEX: http://w3id.org/oak/unknown_prefixes/birnlex
I like this.
Ok, so:
It's not what we do in the SSSOM toolkit for now, but I am fine with this option!
So the strict flag in clean_prefix_map() is good or no?
In terms of the bullets above, if I understand correctly:
--prefix option for the oaklib CLI, which I assume is where the user passes the prefixes. It is not used in the mappings function at the moment. I'll need to implement this, correct?
writer.emit() which makes the code messy. Am I missing something here?
curie_map after clean_prefix_map() in sssom, I implement the unknown prefix URI in oaklib? This looks like too shabby a way of distributing things, in my opinion.
You are right @hrshdhgd, I didn't think this through.
What do you think about adding an else clause here:
That takes the unknown prefixes and adds them to the mapping set, using http://w3id.org/sssom/unknown_prefix/birnlex/ as a value? Just thinking out loud here.
The source of my confusion is where we need to add this functionality: oaklib or sssom-py.
Don't worry, you are not the only confused person. I don't know yet; that's why I am bouncing around.
If we go with my suggestion above, we could do it in sssom-py. Basically, in non-strict mode, we just assume we should generate "unknown prefix" IRIs. I think this makes sense, also for the obographs parser in the sssom toolkit.
But it may lead to quite a few changes to sssom-py, especially in the parse CLI (and perhaps convert). One step at a time.
Would the https://github.com/mapping-commons/sssom-py/pull/353 PR solve the immediate problem?
For now yes, it solves the immediate problem. My only question there is whether the default should be True or False for the strict flag.
I think it should be True by default.
Results in a lot of mappings, but no curie_map.