Open jakebeal opened 1 year ago
I think what you are asking for can be accomplished by the following:
uri = tyto.URI('https://identifiers.org/SO:0000167', tyto.SO)
Unfortunately, that does not seem to be the case:
>>> tyto.URI('http://identifiers.org/so/SO:0000316', tyto.SO)
'http://identifiers.org/so/SO:0000316'
>>> tyto.URI('https://identifiers.org/SO:0000316', tyto.SO)
'https://identifiers.org/SO:0000316'
>>> tyto.URI('https://nonsense_uri', tyto.SO)
'https://nonsense_uri'
Is this what you are looking for?
>>> promoter = tyto.SO.promoter
>>> promoter
'https://identifiers.org/SO:0000167'
>>> tyto.SO._sanitize_uri(promoter)
'http://purl.obolibrary.org/obo/SO_0000167'
>>> tyto.SO._reverse_sanitize_uri('http://purl.obolibrary.org/obo/SO_0000167')
'https://identifiers.org/SO:0000167'
That's looking along the right lines, but I'm still a bit mystified, because _sanitize_uri is a) not caring if it's part of the ontology or not, and b) not returning the same URI that gets returned when I look up terms.
>>> tyto.SO._sanitize_uri('https://identifiers.org/SO:0000316')
'http://purl.obolibrary.org/obo/SO_0000316'
>>> tyto.SO.get_uri_by_term('promoter')
'https://identifiers.org/SO:0000167'
>>> tyto.SO._sanitize_uri('https://nonsense.uri')
'https://nonsense.uri'
Is there any function that I can give 'http://identifiers.org/so/SO:0000316' and have it give me the same result as get_uri_by_term (e.g., in this case 'https://identifiers.org/SO:0000316')?
tyto.SO._reverse_sanitize_uri is a natural place to tuck this functionality. Currently it recognizes a purl namespace and converts it back to identifiers.org. It could also be extended to normalize URIs with the pattern 'http://identifiers.org/so/'.
From an SBOL perspective, I think your natural inclination would be to assume that the _sanitize method returns a URI in the identifiers.org namespace. That is not the case. The logic behind _sanitize and _reverse_sanitize is that the query builder has to normalize (sanitize) a URI to a purl namespace in order to query the ontology servers (they recognize purl, not identifiers.org, which makes me question why SBOL chose to normalize on identifiers.org). Likewise, the ontology servers return URIs in the purl namespace, so they have to be "reverse sanitized" back into identifiers.org. The query builder typically does this under the hood, so the methods are private.
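To make the two directions concrete, here is a rough sketch of the string rewriting involved, for SO only. It is not tyto's actual implementation: the names to_purl and to_identifiers and the regular expressions are hypothetical, and the handling of the legacy 'http://identifiers.org/so/' form is the proposed extension, not current behavior.

```python
import re

# Namespace prefixes as they appear in the REPL output earlier in this thread.
PURL_SO = "http://purl.obolibrary.org/obo/SO_"
IDENTIFIERS_SO = "https://identifiers.org/SO:"

# Hypothetical patterns: accept http or https, with or without the legacy "/so/" segment.
_IDENTIFIERS_PATTERN = re.compile(r"^https?://identifiers\.org/(?:so/)?SO:(\d{7})$")
_PURL_PATTERN = re.compile(r"^http://purl\.obolibrary\.org/obo/SO_(\d{7})$")


def to_purl(uri: str) -> str:
    """'Sanitize': rewrite an identifiers.org SO URI into purl form.

    Anything that does not look like an SO URI is passed through unchanged.
    """
    match = _IDENTIFIERS_PATTERN.match(uri)
    return PURL_SO + match.group(1) if match else uri


def to_identifiers(uri: str) -> str:
    """'Reverse sanitize': rewrite a purl (or legacy identifiers.org) SO URI
    into the https://identifiers.org/SO: form; pass anything else through."""
    match = _PURL_PATTERN.match(uri) or _IDENTIFIERS_PATTERN.match(uri)
    return IDENTIFIERS_SO + match.group(1) if match else uri


print(to_purl("https://identifiers.org/SO:0000167"))
print(to_identifiers("http://identifiers.org/so/SO:0000316"))
```

Note that this is pure string manipulation, which is why a URI such as 'https://nonsense.uri' passes straight through: nothing here checks whether the term actually exists in the ontology.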
In any case, I could go ahead and implement a public normalize function with the functionality you requested, although, as noted above, it's a bit of a misnomer since all the ontology resources normalize on purl namespace.
Whatever makes sense under the hood is fine by me. The key thing I need is for the results of tyto.ontology.get_uri_by_term() and tyto.ontology.normalize(uri) to be equal.
Implementing that function would be great! You can currently find my workaround version in the SBOL utilities workarounds at https://github.com/SynBioDex/SBOL-utilities/blob/2b8d6289cf2ed818deb95a34b27d7ea25567982c/sbol_utilities/workarounds.py#L24-L37
Do you want it to throw an error if the given URI is not a member of the ontology, e.g., https://nonsense.uri ?
I'm fine with either throwing a lookup exception or returning None. For my first specific use case, it would be a little more convenient if it returned None, but I can make it work either way, so I think you should do what you think makes most sense from a tyto-centric perspective.
Maybe you could even make it an optional argument that switches between the two behaviors: it would default to throwing an exception but could be overridden to return None instead (sort of like how directory creation has the exist_ok option).
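As a sketch of what I have in mind (not a proposal for the final signature): the function name normalize, the missing_ok flag, and the StubOntology class below are all hypothetical. The stub stands in for a tyto ontology object so the example runs without querying anything, and it assumes that a failed lookup raises LookupError.

```python
from typing import Optional


class StubOntology:
    """Hypothetical stand-in for a tyto ontology, backed by a dict instead of a query."""

    def __init__(self, terms: dict):
        self._uri_by_term = dict(terms)
        self._term_by_uri = {uri: term for term, uri in terms.items()}

    def get_term_by_uri(self, uri: str) -> str:
        if uri not in self._term_by_uri:
            # Assumption: an unknown URI surfaces as a LookupError.
            raise LookupError(f"{uri} is not a term in this ontology")
        return self._term_by_uri[uri]

    def get_uri_by_term(self, term: str) -> str:
        if term not in self._uri_by_term:
            raise LookupError(f"{term} is not a term in this ontology")
        return self._uri_by_term[term]


def normalize(ontology, uri: str, missing_ok: bool = False) -> Optional[str]:
    """Return the recommended form of `uri` by round-tripping through the term name.

    Raises LookupError for an unknown URI unless missing_ok is True,
    in which case None is returned instead.
    """
    try:
        return ontology.get_uri_by_term(ontology.get_term_by_uri(uri))
    except LookupError:
        if missing_ok:
            return None
        raise


so = StubOntology({"promoter": "https://identifiers.org/SO:0000167"})
print(normalize(so, "https://identifiers.org/SO:0000167"))
print(normalize(so, "https://nonsense.uri", missing_ok=True))
```

By construction the result always equals what get_uri_by_term returns, which is the property I care about. The stub only knows the recommended URI form, so accepting variants such as 'http://identifiers.org/so/...' would still have to happen inside the real lookup.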
I often want to put a URI into "normal form", i.e., the recommended form. Currently, this is done by
tyto.X.get_uri_by_term(tyto.X.get_term_by_uri(uri))
It would be nice to have normalization as an efficient convenience method.