Currently, the return type definitions for the `.entities()` methods are inaccurate. They say `Iterable[CURIE]`, but the methods can also return "quoted URIs" (example: `<http://identifiers.org/hgnc/10004>`).
Possible solutions
a. Fix the methods so that URIs, not quoted URIs, are returned, and update the type definition to `Union[CURIE, URI]`.
b. Add a new type for quoted URIs and update the type definition to `Union[CURIE, QUOTED_URI]`.
c. Introduce `IDENTIFIER`:
   - Rename `CURIE` to `IDENTIFIER`, so that it is used universally in signatures. This will be a large diff, but it will not affect runtime.
   - Formally define it as `IDENTIFIER = CURIE | QUOTED_URI`. Again, this is typing information that does not affect runtime, but it helps make the intentions clear.
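
As a rough sketch of what options (b) and (c) could look like, assuming the aliases stay plain `str` at runtime: `QUOTED_URI` and `IDENTIFIER` are the names proposed in this issue and are not confirmed to exist in `oaklib.types`; `is_quoted_uri` and the standalone `entities` function are hypothetical and only there for illustration.

```python
from typing import Iterable, Union

# Aliases as discussed in this issue; all of them are plain str at runtime.
CURIE = str
URI = str
QUOTED_URI = str  # proposed alias, e.g. "<http://identifiers.org/hgnc/10004>"
IDENTIFIER = Union[CURIE, QUOTED_URI]  # proposed umbrella alias (option c)


def is_quoted_uri(identifier: IDENTIFIER) -> bool:
    """Hypothetical helper: True if the identifier is wrapped in angle brackets."""
    return (
        len(identifier) > 2
        and identifier.startswith("<")
        and identifier.endswith(">")
    )


def entities(rows: Iterable[str]) -> Iterable[IDENTIFIER]:
    """Hypothetical stand-in for an .entities() method; yields identifiers unchanged."""
    yield from rows
```

Because the aliases all resolve to `str`, a type checker will not tell them apart; the benefit is purely in documenting intent in signatures.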
Additional details
Context / discussion
_[Chris](https://github.com/INCATools/ontology-access-kit/pull/688#issuecomment-1863151973)_:
> The intent is that CURIEs are input and output. Yes, I know that when using semsql URIs that have no prefixes get turned into quoted URIs (different from URIs), but this should be fixed elsewhere.
_[Joe](https://github.com/INCATools/ontology-access-kit/pull/688#issuecomment-1863524743):_
> I guess you're right, they are "quoted URIs". Here's an example of one I see:
> > `''`
>
> Why don't we add something like QUOTED_URI to `oaklib.types`?:
> ```py
> CURIE = str
> QUOTED_URI = str
> URI = str
> ```
>
> When you say "should be fixed elsewhere", I assume you are referring to them being "quoted" URI strings rather than plain URI strings. I'm definitely down with that. If you mean that URIs should never be returned (you probably don't), I don't think that's possible 100% of the time.
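
For option (a), one possible way to turn a quoted URI into a plain URI is to strip the surrounding angle brackets. This is only a sketch: `unquote_uri` is a hypothetical helper name, and where such a conversion would belong in the codebase is not settled in the discussion above.

```python
def unquote_uri(identifier: str) -> str:
    """Hypothetical helper: strip angle brackets from a quoted URI.

    Anything not wrapped in "<...>" (for example a CURIE) is returned unchanged.
    """
    if len(identifier) > 2 and identifier[0] == "<" and identifier[-1] == ">":
        return identifier[1:-1]
    return identifier
```

With this approach the return type would become `Union[CURIE, URI]`, since callers would no longer see the bracketed form.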
Related
684