Open bgyori opened 5 months ago
This has come up many times and I have just spent some more time thinking about this. There are a few possible solutions:
uri_format in the NCBI Taxonomy record. This will get the job done, but has the drawback that the default exported Bioregistry prefix map will then have a non-OBO PURL in it. In the past, not having OBO PURLs show up in all places has been a point of friction for adoption by the OBO community, and changing this would probably deteriorate trust.
uri_format to the Bioregistry data model that is only considered during resolution. This might also motivate having a dichotomy between functions for getting IRIs and for getting URLs that bake in some assumptions about what qualities the results have. This will increase complexity for both curators and maintainers, who have to understand the data model and decide where this value should get considered.
Code that counts the number of URI format string annotations:
import bioregistry
total = len(bioregistry.resources())
count = sum(r.uri_format is not None for r in bioregistry.resources())
print(f"There are {count}/{total} ({count/total:.1%}) records with explicit URI format strings")
If NCBI could weigh in, we could probably change the resolver of the OBO PURL to point to the NCBI resource. It's a bit awkward, as some people might expect information about the ontology when looking up this information, but it's probably OK.
What is the concern with doing the same as was done for NCIT?
"uri_format": "https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI%20Thesaurus&code=$1",
"uri_format_rdf": "http://purl.obolibrary.org/obo/NCIT_$1"
Is it that tooling (curies) does not respect the uri_format_rdf slot?
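To make the question concrete, here is a minimal sketch of how a consumer could choose between the two slots depending on whether it wants a web page or an RDF IRI. It uses a plain dictionary copied from the NCIT snippet above, not the actual Bioregistry or curies API; the `format_uri` helper and its `for_rdf` flag are hypothetical names invented for illustration.

```python
# Hypothetical record mirroring the two NCIT fields quoted above.
NCIT_RECORD = {
    "uri_format": "https://ncit.nci.nih.gov/ncitbrowser/ConceptReport.jsp?dictionary=NCI%20Thesaurus&code=$1",
    "uri_format_rdf": "http://purl.obolibrary.org/obo/NCIT_$1",
}


def format_uri(record, identifier, *, for_rdf=False):
    """Fill in a URI format string from a record (illustrative helper, not a library call).

    With for_rdf=True, prefer "uri_format_rdf" and fall back to "uri_format"
    when the RDF slot is missing. Return None if no format string is available.
    """
    key = "uri_format_rdf" if for_rdf else "uri_format"
    template = record.get(key) or record.get("uri_format")
    if template is None:
        return None
    # "$1" is the identifier placeholder used in the format strings above.
    return template.replace("$1", identifier)


# "C1234" is just a placeholder identifier.
print(format_uri(NCIT_RECORD, "C1234"))                # web page for browsing
print(format_uri(NCIT_RECORD, "C1234", for_rdf=True))  # OBO PURL for RDF
```

Under this assumption, the resolver would call the helper with the default, while anything that exports a prefix map for semantic web use would pass `for_rdf=True`.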
In most applications it would be useful to resolve ncbitaxon IDs to NCBI's website, e.g., https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=9606, since NCBI is the primary provider of these IDs. Currently, https://bioregistry.io/ncbitaxon:9606 first resolves to http://purl.obolibrary.org/obo/NCBITaxon_9606 and then to https://ontobee.org/ontology/NCBITaxon?iri=http://purl.obolibrary.org/obo/NCBITaxon_9606, a third-party provider. I suspect that the choice of using the PURL here is motivated by URI-based identification rather than web-based resolution concerns. Still, could we make the NCBI website the default resolver?