biopragmatics / bioregistry

📮 An integrative registry of biological databases, ontologies, and nomenclatures.
https://bioregistry.io
MIT License

Invalid IRIs in results of the Bioregistry SPARQL endpoint #803

Open · hartig opened this issue 1 year ago

hartig commented 1 year ago

To reproduce the issue, run the following query on the Bioregistry SPARQL endpoint (https://bioregistry.io/sparql):

PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?o WHERE {
    <http://identifiers.org/ensembl/ENSG00000006125> owl:sameAs ?o
}
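To reproduce this outside the web UI, here is a minimal Python sketch (standard library only) that sends the query above using the SPARQL 1.1 Protocol. It assumes the endpoint accepts a GET request with a query parameter and can return application/sparql-results+json; the function names are illustrative and not part of any Bioregistry client.

```python
import json
import urllib.parse
import urllib.request

# Endpoint and query taken from the report above.
ENDPOINT = "https://bioregistry.io/sparql"
QUERY = """PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?o WHERE {
    <http://identifiers.org/ensembl/ENSG00000006125> owl:sameAs ?o
}"""


def build_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    # Assumption: the endpoint honours content negotiation for JSON results.
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def extract_objects(results: dict) -> list[str]:
    """Return the raw ?o values from a SPARQL JSON results document."""
    return [row["o"]["value"] for row in results["results"]["bindings"] if "o" in row]


def fetch_same_as(endpoint: str = ENDPOINT, query: str = QUERY) -> list[str]:
    """Send the query over the network and return the ?o values as strings."""
    with urllib.request.urlopen(build_request(endpoint, query), timeout=30) as resp:
        return extract_objects(json.load(resp))
```

Only fetch_same_as() touches the network. Because it returns the values as plain strings, a malformed IRI is passed through unchanged instead of raising an exception the way Jena does, which makes it easier to inspect what the endpoint actually returns.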

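The result contains invalid IRIs such as http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125, which still contains the unexpanded placeholder [?species_name]. Square brackets are not allowed in the path of an IRI; RFC 3987 only permits them around an IP-literal host.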
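To find which returned values a strict parser is likely to reject, a quick client-side check can help. This is a simplified sketch, not a full RFC 3987 validator and not what Jena does internally: it only looks for characters that are never allowed in the path, query, or fragment of an IRI. The names FORBIDDEN, illegal_chars, and find_invalid are hypothetical.

```python
from urllib.parse import urlsplit

# Characters that RFC 3987 never permits in the path, query, or fragment of an
# IRI. "[" and "]" are only legal around an IP-literal host such as [::1].
# Partial check only: it does not validate percent-encoding or the scheme.
FORBIDDEN = set('[]<>"{}|\\^` ')


def illegal_chars(iri: str) -> set[str]:
    """Return the forbidden characters found outside the authority of iri."""
    parts = urlsplit(iri)
    rest = parts.path + parts.query + parts.fragment
    return {c for c in rest if c in FORBIDDEN}


def find_invalid(iris: list[str]) -> list[str]:
    """Keep only the IRIs that contain at least one forbidden character."""
    return [iri for iri in iris if illegal_chars(iri)]
```

For the bacteria.ensembl.org value above, urlsplit stops the path at the first "?", so "[" ends up in the path and "]" in the query; both are flagged.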

I discovered this problem when sending such queries from a program built on the Jena library. In particular, when I try to print the result of this query, Jena throws the following exception:

<http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
org.apache.jena.irix.IRIException: <http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.

cthoyt commented 1 year ago

Hi @hartig, thanks for letting us know about this and for including an example.

This might be coming into the Bioregistry from Prefix Commons. We can fix it either in the way the Bioregistry loads URI prefixes into the curies data structure, or directly upstream in the curies package. I'm at the Biocuration 2023 conference now, but I will try to address this by the end of the week.
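As a rough illustration of the first option (fixing it where the Bioregistry loads URI prefixes), a minimal sketch could drop any URI prefix containing characters that cannot appear in an IRI before the prefixes are handed to curies. The input shape (a prefix mapped to a list of URI prefixes), the function names, and the example values are all hypothetical; this is not the curies or Bioregistry API.

```python
# Characters that can never appear in an IRI outside an IP-literal host
# (partial list, checked over the whole URI prefix for simplicity).
FORBIDDEN = set('[]<>"{}|\\^` ')


def is_usable_uri_prefix(uri_prefix: str) -> bool:
    """Return True if the URI prefix contains none of the forbidden characters."""
    return not any(c in FORBIDDEN for c in uri_prefix)


def filter_uri_prefixes(
    uri_prefixes: dict[str, list[str]],
) -> tuple[dict[str, list[str]], list[tuple[str, str]]]:
    """Drop unusable URI prefixes; return the cleaned map and what was dropped."""
    kept: dict[str, list[str]] = {}
    dropped: list[tuple[str, str]] = []
    for prefix, candidates in uri_prefixes.items():
        usable = [u for u in candidates if is_usable_uri_prefix(u)]
        dropped.extend((prefix, u) for u in candidates if not is_usable_uri_prefix(u))
        if usable:  # omit prefixes left with no valid URI prefix at all
            kept[prefix] = usable
    return kept, dropped


# Illustrative input only; not actual Bioregistry data.
example = {
    "ensembl": [
        "http://identifiers.org/ensembl/",
        "http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=",
    ],
}
kept, dropped = filter_uri_prefixes(example)
```

Returning the dropped entries, rather than discarding them silently, leaves a list that could be reported back to the upstream source (for example Prefix Commons) so the templates get fixed there too.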