Closed micheldumontier closed 1 year ago
@micheldumontier While investigating this issue I came across a weird entry in the Bioregistry database.
When executing the following query with HeFQUIN ...
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT * WHERE {
  SERVICE <https://bioregistry.io/sparql> {
    <http://identifiers.org/ensembl/ENSG00000006125> owl:sameAs ?o
  }
}
... HeFQUIN throws the following exception.
<http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
org.apache.jena.irix.IRIException: <http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125> Code: 0/ILLEGAL_CHARACTER in PATH: The character violates the grammar rules for URIs/IRIs.
at org.apache.jena.irix.IRIProviderJenaIRI.exceptions(IRIProviderJenaIRI.java:256)
at org.apache.jena.irix.IRIProviderJenaIRI.newIRIxJena(IRIProviderJenaIRI.java:137)
at org.apache.jena.irix.IRIProviderJenaIRI.create(IRIProviderJenaIRI.java:145)
at org.apache.jena.irix.IRIx.create(IRIx.java:54)
at org.apache.jena.sparql.util.FmtUtils.abbrevByBase(FmtUtils.java:475)
at org.apache.jena.sparql.util.FmtUtils.stringForURI(FmtUtils.java:460)
at org.apache.jena.sparql.util.FmtUtils.stringForURI(FmtUtils.java:433)
at org.apache.jena.sparql.util.FmtUtils.stringForNode(FmtUtils.java:373)
at org.apache.jena.sparql.util.FmtUtils.stringForNode(FmtUtils.java:347)
at org.apache.jena.sparql.util.FmtUtils.stringForRDFNode(FmtUtils.java:185)
at org.apache.jena.riot.resultset.rw.ResultSetWriterText.getVarValueAsString(ResultSetWriterText.java:201)
at org.apache.jena.riot.resultset.rw.ResultSetWriterText.colWidths(ResultSetWriterText.java:99)
at org.apache.jena.riot.resultset.rw.ResultSetWriterText.output$(ResultSetWriterText.java:135)
at org.apache.jena.riot.resultset.rw.ResultSetWriterText.output(ResultSetWriterText.java:120)
at org.apache.jena.riot.resultset.rw.ResultSetWriterText.output(ResultSetWriterText.java:116)
at org.apache.jena.riot.resultset.rw.ResultSetWriterText.write(ResultSetWriterText.java:59)
at org.apache.jena.riot.resultset.rw.ResultsWriter.write(ResultsWriter.java:156)
at org.apache.jena.riot.resultset.rw.ResultsWriter.write(ResultsWriter.java:126)
at org.apache.jena.sparql.util.QueryExecUtils.outputResultSet(QueryExecUtils.java:133)
at org.apache.jena.sparql.util.QueryExecUtils.doSelectQuery(QueryExecUtils.java:150)
at org.apache.jena.sparql.util.QueryExecUtils.executeQuery(QueryExecUtils.java:81)
at se.liu.ida.hefquin.engine.HeFQUINEngineBuilder$MyEngine.executeQuery(HeFQUINEngineBuilder.java:169)
at se.liu.ida.hefquin.cli.RunQueryWithoutSrcSel.exec(RunQueryWithoutSrcSel.java:105)
at org.apache.jena.cmd.CmdMain.mainMethod(CmdMain.java:92)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:58)
at org.apache.jena.cmd.CmdMain.mainRun(CmdMain.java:45)
at se.liu.ida.hefquin.cli.RunQueryWithoutSrcSel.main(RunQueryWithoutSrcSel.java:49)
I will work on making HeFQUIN more robust (i.e., such that it does not simply die in such cases). However, the error itself is legitimate: the illegal IRI is indeed returned by the Bioregistry SPARQL endpoint. You can check this by going to https://bioregistry.io/sparql and running the following query; several IRIs of this invalid form appear in the result.
PREFIX owl: <http://www.w3.org/2002/07/owl#>
SELECT ?o WHERE {
<http://identifiers.org/ensembl/ENSG00000006125> owl:sameAs ?o
}
I have created an issue about the invalid IRIs in the Bioregistry repo: https://github.com/biopragmatics/bioregistry/issues/803
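To see the problem without running HeFQUIN or Jena, here is a minimal Java sketch that checks the two IRIs with the JDK's java.net.URI parser. This is only a stand-in: java.net.URI follows the older URI grammar, not the IRI rules that Jena's IRIx applies, so the two checks will not agree on every input. The class and method names (IriSanityCheck, isParsable, keepParsable) are made up for this example and are not part of HeFQUIN or Jena.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;

public class IriSanityCheck {

    // Returns true if java.net.URI accepts the string.
    // Assumption: this approximates, but is not identical to, Jena's IRI check.
    public static boolean isParsable(String iri) {
        try {
            new URI(iri);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    // Drops every IRI that the parser rejects, instead of failing on the first one.
    public static List<String> keepParsable(List<String> iris) {
        List<String> result = new ArrayList<>();
        for (String iri : iris) {
            if (isParsable(iri)) {
                result.add(iri);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> candidates = List.of(
            "http://identifiers.org/ensembl/ENSG00000006125",
            "http://bacteria.ensembl.org/[?species_name]/Gene/Summary?g=ENSG00000006125");

        for (String iri : candidates) {
            System.out.println((isParsable(iri) ? "ok       " : "rejected ") + iri);
        }
    }
}
```

The second IRI is rejected because of the unescaped "[" in its path, which is the same character the Jena exception complains about. Whether a robust engine should skip such values, escape them, or pass them through unchanged is a separate design question that this sketch does not settle.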
The reason for the empty result is that the Bioregistry SPARQL endpoint does not support FILTER clauses in queries and HeFQUIN uses the FILTER-based variation as its default implementation of the bind join algorithm. I have filed a corresponding issue in the Bioregistry repo: https://github.com/biopragmatics/bioregistry/issues/804
We also have a VALUES-based implementation and a UNION-based implementation of bind join in HeFQUIN. By using the VALUES-based implementation, the federation query (in the first comment above) works and produces the expected non-empty result. To try this, line 128 in LogicalToPhysicalOpConverter needs to be changed as follows.
if ( fm instanceof SPARQLEndpoint ) return new PhysicalOpBindJoinWithVALUES(lop);
(i.e., PhysicalOpBindJoinWithFILTER needs to be replaced by PhysicalOpBindJoinWithVALUES)
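To illustrate why the choice of variation matters for an endpoint without FILTER support, here is a rough Java sketch of the two request shapes. It is not HeFQUIN's actual code: the class BindJoinRequestSketch, its methods, the join variable ?s, and the triple pattern are all assumptions made for the example. The idea in both cases is to ship the bindings obtained so far to the endpoint together with the pattern; only the syntax used to carry them differs.

```java
import java.util.List;
import java.util.stream.Collectors;

public class BindJoinRequestSketch {

    static final String PREFIX = "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n";

    // FILTER-based shape: the bindings become a disjunction of equality tests.
    // An endpoint that ignores or rejects FILTER cannot answer this correctly.
    public static String withFilter(List<String> iris) {
        String disjunction = iris.stream()
            .map(iri -> "?s = <" + iri + ">")
            .collect(Collectors.joining(" || "));
        return PREFIX
            + "SELECT * WHERE { ?s owl:sameAs ?o FILTER ( " + disjunction + " ) }";
    }

    // VALUES-based shape: the bindings are passed as inline data.
    public static String withValues(List<String> iris) {
        String data = iris.stream()
            .map(iri -> "<" + iri + ">")
            .collect(Collectors.joining(" "));
        return PREFIX
            + "SELECT * WHERE { VALUES ?s { " + data + " } ?s owl:sameAs ?o }";
    }

    public static void main(String[] args) {
        // Hypothetical bindings for ?s coming from the other side of the join.
        List<String> bindings = List.of(
            "http://identifiers.org/ensembl/ENSG00000006125",
            "http://identifiers.org/ensembl/ENSG00000139618");

        System.out.println(withFilter(bindings));
        System.out.println();
        System.out.println(withValues(bindings));
    }
}
```

Both queries ask for the same solutions, so switching between them should not change the result on a fully conformant endpoint. The IRIs are concatenated without escaping here, which is fine for a sketch but not for real input.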
Calling the mapping service (i.e., https://bioregistry.io/sparql) returns an empty result. Note that this custom SPARQL endpoint currently does not respond to POST requests. Issue registered here: https://github.com/biopragmatics/bioregistry/issues/802
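As a possible workaround while POST is unavailable, a client can send the query via GET with the query text URL-encoded in the "query" parameter, as the SPARQL protocol allows. The Java sketch below only builds such a request with the JDK's java.net.http API and does not send it. The class name SparqlGetRequestSketch and the Accept header value are assumptions for this example; I have not verified which result formats the Bioregistry endpoint serves.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class SparqlGetRequestSketch {

    // Builds (but does not send) a GET request that carries the query
    // in the URL-encoded "query" parameter.
    public static HttpRequest buildGetRequest(String endpoint, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(endpoint + "?query=" + encoded))
            // Assumption: the endpoint can return SPARQL JSON results.
            .header("Accept", "application/sparql-results+json")
            .GET()
            .build();
    }

    public static void main(String[] args) {
        String query =
            "PREFIX owl: <http://www.w3.org/2002/07/owl#>\n"
            + "SELECT ?o WHERE {\n"
            + "  <http://identifiers.org/ensembl/ENSG00000006125> owl:sameAs ?o\n"
            + "}";

        HttpRequest request = buildGetRequest("https://bioregistry.io/sparql", query);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually run the request, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Very long queries may exceed URL length limits, which is the usual reason clients prefer POST.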