ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0

`SERVICE` query ignores `LIMIT` #1655

Open hannahbast opened 2 days ago

hannahbast commented 2 days ago

The following query should return a single row, but it takes forever because the SERVICE query is executed without the LIMIT 1.

@UNEXENU and @joka921 Do you have an idea why that is the case?

SELECT * WHERE {
  SERVICE <https://qlever.cs.uni-freiburg.de/api/wikidata> {
    SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 1
  }
}

https://qlever.cs.uni-freiburg.de/wikidata/RJCTro

UNEXENU commented 1 day ago

I've replicated the error with the following query on the olympics dataset:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX olympics: <http://wallscope.co.uk/ontology/olympics/>
PREFIX medal: <http://wallscope.co.uk/resource/olympics/medal/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT * WHERE {
  SERVICE <https://qlever.cs.uni-freiburg.de/olympics> {
    SELECT ?athlete WHERE {
      ?athlete dbo:team <http://wallscope.co.uk/resource/olympics/team/Germany> .
    }
    LIMIT 1
  }
}

What's happening is that the Service wraps the query it gets passed in a SELECT clause over the visible variables; here that gives `SELECT ?athlete WHERE { SELECT ?athlete WHERE { ... } LIMIT 1 }`.
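To make the wrapping concrete, here is a minimal Python sketch of the string that ends up being evaluated. It is only an illustration of the behaviour described above, not QLever's actual code; the function name `wrap_service_query` and its parameters are made up for this example.

```python
def wrap_service_query(visible_vars, inner_query):
    """Wrap inner_query in an outer SELECT over the visible variables.

    Hypothetical helper for illustration only; QLever's real
    implementation is not shown in this thread.
    """
    projection = " ".join(visible_vars)
    return f"SELECT {projection} WHERE {{ {inner_query} }}"


# The subquery from the SERVICE clause above, including its LIMIT.
inner = (
    "SELECT ?athlete WHERE { "
    "?athlete dbo:team <http://wallscope.co.uk/resource/olympics/team/Germany> . "
    "} LIMIT 1"
)

wrapped = wrap_service_query(["?athlete"], inner)
print(wrapped)
```

Note that the `LIMIT 1` is still present in the wrapped text, inside the braces of the outer query, so the wrapped query should still yield a single row. The wrapping only turns the original query into a subquery.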

So the SERVICE endpoint in the query above basically computes the same incorrect result as the following nested query:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX olympics: <http://wallscope.co.uk/ontology/olympics/>
PREFIX medal: <http://wallscope.co.uk/resource/olympics/medal/>
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?athlete WHERE {
  SELECT ?athlete WHERE {
    ?athlete dbo:team <http://wallscope.co.uk/resource/olympics/team/Germany> .
  }
  LIMIT 1
}

It somehow works (returns one row) when `ORDER BY ?athlete` is added before `LIMIT 1`; however, I don't know why that is the case.

hannahbast commented 3 hours ago

@UNEXENU Thanks a lot for the reply. The issue indeed has nothing to do with `SERVICE`. It happens exactly when there is a subquery with a single index scan and a `LIMIT`. A minimal dataset-independent example is:

SELECT * WHERE {
  SELECT * WHERE { ?s ?p ?o } LIMIT 1
}

https://qlever.cs.uni-freiburg.de/wikidata/IUeShp