Currently, results are counted before the actual input SPARQL query is executed, by rewriting the input query into a query of the form SELECT COUNT (*) ...
This only works for a subset of all possible input SPARQL queries.
For example, when the following input query is executed over two sources that contain triples with the same ?s and ?o but different ?p, the resulting count is too high:
SELECT DISTINCT ?s ?o WHERE {
?s ?p ?o
}
while this one will count correctly:
SELECT DISTINCT ?s ?p ?o WHERE {
?s ?p ?o
}
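The difference between the two queries can be illustrated with a small in-memory sketch. It assumes the excess comes from counting matches of the triple pattern rather than distinct projected rows; the rewriting code itself is not shown here, so that is an assumption. The `Triple` type, the `ex:` terms, and `countDistinct` are hypothetical names used only for illustration.

```typescript
// Hypothetical in-memory model; not the engine's actual data structures.
type Triple = { s: string; p: string; o: string };

// Two sources holding triples with the same ?s and ?o but different ?p.
const sourceA: Triple[] = [{ s: "ex:a", p: "ex:p1", o: "ex:b" }];
const sourceB: Triple[] = [{ s: "ex:a", p: "ex:p2", o: "ex:b" }];
const matches: Triple[] = [...sourceA, ...sourceB];

// Number of distinct rows after projecting each match onto the given variables.
function countDistinct(rows: Triple[], vars: (keyof Triple)[]): number {
  const seen = new Set(rows.map((row) => JSON.stringify(vars.map((v) => row[v]))));
  return seen.size;
}

const patternMatches = matches.length;                       // 2 matches of ?s ?p ?o
const distinctSO = countDistinct(matches, ["s", "o"]);       // 1 row for DISTINCT ?s ?o
const distinctSPO = countDistinct(matches, ["s", "p", "o"]); // 2 rows for DISTINCT ?s ?p ?o
console.log({ patternMatches, distinctSO, distinctSPO });
```

Under that assumption, a count of 2 agrees with SELECT DISTINCT ?s ?p ?o but is one too many for SELECT DISTINCT ?s ?o.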
This is easy to reproduce with the test query "A test on DISTINCT LIMIT OFFSET" by replacing its input SPARQL query with either of the alternatives above.
According to @rubensworks, it is not less efficient to stream all results of the input SPARQL query and count them while consuming the stream.
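Counting while consuming could look roughly like the sketch below. It is a simplified illustration, not the project's implementation: `countWhileConsuming` is a hypothetical helper, and a synchronous iterable stands in for the real result stream, which would be asynchronous.

```typescript
// Hypothetical helper: passes every result through unchanged and
// reports the total once the underlying results are exhausted.
function* countWhileConsuming<T>(
  results: Iterable<T>,
  onEnd: (count: number) => void,
): Generator<T> {
  let count = 0;
  for (const result of results) {
    count++;
    yield result;
  }
  onEnd(count);
}

// Stand-in for the results of the input query.
const rows: string[] = ["row1", "row2", "row3"];
let total = -1;
const consumed: string[] = [];
for (const row of countWhileConsuming(rows, (n) => { total = n; })) {
  consumed.push(row); // the consumer sees each result exactly once
}
console.log(total);
```

Because the count is taken over the results of the input query itself, it cannot disagree with them. The trade-off in this sketch is that the total is only known once the stream has been fully consumed; if the consumer stops early, `onEnd` is never called.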