ad-freiburg / qlever

Very fast SPARQL Engine, which can handle very large knowledge graphs like the complete Wikidata, offers context-sensitive autocompletion for SPARQL queries, and allows combination with text search. It's faster than engines like Blazegraph or Virtuoso, especially for queries involving large result sets.
Apache License 2.0
417 stars, 52 forks

Separate the effects of `send` and `LIMIT` again #1488

Closed: hannahbast closed this 5 days ago

hannahbast commented 2 months ago

Since https://github.com/ad-freiburg/qlever/pull/1355, the query LIMIT is clamped to the value of the QLever-specific send parameter. In particular, this broke the display of the result size in the QLever UI, which sets send=100 when showing the first page of results for a query.

This change restores the old behavior for the send parameter, yet makes good use of the new lazy query processing. Specifically, lazily computed result blocks are now processed as follows: (1) the result blocks before the OFFSET are computed but skipped; (2) then results are computed and materialized until the value of the send parameter is reached; (3) then results are computed and counted, but not materialized, until the LIMIT is reached; (4) all remaining blocks are not computed at all.
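The four phases can be illustrated with a minimal, self-contained C++ sketch. It is not QLever's actual code: Row, Block, BlockGenerator, ExportResult and exportLazyResult are hypothetical names, a block is modeled as a plain vector of rows, and the lazy result is modeled as a callback that computes the next block on demand.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-ins: a block is a vector of rows, and the lazy result
// is a callback that computes the next block (std::nullopt when exhausted).
using Row = int64_t;
using Block = std::vector<Row>;
using BlockGenerator = std::function<std::optional<Block>()>;

struct ExportResult {
  std::vector<Row> exportedRows;  // materialized rows (at most `send`)
  size_t resultSizeTotal = 0;     // rows after OFFSET, counted up to LIMIT
  size_t blocksComputed = 0;      // how many blocks were actually computed
};

ExportResult exportLazyResult(const BlockGenerator& nextBlock, size_t offset,
                              size_t limit, size_t send) {
  ExportResult res;
  size_t toSkip = offset;
  // Phase 4: once LIMIT rows have been counted, no further block is computed.
  while (res.resultSizeTotal < limit) {
    std::optional<Block> block = nextBlock();
    if (!block) break;  // the lazy result is exhausted
    ++res.blocksComputed;
    // Phase 1: rows before OFFSET are computed but skipped.
    size_t begin = std::min(toSkip, block->size());
    toSkip -= begin;
    // Only rows up to LIMIT are taken into account.
    size_t available = block->size() - begin;
    size_t take = std::min(available, limit - res.resultSizeTotal);
    for (size_t i = begin; i < begin + take; ++i) {
      // Phase 2: materialize while below `send`; phase 3: only count.
      if (res.exportedRows.size() < send) {
        res.exportedRows.push_back((*block)[i]);
      }
    }
    res.resultSizeTotal += take;
  }
  return res;
}
```

For example, with OFFSET 5, LIMIT 50 and send 10 over ten blocks of ten rows, only the first six blocks are computed, ten rows are materialized, and the counted total is 50.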

The QLever JSON now has two new fields, resultSizeTotal and resultSizeExported: the number of result rows counted up to the LIMIT, and the number of rows actually exported (at most the value of send). For the sake of backwards compatibility, the old resultsize field is still there and has the same value as the resultSizeTotal field. For the same reason, the send parameter keeps its name for now, but should eventually be renamed to exportLimit.
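Because the old field is kept, a client can prefer the new fields and fall back to resultsize when talking to an older server. A hedged C++ sketch, assuming the numeric top-level fields have already been parsed into a map; JsonNumbers, ResultSizes, readResultSizes and isTruncated are hypothetical names, not part of QLever or the QLever UI, and falling back to the number of rows in the response for the exported count is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <string>

// Hypothetical: numeric top-level fields of a QLever JSON response, already
// parsed (a real client would use a JSON library instead).
using JsonNumbers = std::map<std::string, size_t>;

struct ResultSizes {
  size_t total;     // value of resultSizeTotal (or legacy resultsize)
  size_t exported;  // value of resultSizeExported (or rows in the response)
};

std::optional<ResultSizes> readResultSizes(const JsonNumbers& json,
                                           size_t numRowsInResponse) {
  auto get = [&json](const std::string& key) -> std::optional<size_t> {
    auto it = json.find(key);
    if (it == json.end()) return std::nullopt;
    return it->second;
  };
  // Prefer the new field; fall back to the backwards-compatible one.
  std::optional<size_t> total = get("resultSizeTotal");
  if (!total) total = get("resultsize");
  if (!total) return std::nullopt;
  // Assumption: without resultSizeExported, count the rows actually received.
  size_t exported = get("resultSizeExported").value_or(numRowsInResponse);
  return ResultSizes{*total, exported};
}

// True if only part of the result was exported (e.g. the first UI page).
bool isTruncated(const ResultSizes& sizes) {
  return sizes.exported < sizes.total;
}
```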

On the side, drop the hard limit of MAX_NOF_ROWS_IN_RESULT = 1'000'000 for JSON results. Also fix the compilation error introduced by the interplay of the merges of #1603 and #1607. Fixes #1605 and #1455.

codecov[bot] commented 2 months ago

Codecov Report

Attention: Patch coverage is 86.31579% with 13 lines in your changes missing coverage. Please review.

Project coverage is 89.21%. Comparing base (bb70c4a) to head (7336a11). Report is 4 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/engine/Server.cpp | 0.00% | 12 Missing :warning: |
| src/engine/ExportQueryExecutionTrees.cpp | 98.68% | 0 Missing and 1 partial :warning: |
Additional details and impacted files

```diff
@@              Coverage Diff               @@
##           master    #1488      +/-   ##
==========================================
+ Coverage   89.17%   89.21%   +0.04%
==========================================
  Files         372      372
  Lines       34579    34723     +144
  Branches     3912     3915       +3
==========================================
+ Hits        30835    30978     +143
+ Misses       2471     2470       -1
- Partials     1273     1275       +2
```


hannahbast commented 2 months ago

@RobinTF Thank you for your comments + adding @joka921 to the discussion.

  1. I am pretty sure we want an option where QLever computes the exact result size but exports only part of the result. This option should not be the default, but it should be possible, because it is very useful.

  2. This PR achieves that. It is wasteful in that it materializes the full result internally, but that can (and should) be fixed separately in another PR. I also wouldn't mind fixing it as part of this PR if it's super easy.

  3. Triggering this behavior using the "send" parameter has historical reasons. I agree that "send" is a slight misnomer, but also not too bad.

  4. I don't think that the simpler #1462 achieves what I wrote under item 1 in all cases. Please correct me if that is a misunderstanding. Let's not talk about estimates or computing the exact count with a separate query here. I understand that that would be an option, but that is a separate discussion.

RobinTF commented 2 months ago

@hannahbast

> This PR achieves that. It is wasteful in that it materializes the full result internally, but that can (and should) be fixed separately in another PR.

Does it? My impression is that for lazy results it only consumes the generator up to maxSend or the LIMIT, whichever is lower, and then stops. So it won't know how many results there actually are in that case.

hannahbast commented 2 months ago

@RobinTF I understand, but wouldn't that be easy to fix?

And just for my understanding: Can you provide the piece of code needed in computeResultAsQLeverJSON to compute the exact size of the result?

RobinTF commented 2 months ago

> @RobinTF I understand, but wouldn't that be easy to fix?
>
> And just for my understanding: Can you provide the piece of code needed in computeResultAsQLeverJSON to compute the exact size of the result?

You'd have to remove the break statement within ExportQueryExecutionTrees::getRowIndices and read from RuntimeInformation::numRows_ after it's done iterating.
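A minimal sketch of that suggestion, under simplifying assumptions: RuntimeInfoSketch is a hypothetical stand-in for RuntimeInformation that keeps only a numRows_ counter, the lazy producer is assumed to add each computed block's size to that counter, and LIMIT/OFFSET handling is omitted. It is not the actual getRowIndices code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <utility>
#include <vector>

using Row = int64_t;
using Block = std::vector<Row>;

// Hypothetical stand-in for RuntimeInformation: only the row counter.
struct RuntimeInfoSketch {
  size_t numRows_ = 0;
};

// Hypothetical lazy producer: computes the next block (or std::nullopt when
// done) and adds that block's size to the runtime information.
using LazyBlocks = std::function<std::optional<Block>(RuntimeInfoSketch&)>;

struct Collected {
  std::vector<Row> rows;  // rows kept for export (at most exportLimit)
  size_t exactSize;       // total rows produced, read from numRows_
};

Collected collectAndCount(const LazyBlocks& nextBlock, size_t exportLimit) {
  RuntimeInfoSketch info;
  std::vector<Row> rows;
  while (std::optional<Block> block = nextBlock(info)) {
    for (Row r : *block) {
      if (rows.size() < exportLimit) rows.push_back(r);
    }
    // No early `break` once exportLimit is reached: the loop keeps pulling
    // blocks so that every block is computed and counted.
  }
  // Only after the iteration has finished does numRows_ hold the exact size.
  return {std::move(rows), info.numRows_};
}
```

The trade-off is the one discussed above: dropping the early exit means every block is computed, which is the price of knowing the exact size.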

sonarcloud[bot] commented 2 months ago

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

sparql-conformance[bot] commented 5 days ago

Conformance check passed ✅

No test result changes.

Details: https://qlever.cs.uni-freiburg.de/sparql-conformance-ui?cur=7336a11ee3935317a045ad2b9fcc91579ba5eea4&prev=0fadfc18405b7045e0f8c2ba6790ccad2b1572ac

sonarcloud[bot] commented 5 days ago

Quality Gate passed

Issues
3 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud