apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[fix](planner) query should be cancelled if limit reached #44338

Open morningman opened 4 days ago

morningman commented 4 days ago

What problem does this PR solve?

Problem Summary: When a SQL statement has a LIMIT clause and FE has already received more rows than the limit, FE should send a cancel command to the BEs so that they stop reading more data. In the current code this mechanism is broken and does not work. This is especially costly for external table queries, where it can cause a large amount of unnecessary network I/O.

  1. isBlockQuery

    In the old optimizer, isBlockQuery is set to true if the query contains a sort or aggregation node, and false otherwise. In the new optimizer, it is always true.

    This logic is wrong in both optimizers. Moreover, the reach-limit logic is only triggered when isBlockQuery = false, so under the new optimizer it never fires.

  2. Where the reach-limit logic is invoked

    The reach-limit check is only performed when the rowBatch returned by a BE has eos = true. This is wrong. For a LIMIT N query, each BE applies its own limit of N, but FE can trigger the reach-limit logic as soon as the total number of rows returned by all BEs exceeds N. The check therefore should not be restricted to batches with eos = true.
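To make the two problems concrete, here is a small, self-contained Java simulation of the old check. It is only a sketch under stated assumptions: `RowBatch`, `rowsReceivedBeforeCancel` and the batch sizes are hypothetical names and numbers, not the real Doris FE classes, and FE is assumed to receive batches from all BEs one at a time.

```java
import java.util.List;

/**
 * Hypothetical simulation of the pre-fix reach-limit check.
 * RowBatch and rowsReceivedBeforeCancel are illustrative names,
 * not actual Doris FE code.
 */
public class OldReachLimitCheck {

    /** A batch of rows returned by one BE; eos marks that BE's last batch. */
    record RowBatch(int numRows, boolean eos) {}

    /**
     * Consumes batches in arrival order and returns the total number of rows
     * FE had received when it decided to cancel, or -1 if it never cancelled.
     */
    static long rowsReceivedBeforeCancel(List<RowBatch> batches, long limit, boolean isBlockQuery) {
        long received = 0;
        for (RowBatch batch : batches) {
            received += batch.numRows();
            // Old behaviour: the limit is only checked on an eos batch,
            // and only when isBlockQuery is false.
            if (batch.eos() && !isBlockQuery && limit > 0 && received >= limit) {
                return received;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // LIMIT 10 over three BEs: each BE applies LIMIT 10 locally and sends
        // its rows in two batches of 5, the second one flagged eos.
        List<RowBatch> batches = List.of(
                new RowBatch(5, false), new RowBatch(5, false), new RowBatch(5, false),
                new RowBatch(5, true), new RowBatch(5, true), new RowBatch(5, true));
        // New optimizer: isBlockQuery is always true, so FE never cancels.
        System.out.println(rowsReceivedBeforeCancel(batches, 10, true));  // prints -1
        // Even with isBlockQuery = false, FE waits for the first eos batch.
        System.out.println(rowsReceivedBeforeCancel(batches, 10, false)); // prints 20
    }
}
```

In this scenario FE already holds 10 rows after the second batch, yet with isBlockQuery = true it never cancels, and with isBlockQuery = false it only cancels at the first eos batch, after receiving 20 rows.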

The main changes in this PR:

  1. Remove isBlockQuery

    isBlockQuery is used only by the reach-limit logic, where it is not needed, so it is removed entirely.

  2. Move the reach-limit check

    FE now runs the reach-limit logic whenever the number of rows it has received exceeds the limit, instead of only when a batch has eos = true.

  3. Fix the wrong limitRows in QueryProcessor

    limitRows should be taken from the first fragment, not the last one.
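The fixed flow can be sketched in Java as well. Again this is a hypothetical illustration, not the actual FE code: `Fragment`, `limitRows` and `rowsReceivedBeforeCancel` are made-up names, the fragment list is invented, and "limit reached" is interpreted here as `received >= limit`, which may differ from the exact comparison in Doris.

```java
import java.util.List;

/**
 * Hypothetical sketch of the reach-limit check after this PR: no isBlockQuery,
 * the check runs after every batch, and the limit comes from the first fragment.
 * All names are illustrative, not Doris FE APIs.
 */
public class NewReachLimitCheck {

    /** A batch of rows returned by one BE; eos marks that BE's last batch. */
    record RowBatch(int numRows, boolean eos) {}

    /** A plan fragment with its row limit; a limit <= 0 means "no limit". */
    record Fragment(String name, long limit) {}

    /** As the PR states, the query-level limit is taken from the first fragment, not the last. */
    static long limitRows(List<Fragment> fragments) {
        return fragments.isEmpty() ? 0 : fragments.get(0).limit();
    }

    /**
     * Returns the rows received when FE decides to cancel the query,
     * or -1 if the batches run out before the limit is reached.
     */
    static long rowsReceivedBeforeCancel(List<RowBatch> batches, long limit) {
        long received = 0;
        for (RowBatch batch : batches) {
            received += batch.numRows();
            // Checked after every batch, whether or not eos is set.
            if (limit > 0 && received >= limit) {
                return received; // at this point FE would send a cancel to the BEs
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Invented plan: only the first fragment carries the query's LIMIT 10.
        List<Fragment> fragments = List.of(new Fragment("root", 10), new Fragment("scan", 0));
        List<RowBatch> batches = List.of(
                new RowBatch(5, false), new RowBatch(5, false), new RowBatch(5, false),
                new RowBatch(5, true), new RowBatch(5, true), new RowBatch(5, true));
        long limit = limitRows(fragments);
        System.out.println(limit);                                    // prints 10
        System.out.println(rowsReceivedBeforeCancel(batches, limit)); // prints 10
    }
}
```

With LIMIT 10 and batches of 5 rows, FE cancels after the second batch, as soon as 10 rows have arrived, regardless of eos. Reading the limit from the last fragment in this invented plan would yield 0 ("no limit") and the query would never be cancelled.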

Release note

Fixed an issue where a query was not cancelled after its limit was reached.

Check List (For Author)

Check List (For Reviewer who merges this PR)

doris-robot commented 4 days ago

Thank you for your contribution to Apache Doris. Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impact it might have.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

morningman commented 4 days ago

run buildall

github-actions[bot] commented 4 days ago

PR approved by at least one committer and no changes requested.

github-actions[bot] commented 4 days ago

PR approved by anyone and no changes requested.

morningman commented 4 days ago

run buildall

morningman commented 4 days ago

run buildall

morningman commented 3 days ago

run buildall

github-actions[bot] commented 2 days ago

PR approved by at least one committer and no changes requested.