databendlabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.85k stars 750 forks source link

bug(query): behavior not like MySQL when set LIMIT 0 #16399

Closed scbizu closed 2 months ago

scbizu commented 2 months ago

Search before asking

Version

v1.2.629-nightly

What's Wrong?

As MySQL documented:

LIMIT 0 quickly returns an empty set. This can be useful for checking the validity of a query. It can also be employed to obtain the types of the result columns within applications that use a MySQL API that makes result set metadata available.

In databend , when set LIMIT 0 (or LIMIT 0,0) , the EXPLAIN (I paste it in Slack) said databend still performs full table data scan instead of only extracting the metadata . Some of client-drivers and SQL frameworks use this feature to check SQL validation and do extract the metadata to the ORM-based structure . With too much data scanning , the performance will step down in this case.

How to Reproduce?

SELECT ... FROM  my_table WHERE 1=1 LIMIT 0,0;

Are you willing to submit PR?

dosubot[bot] commented 2 months ago

Hey @scbizu! I'm here to help you with any bugs, questions, or contributions you have for databend. Let's work together to solve this issue!

I found a similar closed issue: Support Output MySQL format, which was last updated on May 30, 2024. This issue was referenced by another issue: tracing issues about OutputFormat & result compare of tests, also last updated on May 30, 2024. The suggested solution was to refactor DFQueryResultWriter into output MySQL format like Tsv/Csv/Parquet [1].

Regarding the specific behavior of Databend when using LIMIT 0, Databend performs a full table data scan instead of only extracting metadata. This is because its query execution plan includes a TableScan operation that reads rows from the table, even with a LIMIT clause. This behavior is different from MySQL, which can optimize such queries to only extract metadata without scanning the table data [2][3][4].

Additionally, the source code for the Limit physical plan in Databend shows that the Limit plan is built on top of the input plan, which includes the TableScan operation. The Limit plan does not inherently skip the data scan but rather applies the limit after the data has been read. This design choice ensures that the query execution adheres to the specified limits but does not avoid the initial data scan [4].

This behavior might impact performance, especially for client-drivers and SQL frameworks that use this feature to check SQL validation and extract metadata.

To continue talking to Dosu, mention @dosu.

TCeason commented 2 months ago

Maybe you need ignore result?

SELECT number FROM numbers(2) IGNORE_RESULT;

https://docs.databend.com/sql/sql-commands/query-syntax/query-select#ignore_result

BohuTANG commented 2 months ago

This is a optimizer issue? cc @xudong963

BohuTANG commented 2 months ago

Maybe you need ignore result?

SELECT number FROM numbers(2) IGNORE_RESULT;

https://docs.databend.com/sql/sql-commands/query-syntax/query-select#ignore_result

No, this still scan full tables but ignore the result.

sundy-li commented 2 months ago

Join is blocking operator, so limit return early will not be reached.

limit 0 should be optimized it's input to be empty result scan.

Dousir9 commented 2 months ago

Join is blocking operator, so limit return early will not be reached.

limit 0 should be optimized it's input to be empty result scan.

Yes.