datafuselabs / databend

𝗗𝗮𝘁𝗮, 𝗔𝗻𝗮𝗹𝘆𝘁𝗶𝗰𝘀 & 𝗔𝗜. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.69k stars 727 forks source link

bug: 'explain create table t as select ....' did create the table #13604

Open JackTan25 opened 10 months ago

JackTan25 commented 10 months ago

Search before asking

Version

936ffc9

What's Wrong?

the pipeline is not distributed, and explain sql will do "create table t4".

How to Reproduce?

mysql> explain pipeline  create table t4 as select * from target_table;
+--------------------------------------------------------------------------------------------+
| explain                                                                                    |
+--------------------------------------------------------------------------------------------+
| CommitSink × 1 processor                                                                   |
|   MutationAggregator × 1 processor                                                         |
|     TransformSerializeSegment × 1 processor                                                |
|       Merge (TransformSerializeBlock × 10 processors) to (TransformSerializeSegment × 1)   |
|         TransformSerializeBlock × 10 processors                                            |
|           BlockCompactTransform × 10 processors                                            |
|             CompoundBlockOperator(Project) × 10 processors                                 |
|               DeserializeDataTransform × 10 processors                                     |
|                 SyncReadParquetDataSource × 10 processors                                  |
+--------------------------------------------------------------------------------------------+
9 rows in set (0.10 sec)
Read 0 rows, 0.00 B in 0.047 sec., 0 rows/sec., 0.00 B/sec.

mysql> show tables;
+-------------------+
| Tables_in_default |
+-------------------+
| source_table      |
| t3                |
| t4                |
| target_table      |
| tmp               |
+-------------------+
5 rows in set (0.05 sec)
Read 5 rows, 1.00 KiB in 0.035 sec., 142.36 rows/sec., 28.56 KiB/sec.

Are you willing to submit PR?

dosubot[bot] commented 2 weeks ago

Hey there, @JackTan25! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let me know how I can assist you!

The issue you're encountering with the EXPLAIN CREATE TABLE AS SELECT statement creating the table is due to specific conditions in the codebase. When the EXPLAIN statement is used with ExplainKind::Plan and the plan is of type Plan::CreateTable, the code checks if the as_select field is present. If it is, the code proceeds to explain the query within the CREATE TABLE AS SELECT statement, which can inadvertently lead to the execution of the CREATE TABLE operation.

Here is the relevant code snippet:

Plan::CreateTable(plan) => match &plan.as_select {
    Some(box Plan::Query {
        s_expr,
        metadata,
        bind_context,
        formatted_ast,
        ..
    }) => {
        let mut res =
            vec![DataBlock::new_from_columns(vec![StringType::from_data(
                vec!["CreateTableAsSelect:", ""],
            )])];
        res.extend(
            self.explain_query(s_expr, metadata, bind_context, formatted_ast)
                .await?,
        );
        vec![DataBlock::concat(&res)?]
    }
    _ => self.explain_plan(&self.plan)?,
},

This code indicates that if the as_select field is present in the CreateTable plan, the explain_query method is called, which could lead to the execution of the CREATE TABLE operation [1].

Additionally, the test cases from tests/sqllogictests/suites/mode/standalone/explain/explain_ddl.test provide further evidence that the EXPLAIN statement for CREATE TABLE AS SELECT includes detailed query plan information, suggesting that the query is being processed in a way that could lead to the execution of the CREATE TABLE operation [2].

To address this issue, you may need to modify the handling of the EXPLAIN statement to ensure that it does not execute the CREATE TABLE operation. This could involve adding checks to prevent the execution or restructuring the code to separate the explanation logic from the execution logic.

To continue talking to Dosu, mention @dosu.