ClickHouse / ClickHouse

ClickHouse® is a real-time analytics DBMS
https://clickhouse.com
Apache License 2.0
37.75k stars 6.93k forks source link

A query is too slow when using enable_analyzer=1 (by default) #66878

Open Tiaonmmn opened 4 months ago

Tiaonmmn commented 4 months ago

Describe the situation The same query works differently with settings allow_experimental_analyzer=1 and 0.

How to reproduce

The DDL of Distributed table:

create table dis_TABLEA
( 
    CTIME DateTime,
    AAA   String,
    BBB   Nullable(String),
    CCC   Nullable(String),
    DDD   Nullable(String),
    EEE   Nullable(String),
    FFF   Nullable(String),
    GGG   Nullable(String),
    HHH   Nullable(String),
---Nearly 100 columns are Nullable String
    DT    UInt32
)
    engine = Distributed('CLUSTERA', 'DATABASE', 'TABLEA', toUInt64OrZero('AAA'));

The DDL of the underlying base table:

create table TABLEA
(
    CTIME DateTime,
    AAA   String,
    BBB   Nullable(String),
    CCC   Nullable(String),
    DDD   Nullable(String),
    EEE   Nullable(String),
    FFF   Nullable(String),
    GGG   Nullable(String),
    HHH   Nullable(String),
---Nearly 100 columns are Nullable String
    DT    UInt32
)
    engine = MergeTree PARTITION BY toUInt32(DT / 7)
        ORDER BY AAA
        SETTINGS index_granularity = 8192;
select count(1)
from (select *
      from (select toDateTime(CTIME) ,
                   BBB,
                   AAA,
                   CCC,
                   DDD,
                   EEE,
                   FFF,
                   GGG,
                   HHH,
                   DT
            from DATABASE.dis_TABLEA) t
      WHERE "AAA" = '123456789') t1

Expected performance When allow_experimental_analyzer = 0, the query can finish in less than 3 seconds.When allow_experimental_analyzer = 1, the query will last for at least 5 minute. with the network load monitor, I can see that allow_experimental_analyzer = 1 causes too much network traffic, seems like that it's reading the whole table content.

Additional context The explain with allow_experimental_analyzer = 0:

explain
select count(1)
from (select *
      from (select toDateTime(CTIME) ,
                   BBB,
                   AAA,
                   CCC,
                   DDD,
                   EEE,
                   FFF,
                   GGG,
                   HHH,
                   DT
            from DATABASE.dis_TABLEA) t
      WHERE "AAA" = '123456789') t1 SETTINGS allow_experimental_analyzer = 0;

    ┌─explain───────────────────────────────────────────────────────────────┐
 1. │ Expression ((Projection + Before ORDER BY))                           │
 2. │   Aggregating                                                         │
 3. │     Union                                                             │
 4. │       Expression ((Before GROUP BY + (Projection + Before ORDER BY))) │
 5. │         Filter ((WHERE + (Projection + Before ORDER BY)))             │
 6. │           Filter (WHERE)                                              │
 7. │             ReadFromPreparedSource (Read from NullSource)             │
 8. │       Expression (( + ( + )))                                         │
 9. │         Filter                                                        │
10. │           ReadFromRemote (Read from remote replica)                   │
    └───────────────────────────────────────────────────────────────────────┘

The explain with allow_experimental_analyzer = 1:

explain
select count(1)
from (select *
      from (select toDateTime(CTIME) ,
                   BBB,
                   AAA,
                   CCC,
                   DDD,
                   EEE,
                   FFF,
                   GGG,
                   HHH,
                   dt
            from DATABASE.dis_TABLEA) t
      WHERE "AAA" = '123456789') t1 SETTINGS allow_experimental_analyzer = 1;

   ┌─explain──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
1. │ Expression ((Project names + Projection))                                                                                                                                                                                                                │
2. │   Aggregating                                                                                                                                                                                                                                            │
3. │     Union                                                                                                                                                                                                                                                │
4. │       Expression ((Before GROUP BY + (Change column names to column identifiers + (Project names + (Projection + (WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers))))))))) │
5. │         Filter (((WHERE + (Change column names to column identifiers + (Project names + (Projection + Change column names to column identifiers)))))[split])                                                                                             │
6. │           ReadFromPreparedSource (Read from NullSource)                                                                                                                                                                                                  │
7. │       Expression (( + ( + ( + ))))                                                                                                                                                                                                                       │
8. │         Filter (( + ))                                                                                                                                                                                                                                   │
9. │           ReadFromRemote (Read from remote replica)                                                                                                                                                                                                      │
   └──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
novikd commented 4 months ago

Could you please share the EXPLAIN PLAN INDEXES = 1 output? Also, it'd be useful if you share logs for these queries.

Tiaonmmn commented 4 months ago

Could you please share the EXPLAIN PLAN INDEXES = 1 output? Also, it'd be useful if you share logs for these queries.

What log? Do you mean outputs in clickhouse-server.log? But there are so many rows per second, how can I filter the query log?

novikd commented 3 months ago

@Tiaonmmn The whole server log is not required. Only logs for a specific query are enough. Each query has a unique query_id. It's printed out in clickhouse-client when you run the query. Afterward, you can filter server logs using query_id.

helifu commented 3 months ago

Please set this parameter in clickhouse-client before running your query(SQL): :) set send_logs_level='trace'; Then the following output is what @novikd wants. :)