StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
8.88k stars 1.78k forks source link

Using RFC3339 values in WHERE clause is very slow #48922

Closed nferrario closed 1 month ago

nferrario commented 3 months ago

Hi team, I noticed that StarRocks gets VERY slow when I use RFC3339 datetime values (e.g. 2024-07-03T00:00:00Z) to filter partitions.

The examples below are not affected by cache or cold-starts.

SELECT COUNT(1)
  FROM legacy.performance
  WHERE mm_date >= '2024-07-01T00:00:00Z'
  AND mm_date < '2024-07-03T00:00:00Z'
[2024-07-25 11:01:39] 1 row retrieved starting from 1 in 35 s 93 ms (execution: 35 s 48 ms, fetching: 45 ms)

> SELECT COUNT(1)
  FROM legacy.performance
  WHERE mm_date >= '2024-07-01T00:00:00'
  AND mm_date < '2024-07-03T00:00:00'
[2024-07-25 11:01:42] 1 row retrieved starting from 1 in 760 ms (execution: 734 ms, fetching: 26 ms)

> SELECT COUNT(1)
  FROM legacy.performance
  WHERE mm_date >= '2024-07-01 00:00:00'
  AND mm_date < '2024-07-03 00:00:00'
[2024-07-25 11:01:46] 1 row retrieved starting from 1 in 864 ms (execution: 827 ms, fetching: 37 ms)

Query plan - slow query:

EXPLAIN SELECT COUNT(1)
FROM legacy.performance
WHERE mm_date >= '2024-07-01T00:00:00Z'
AND mm_date < '2024-07-03T00:00:00Z';

+--------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                 |
+--------------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                                                |
|  OUTPUT EXPRS:135: count                                                                                                       |
|   PARTITION: UNPARTITIONED                                                                                                     |
|                                                                                                                                |
|   RESULT SINK                                                                                                                  |
|                                                                                                                                |
|   4:AGGREGATE (merge finalize)                                                                                                 |
|   |  output: count(135: count)                                                                                                 |
|   |  group by:                                                                                                                 |
|   |                                                                                                                            |
|   3:EXCHANGE                                                                                                                   |
|                                                                                                                                |
| PLAN FRAGMENT 1                                                                                                                |
|  OUTPUT EXPRS:                                                                                                                 |
|   PARTITION: RANDOM                                                                                                            |
|                                                                                                                                |
|   STREAM DATA SINK                                                                                                             |
|     EXCHANGE ID: 03                                                                                                            |
|     UNPARTITIONED                                                                                                              |
|                                                                                                                                |
|   2:AGGREGATE (update serialize)                                                                                               |
|   |  output: count(1)                                                                                                          |
|   |  group by:                                                                                                                 |
|   |                                                                                                                            |
|   1:Project                                                                                                                    |
|   |  <slot 137> : 1                                                                                                            |
|   |                                                                                                                            |
|   0:OlapScanNode                                                                                                               |
|      TABLE: performance                                                                                                        |
|      PREAGGREGATION: ON                                                                                                        |
|      PREDICATES: 1: mm_date >= CAST('2024-07-01T00:00:00Z' AS DATETIME), 1: mm_date < CAST('2024-07-03T00:00:00Z' AS DATETIME) |
|      partitions=4519/4520                                                                                                      |
|      rollup: performance                                                                                                       |
|      tabletRatio=93667/93667                                                                                                   |
|      tabletList=73883,73884,73885,73886,73887,73888,73889,73890,73891,73892 ...                                                |
|      cardinality=10342361405                                                                                                   |
|      avgRowSize=9.0                                                                                                            |
+--------------------------------------------------------------------------------------------------------------------------------+

Query plan - fast queries:

EXPLAIN SELECT COUNT(1)
FROM legacy.performance
WHERE mm_date >= '2024-07-01 00:00:00'
AND mm_date < '2024-07-03 00:00:00';

+-----------------------------------+
| Explain String                    |
+-----------------------------------+
| PLAN FRAGMENT 0                   |
|  OUTPUT EXPRS:135: count          |
|   PARTITION: UNPARTITIONED        |
|                                   |
|   RESULT SINK                     |
|                                   |
|   4:AGGREGATE (merge finalize)    |
|   |  output: count(135: count)    |
|   |  group by:                    |
|   |                               |
|   3:EXCHANGE                      |
|                                   |
| PLAN FRAGMENT 1                   |
|  OUTPUT EXPRS:                    |
|   PARTITION: RANDOM               |
|                                   |
|   STREAM DATA SINK                |
|     EXCHANGE ID: 03               |
|     UNPARTITIONED                 |
|                                   |
|   2:AGGREGATE (update serialize)  |
|   |  output: count(1)             |
|   |  group by:                    |
|   |                               |
|   1:Project                       |
|   |  <slot 137> : 1               |
|   |                               |
|   0:OlapScanNode                  |
|      TABLE: performance           |
|      PREAGGREGATION: ON           |
|      partitions=2/4520            |
|      rollup: performance          |
|      tabletRatio=2/2              |
|      tabletList=14607722,14870514 |
|      cardinality=171250           |
|      avgRowSize=9.0               |
+-----------------------------------+

StarRocks version: 3.3.1-2b87854 - Shared Data

chenminghua8 commented 3 months ago

can someone assign this task to me?