StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0
8.67k stars 1.75k forks source link

use more window function : Vectorized engine does not support the operator #7015

Closed angelilvy closed 2 years ago

angelilvy commented 2 years ago
image
CREATE TABLE `intelli_analysis_ds_day_6298558304d25a14fa546117` (
  `dim1` varchar(10) NOT NULL DEFAULT "" COMMENT "",
  `dim2` varchar(15) NOT NULL DEFAULT "" COMMENT "",
  `dim3` varchar(15) NOT NULL DEFAULT "" COMMENT "",
  `dim4` varchar(6) NOT NULL DEFAULT "" COMMENT "",
  `dim5` varchar(6) NOT NULL DEFAULT "" COMMENT "",
  `rowdata0` decimal128(26, 5) NULL COMMENT "",
  `rowdata1` decimal128(26, 5) NULL COMMENT "",
  `rowdata2` decimal128(26, 5) NULL COMMENT "",
  `metric1` decimal128(26, 5) NULL COMMENT "",
  `metric2` decimal128(26, 5) NULL COMMENT "",
  `metric3` decimal128(26, 5) NULL COMMENT ""
) ENGINE=OLAP 
UNIQUE KEY(`dim1`, `dim2`, `dim3`, `dim4`, `dim5`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`dim1`, `dim2`, `dim3`, `dim4`, `dim5`) BUCKETS 1 
PROPERTIES (
"replication_num" = "3",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

select
    sum(cur_rowdata2) over(partition by pre_date order by dim2,dim3,dim4,dim5 rows between unbounded preceding and current row) as tmp_metric1,
    sum(cur_rowdata2) over(partition by pre_date order by dim2,dim3,dim4,dim5 rows between unbounded preceding and current row) as tmp_metric2
from (
    select
        t1.dim1 as cur_date,
        t2.dim1 as pre_date,
        t1.dim2,t1.dim3,t1.dim4,t1.dim5,
        t1.rowdata2 as cur_rowdata2
    from (
        select
            dim1,dim2,dim3,dim4,dim5,
            rowdata0,rowdata1,rowdata2,
            metric1,metric2,metric3
        from
            intelli_analysis_ds_day_6298558304d25a14fa546117
        where
            dim1 = '2022-06-03'
            and dim2 <> '' and dim3 <> '' and dim4 <> '' and dim5 <> '' 

    ) t1
    left join (
        select
            dim1,dim2,dim3,dim4,dim5,
            rowdata0,rowdata1,rowdata2,
            metric1,metric2,metric3
        from
            intelli_analysis_ds_day_6298558304d25a14fa546117
        where
            dim1 < '2022-06-03'
            and dim2 <> '' and dim3 <> '' and dim4 <> '' and dim5 <> '' 

    ) t2
    on
        1 = 1
        and t1.dim2 = t2.dim2 and t1.dim3 = t2.dim3 and t1.dim4 = t2.dim4 and t1.dim5 = t2.dim5 
) t3
stdpain commented 2 years ago

which version?

Allaitian commented 2 years ago

@stdpain 我在 starrocks 2.0.7 也遇到了这个问题

josh7 commented 2 years ago

@stdpain

遇到同样的问题,star rock 2.0.1 示例 SELECT event_day ,sum(ts) over(order by action rows between 7 preceding and 1 preceding) as rows1 ,sum(ts) over(order by action rows between 7 preceding and 1 preceding) as rows2 from ods_es.event_item where event_day = '2022-06-21' ': (1064, 'Vectorized engine does not support the operator')

kangkaisen commented 2 years ago

@Allaitian @josh7 could give the explain costs + sql result, thanks.

stdpain commented 2 years ago

I have reproduced it. caused by a wrong plan.

+-----------------------------------------------------------------------------+
| Explain String                                                              |
+-----------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                             |
|  OUTPUT EXPRS:1: id_int | 13: sum(1: id_int) | 14: expr                     |
|   PARTITION: UNPARTITIONED                                                  |
|                                                                             |
|   RESULT SINK                                                               |
|                                                                             |
|   4:Project                                                                 |
|   |  <slot 1> : 1: id_int                                                   |
|   |  <slot 13> : 13: sum(1: id_int)                                         |
|   |  <slot 14> : sum(1: id_int)                                             |
|   |                                                                         |
|   3:ANALYTIC                                                                |
|   |  functions: [, sum(1: id_int), ]                                        |
|   |  order by: 6: id_float ASC                                              |
|   |  window: ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING                       |
|   |                                                                         |
|   2:MERGING-EXCHANGE                                                        |
|                                                                             |
| PLAN FRAGMENT 1                                                             |
|  OUTPUT EXPRS:                                                              |
|   PARTITION: RANDOM                                                         |
|                                                                             |
|   STREAM DATA SINK                                                          |
|     EXCHANGE ID: 02                                                         |
|     UNPARTITIONED                                                           |
|                                                                             |
|   1:SORT                                                                    |
|   |  order by: <slot 6> 6: id_float ASC                                     |
|   |  offset: 0                                                              |
|   |                                                                         |
|   0:OlapScanNode                                                            |
|      TABLE: test_basic                                                      |
|      PREAGGREGATION: ON                                                     |
|      partitions=1/1                                                         |
|      rollup: test_basic                                                     |
|      tabletRatio=10/10                                                      |
|      tabletList=10007,10009,10011,10013,10015,10017,10019,10021,10023,10025 |
|      cardinality=1                                                          |
|      avgRowSize=2.0                                                         |
|      numNodes=0                                                             |
+-----------------------------------------------------------------------------+

known failed:

liuyehcf commented 2 years ago

The root case is

CREATE TABLE `t0` (
  `d1` int NULL COMMENT "",
  `d2` int NULL COMMENT ""
) ENGINE=OLAP 
UNIQUE KEY(`d1`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`d1`) BUCKETS 1 
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);

select 
    sum(d1) over (order by d2 rows between unbounded preceding and current row) tmp_d1_1, 
    sum(d1) over (order by d2 rows between unbounded preceding and current row) tmp_d1_2
from t0;