StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0
8.91k stars 1.79k forks source link

[Bug] window function result error #7493

Closed liuyehcf closed 2 years ago

liuyehcf commented 2 years ago

Steps to reproduce the behavior (Required)

CREATE DATABASE IF NOT EXISTS `analytic`;
DROP TABLE IF EXISTS `analytic`.`t0`;
CREATE TABLE IF NOT EXISTS `analytic`.`t0` (
  `v1` int(11) NOT NULL,
  `v2` int(11) NOT NULL,
  `v3` int(11) NOT NULL
) ENGINE=OLAP
DUPLICATE KEY(`v1`)
DISTRIBUTED BY HASH(`v1`) BUCKETS 10
PROPERTIES (
 "replication_num" = "1"
);

INSERT INTO `analytic`.`t0` (v1, v2, v3) values
    (1, 1, 1),
    (1, 1, 2),
    (1, 1, 3),
    (1, 2, 4),
    (1, 2, 5),
    (1, 2, 6),
    (2, 3, 7),
    (2, 3, 8),
    (2, 3, 9),
    (2, 4, 10),
    (2, 4, 11),
    (2, 4, 12);
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
INSERT INTO `analytic`.`t0` SELECT * FROM `analytic`.`t0`;
SELECT avg(v3) OVER (PARTITION BY v2 ORDER BY v1,v3 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING ) AS wv, * FROM analytic.t0 ORDER v2,v1,v3

Expected behavior (Required)

The result of non-pipeline engine and pipeline engine are exactly the same

Real behavior (Required)

result of pipeline engine

+----------+--------------------+-------------------+---------+---------+
| count(*) | sum(wv)            | avg(wv)           | min(wv) | max(wv) |
+----------+--------------------+-------------------+---------+---------+
| 49152    | 288208.05550502916 | 5.863607900086042 | 1.0     | 11.0    |
+----------+--------------------+-------------------+---------+---------+

result of non-pipeline engine

+----------+-------------------+-------------------+---------+---------+
| count(*) | sum(wv)           | avg(wv)           | min(wv) | max(wv) |
+----------+-------------------+-------------------+---------+---------+
| 49152    | 288208.0555050295 | 5.863607900086049 | 1.0     | 11.0    |
+----------+-------------------+-------------------+---------+---------+

StarRocks version (Required)

+-------------------+
| current_version() |
+-------------------+
| UNKNOWN 5e9664bbc |
+-------------------+
liuyehcf commented 2 years ago

The difference is only in the precision bits, and if we compare the results of these two engines using the sub query

SELECT avg(v3) OVER (PARTITION BY v2 ORDER BY v1,v3 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING ) AS wv, * FROM analytic.t0 ORDER BY v2,v1,v3,wv;

we found that the result is exactly the same.

So maybe it is acceptable to loss precision in some degree.