apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0
12.3k stars 3.21k forks source link

Column can not be resloved after reorderTable #3908

Open spaces-X opened 4 years ago

spaces-X commented 4 years ago
  1. Create three same tables :
CREATE TABLE `s1` (
  `dt` int(11) NULL COMMENT "",
  `id` int(11) NULL COMMENT "",
  `pv` int(11) SUM NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
CREATE TABLE `s2` (
  `dt` int(11) NULL COMMENT "",
  `id` int(11) NULL COMMENT "",
  `pv` int(11) SUM NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
CREATE TABLE `s3` (
  `dt` int(11) NULL COMMENT "",
  `id` int(11) NULL COMMENT "",
  `pv` int(11) SUM NULL COMMENT ""
) ENGINE=OLAP
AGGREGATE KEY(`dt`, `id`)
COMMENT "OLAP"
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"storage_format" = "DEFAULT"
);
  1. load some data: Make sure the row count of s3 is larger than s2.
    
    insert into s1 (dt, id, pv) values (20200609, 1, 10), (20200609, 2, 20);

insert into s2 (dt, id, pv) values (20200609, 1, 10), (20200609, 2, 20); insert into s2 (dt, id, pv) values (20200608, 1, 10), (20200608, 2, 20);

insert into s3 (dt, id, pv) values (20200609, 1, 10), (20200609, 2, 20); insert into s3 (dt, id, pv) values (20200608, 1, 10), (20200608, 2, 20); insert into s3 (dt, id, pv) values (20200607, 1, 10), (20200607, 2, 20);


3. Query sql:  make sure that the process of reanalyzing will work.

SELECT tt1.dt FROM s1 tt1 INNER JOIN s2 tt2 on tt2.id=1 INNER JOIN s3 tt3 ON tt3.id = tt2.id AND tt3.pv > 1+1



ERROR 1054 (42S22): errCode = 2, detailMessage = Unknown column 'id' in 'tt2'

This error is caused by reorder table.
We expect the order of these tables is that tt3 should be after tt2 because tt3  depends on tt2  with onclause `ON tt3.id = tt2.id`. But actually tt3 is ahead of tt2, which will cause `id` in `tt2` has not been resloved before using.

Expected:      `tt1->tt2->tt3`    At least tt2 should be ahead of tt3.
reordered:     `tt3->tt2->tt1`
After FromClause#sortTableRefForSubquery:  `tt1->tt3->tt2`

In SelectStmt#reorderTable, the order of tables depends on their rowcount. 
In this case, the rowcount of tt3 is larger than tt2.
After expr rewriting , we first reanalyze  tt3 and tt3 depends on tt2.id which can not be used now.
Any suggestions for improving  reorderTable?
spaces-X commented 4 years ago

@liutang123