apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0
12.32k stars 3.21k forks source link

[Bug] Error in join operation using subquery containing row_number #9545

Closed emerkfu closed 2 years ago

emerkfu commented 2 years ago

Search before asking

Version

0.15.3

What's Wrong?

-- The following is the DDL to create the test table.

DROP TABLE test.rownumber_test_00;

CREATE TABLE test.rownumber_test_00 ( code varchar(200) NULL, name char(255) NULL ) ENGINE=OLAP UNIQUE KEY(code) COMMENT "rownumber_test_00" DISTRIBUTED BY HASH(code) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" );

DROP TABLE test.rownumber_test_01;

CREATE TABLE test.rownumber_test_01 ( id varchar(128) NOT NULL, outer_id varchar(128) NOT NULL ) ENGINE=OLAP UNIQUE KEY(id, outer_id) COMMENT "rownumber_test_01" DISTRIBUTED BY HASH(id, outer_id) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" );

DROP TABLE test.rownumber_test_02;

CREATE TABLE test.rownumber_test_02 ( id varchar(128) NOT NULL, refresh_token_expiration datetime NOT NULL , client_actor_id varchar(128) NULL, service_actor_id varchar(128) NULL, client_application varchar(128) NOT NULL, service_application varchar(128) NOT NULL, is_deleted_flg int(11) NULL DEFAULT "0" ) ENGINE=OLAP UNIQUE KEY(id) COMMENT "rownumber_test_02" DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" );

DROP TABLE test.rownumber_test_03;

CREATE TABLE test.rownumber_test_03 ( id varchar(1020) NOT NULL, token_id varchar(1020) NULL, name varchar(1020) NULL, is_deleted_flg int(11) NULL DEFAULT "0" ) ENGINE=OLAP UNIQUE KEY(id) COMMENT "rownumber_test_03" DISTRIBUTED BY HASH(id) BUCKETS 1 PROPERTIES ( "replication_allocation" = "tag.location.default: 1", "in_memory" = "false", "storage_format" = "V2" );


-- The following is the SQL that will report an error when executing insert select. -- If only the select part is executed, no error will be reported. -- However, when the complete insert select SQL is executed, an error will occur. -- The above scenario can be reproduced even if there is no data in the test table.

insert into test.rownumber_test_00 select t0.id, t1.name from test.rownumber_test_01 t0 left join (select t.client_actor_id, t.name from (select t1.client_actor_id, t2.name, row_number()OVER(PARTITION BY t1.client_actor_id,t1.service_actor_id,t1.service_application,t2.name ORDER BY t1.refresh_token_expiration desc) AS rn from test.rownumber_test_02 t1 left join test.rownumber_test_03 t2 on t1.id = t2.token_id and t2.is_deleted_flg=0 where t1.is_deleted_flg=0) t where t.rn = 1 ) t1 on t0.id= t1.client_actor_id ;


The following content is the error content.

org.jkiss.dbeaver.model.sql.DBSQLException: SQL 错误 [1064] [42000]: errCode = 2, detailMessage = can't support at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:133) at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeStatement(SQLQueryJob.java:509) at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.lambda$0(SQLQueryJob.java:440) at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:168) at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.executeSingleQuery(SQLQueryJob.java:427) at org.jkiss.dbeaver.ui.editors.sql.execute.SQLQueryJob.extractData(SQLQueryJob.java:812) at org.jkiss.dbeaver.ui.editors.sql.SQLEditor$QueryResultsContainer.readData(SQLEditor.java:3181) at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.lambda$0(ResultSetJobDataRead.java:121) at org.jkiss.dbeaver.model.exec.DBExecUtils.tryExecuteRecover(DBExecUtils.java:168) at org.jkiss.dbeaver.ui.controls.resultset.ResultSetJobDataRead.run(ResultSetJobDataRead.java:119) at org.jkiss.dbeaver.ui.controls.resultset.ResultSetViewer$ResultSetDataPumpJob.run(ResultSetViewer.java:4514) at org.jkiss.dbeaver.model.runtime.AbstractJob.run(AbstractJob.java:105) at org.eclipse.core.internal.jobs.Worker.run(Worker.java:63) Caused by: java.sql.SQLSyntaxErrorException: errCode = 2, detailMessage = can't support at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:120) at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:122) at com.mysql.cj.jdbc.StatementImpl.executeInternal(StatementImpl.java:768) at com.mysql.cj.jdbc.StatementImpl.execute(StatementImpl.java:653) at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.execute(JDBCStatementImpl.java:327) at org.jkiss.dbeaver.model.impl.jdbc.exec.JDBCStatementImpl.executeStatement(JDBCStatementImpl.java:130) ... 12 more

The following is Explain String .

Explain String
PLAN FRAGMENT 0
OUTPUT EXPRS:t0.id t2.name 0
PARTITION: HASH_PARTITIONED: default_cluster:test.rownumber_test_01.id, default_cluster:test.rownumber_test_01.outer_id
OLAP TABLE SINK
TUPLE ID: 0
RANDOM
7:HASH JOIN
join op: LEFT OUTER JOIN (BROADCAST)
hash predicates:
colocate: false, reason: The src data has been redistributed
equal join conjunct: t0.id = <slot 25> <slot 6>
cardinality=0
tuple ids: 1 9N 8N
----10:EXCHANGE
tuple ids: 9 8
0:OlapScanNode
TABLE: rownumber_test_01
PREAGGREGATION: OFF. Reason: No AggregateInfo
PREDICATES: t0.__DORIS_DELETE_SIGN__ = 0
partitions=0/1
rollup: null
tabletRatio=0/0
tabletList=
cardinality=0
avgRowSize=17.0
numNodes=1
tuple ids: 1
PLAN FRAGMENT 1
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: t1.client_actor_id, t1.service_actor_id, t1.service_application, t2.name
STREAM DATA SINK
EXCHANGE ID: 10
UNPARTITIONED
6:SELECT
predicates: <slot 23> = 1, <slot 23> = 1
tuple ids: 9 8
5:ANALYTIC
functions: [, row_number(), ]
partition by: t1.client_actor_id, t1.service_actor_id, t1.service_application, t2.name
order by: <slot 28> <slot 10> DESC NULLS LAST
window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
tuple ids: 9 8
4:SORT
order by: <slot 25> <slot 6> ASC, <slot 26> <slot 8> ASC, <slot 27> <slot 9> ASC, <slot 33> <slot 7> ASC, <slot 28> <slot 10> DESC
offset: 0
tuple ids: 9
9:EXCHANGE
tuple ids: 2 3N
PLAN FRAGMENT 2
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: default_cluster:test.rownumber_test_02.id
STREAM DATA SINK
EXCHANGE ID: 09
HASH_PARTITIONED: t1.client_actor_id, t1.service_actor_id, t1.service_application, t2.name
3:HASH JOIN
join op: LEFT OUTER JOIN (BROADCAST)
hash predicates:
colocate: false, reason: Tables are not in the same group
equal join conjunct: t1.id = t2.token_id
cardinality=0
tuple ids: 2 3N
----8:EXCHANGE
tuple ids: 3
1:OlapScanNode
TABLE: rownumber_test_02
PREAGGREGATION: OFF. Reason: No AggregateInfo
PREDICATES: t1.is_deleted_flg = 0, t1.__DORIS_DELETE_SIGN__ = 0
partitions=0/1
rollup: null
tabletRatio=0/0
tabletList=
cardinality=0
avgRowSize=85.0
numNodes=1
tuple ids: 2
PLAN FRAGMENT 3
OUTPUT EXPRS:
PARTITION: HASH_PARTITIONED: default_cluster:test.rownumber_test_03.id
STREAM DATA SINK
EXCHANGE ID: 08
UNPARTITIONED
2:OlapScanNode
TABLE: rownumber_test_03
PREAGGREGATION: OFF. Reason: null
PREDICATES: t2.is_deleted_flg = 0, t2.__DORIS_DELETE_SIGN__ = 0
partitions=0/1
rollup: null
tabletRatio=0/0
tabletList=
cardinality=0
avgRowSize=37.0
numNodes=1
tuple ids: 3
Tuples:
TupleDescriptor{id=0, tbl=null, byteSize=48, materialized=true}
SlotDescriptor{id=0, col=code, type=VARCHAR(*)}
parent=0
materialized=true
byteSize=16
byteOffset=16
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=1
SlotDescriptor{id=1, col=name, type=CHAR(*)}
parent=0
materialized=true
byteSize=16
byteOffset=32
nullIndicatorByte=0
nullIndicatorBit=1
slotIdx=2
SlotDescriptor{id=2, col=DORIS_DELETE_SIGN, type=TINYINT}
parent=0
materialized=true
byteSize=1
byteOffset=1
nullIndicatorByte=0
nullIndicatorBit=-1
slotIdx=0
TupleDescriptor{id=1, tbl=rownumber_test_01, byteSize=32, materialized=true}
SlotDescriptor{id=19, col=id, type=VARCHAR(*)}
parent=1
materialized=true
byteSize=16
byteOffset=16
nullIndicatorByte=0
nullIndicatorBit=-1
slotIdx=1
SlotDescriptor{id=20, col=DORIS_DELETE_SIGN, type=TINYINT}
parent=1
materialized=true
byteSize=1
byteOffset=0
nullIndicatorByte=0
nullIndicatorBit=-1
slotIdx=0
TupleDescriptor{id=2, tbl=rownumber_test_02, byteSize=96, materialized=true}
SlotDescriptor{id=3, col=id, type=VARCHAR(*)}
parent=2
materialized=true
byteSize=16
byteOffset=16
nullIndicatorByte=0
nullIndicatorBit=-1
slotIdx=2
SlotDescriptor{id=6, col=client_actor_id, type=VARCHAR(*)}
parent=2
materialized=true
byteSize=16
byteOffset=32
nullIndicatorByte=0
nullIndicatorBit=1
slotIdx=3
SlotDescriptor{id=8, col=service_actor_id, type=VARCHAR(*)}
parent=2
materialized=true
byteSize=16
byteOffset=48
nullIndicatorByte=0
nullIndicatorBit=2
slotIdx=4
SlotDescriptor{id=9, col=service_application, type=VARCHAR(*)}
parent=2
materialized=true
byteSize=16
byteOffset=64
nullIndicatorByte=0
nullIndicatorBit=-1
slotIdx=5
SlotDescriptor{id=10, col=refresh_token_expiration, type=DATETIME}
parent=2
materialized=true
byteSize=16
byteOffset=80
nullIndicatorByte=0
nullIndicatorBit=-1
slotIdx=6
SlotDescriptor{id=11, col=is_deleted_flg, type=INT}
parent=2
materialized=true
byteSize=4
byteOffset=4
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=1
SlotDescriptor{id=21, col=DORIS_DELETE_SIGN, type=TINYINT}
parent=2
materialized=true
byteSize=1
byteOffset=1
nullIndicatorByte=0
nullIndicatorBit=-1
slotIdx=0
TupleDescriptor{id=3, tbl=rownumber_test_03, byteSize=48, materialized=true}
SlotDescriptor{id=4, col=token_id, type=VARCHAR(*)}
parent=3
materialized=true
byteSize=16
byteOffset=16
nullIndicatorByte=0
nullIndicatorBit=1
slotIdx=2
SlotDescriptor{id=5, col=is_deleted_flg, type=INT}
parent=3
materialized=true
byteSize=4
byteOffset=4
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=1
SlotDescriptor{id=7, col=name, type=VARCHAR(*)}
parent=3
materialized=true
byteSize=16
byteOffset=32
nullIndicatorByte=0
nullIndicatorBit=2
slotIdx=3
SlotDescriptor{id=22, col=DORIS_DELETE_SIGN, type=TINYINT}
parent=3
materialized=true
byteSize=1
byteOffset=1
nullIndicatorByte=0
nullIndicatorBit=-1
slotIdx=0
TupleDescriptor{id=4, tbl=null, byteSize=8, materialized=false}
SlotDescriptor{id=12, col=null, type=BIGINT}
parent=4
materialized=true
byteSize=8
byteOffset=0
nullIndicatorByte=0
nullIndicatorBit=-1
slotIdx=0
TupleDescriptor{id=5, tbl=null, byteSize=16, materialized=false}
SlotDescriptor{id=13, col=null, type=BIGINT}
parent=5
materialized=true
byteSize=8
byteOffset=8
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=0
TupleDescriptor{id=6, tbl=t, byteSize=0, materialized=false}
SlotDescriptor{id=14, col=client_actor_id, type=VARCHAR(*)}
parent=6
materialized=false
byteSize=0
byteOffset=-1
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=0
SlotDescriptor{id=15, col=name, type=VARCHAR(*)}
parent=6
materialized=false
byteSize=0
byteOffset=-1
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=0
SlotDescriptor{id=16, col=rn, type=BIGINT}
parent=6
materialized=false
byteSize=0
byteOffset=-1
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=0
TupleDescriptor{id=7, tbl=t1, byteSize=0, materialized=false}
SlotDescriptor{id=17, col=client_actor_id, type=VARCHAR(*)}
parent=7
materialized=false
byteSize=0
byteOffset=-1
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=0
SlotDescriptor{id=18, col=name, type=VARCHAR(*)}
parent=7
materialized=false
byteSize=0
byteOffset=-1
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=0
TupleDescriptor{id=8, tbl=null, byteSize=16, materialized=true}
SlotDescriptor{id=23, col=null, type=BIGINT}
parent=8
materialized=true
byteSize=8
byteOffset=8
nullIndicatorByte=0
nullIndicatorBit=0
slotIdx=0
TupleDescriptor{id=9, tbl=null, byteSize=208, materialized=true}
SlotDescriptor{id=24, col=id, type=VARCHAR(*)}
parent=9
materialized=true
byteSize=16
byteOffset=16
nullIndicatorByte=0
nullIndicatorBit=4
slotIdx=4
SlotDescriptor{id=25, col=client_actor_id, type=VARCHAR(*)}
parent=9
materialized=true
byteSize=16
byteOffset=32
nullIndicatorByte=0
nullIndicatorBit=5
slotIdx=5
SlotDescriptor{id=26, col=service_actor_id, type=VARCHAR(*)}
parent=9
materialized=true
byteSize=16
byteOffset=48
nullIndicatorByte=0
nullIndicatorBit=6
slotIdx=6
SlotDescriptor{id=27, col=service_application, type=VARCHAR(*)}
parent=9
materialized=true
byteSize=16
byteOffset=64
nullIndicatorByte=0
nullIndicatorBit=7
slotIdx=7
SlotDescriptor{id=28, col=refresh_token_expiration, type=DATETIME}
parent=9
materialized=true
byteSize=16
byteOffset=80
nullIndicatorByte=1
nullIndicatorBit=0
slotIdx=8
SlotDescriptor{id=29, col=is_deleted_flg, type=INT}
parent=9
materialized=true
byteSize=4
byteOffset=4
nullIndicatorByte=0
nullIndicatorBit=2
slotIdx=2

What You Expected?

Can a subquery containing row_number be joined?

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

Code of Conduct

hf200012 commented 2 years ago

upgrade 1.0

mysql> insert into test.rownumber_test_00
    -> select
    -> t0.id,
    -> t1.name
    -> from
    -> test.rownumber_test_01 t0
    -> left join
    -> (select
    -> t.client_actor_id,
    -> t.name
    -> from
    -> (select
    -> t1.client_actor_id,
    -> t2.name,
    -> row_number()OVER(PARTITION BY t1.client_actor_id,t1.service_actor_id,t1.service_application,t2.name ORDER BY t1.refresh_token_expiration desc) AS rn
    -> from test.rownumber_test_02 t1
    -> left join test.rownumber_test_03 t2
    -> on t1.id = t2.token_id and t2.is_deleted_flg=0
    -> where t1.is_deleted_flg=0) t
    -> where t.rn = 1
    -> ) t1
    -> on t0.id= t1.client_actor_id
    -> ;
Query OK, 0 rows affected (0.04 sec)
{'label':'insert_4e331671d65a4303-834e7fea963d08cc', 'status':'VISIBLE', 'txnId':'17042'}