apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Enhancement] Improve the performance of "insert into select" statement #10619

Open lide-reed opened 2 years ago

lide-reed commented 2 years ago


Description

Currently, the performance of the "insert into select" statement is very poor: Doris 1.1.0-rc03 reaches 37M/s on a cluster of 3 BE nodes (16c64g each), while for the same case the rate of Impala is 100G/s.

The details are as follows:

```
MySQL [ssb]> insert into lineorder_flat2 select * from lineorder_flat;
Query OK, 600037902 rows affected (26 min 52.81 sec)
{'label':'insert_2ed2773ca6654469-81283daa905be7c2', 'status':'VISIBLE', 'txnId':'235'}

MySQL [ssb]> show data;
+-----------------+-------------+--------------+
| TableName       | Size        | ReplicaCount |
+-----------------+-------------+--------------+
| customer        | 138.437 MB  | 12           |
| dates           | 34.228 KB   | 1            |
| lineorder       | 14.611 GB   | 336          |
| lineorder_flat  | 58.854 GB   | 336          |
| lineorder_flat2 | 58.854 GB   | 336          |
| part            | 12.969 MB   | 12           |
| supplier        | 9.158 MB    | 12           |
| Total           | 132.476 GB  | 1045         |
| Quota           | 1024.000 TB | 1073741824   |
| Left            | 1023.871 TB | 1073740779   |
+-----------------+-------------+--------------+
10 rows in set (0.01 sec)
```
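As a sanity check, the 37M/s figure can be reproduced from the output above: the `lineorder_flat2` size reported by `show data` divided by the elapsed time of the insert. The sketch below is only illustrative. The helper names are hypothetical, and it assumes that the size shown by `show data` is the right numerator and that 1 GB = 1024 MB. Under those assumptions it comes out at roughly 37 MB/s.

```python
def elapsed_seconds(minutes: int, seconds: float) -> float:
    """Convert the 'N min S sec' duration printed by the MySQL client to seconds."""
    return minutes * 60 + seconds


def throughput(size_gb: float, rows: int, elapsed_s: float, mb_per_gb: int = 1024):
    """Return (MB/s, rows/s) for a load of `size_gb` and `rows` taking `elapsed_s`.

    Assumption: `size_gb` is the on-disk size reported by `show data`,
    and 1 GB = 1024 MB.
    """
    return size_gb * mb_per_gb / elapsed_s, rows / elapsed_s


# Values copied from the session above.
elapsed = elapsed_seconds(26, 52.81)  # 26 min 52.81 sec
mb_per_s, rows_per_s = throughput(58.854, 600_037_902, elapsed)

print(f"{mb_per_s:.1f} MB/s, {rows_per_s:,.0f} rows/s")
```

Note that this measures stored (compressed) bytes per second for the whole cluster, so the rows/s figure may be the more comparable number across systems.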

Solution

Should we refer to Impala's implementation rather than going through stream load? I'm not sure; maybe someone has a good idea.



kpfly commented 2 years ago

@zhannngchen @yixiutt will improve this scenario