Closed jinkun2014 closed 1 week ago
It looks like a good optimization point, can someone assign this task to me?
@chenminghua8 大佬,这个已经实现了吗,为什么我这个最新版本3.3.0版本的执行计划没看到谓词下推呢?
@jinkun2014 This bugfix should be contained in 3.3.0, maybe there is another bug. Can you give an example to reproduce this?
@LiShuMing example as :
-- test.t_amadb_product definition
DROP table t_amadb_product
CREATE TABLE `t_amadb_product` (
`Id` bigint(20) NOT NULL COMMENT "主键Id",
`UserId` int(11) NOT NULL COMMENT "用户主键Id",
`AccountId` int(11) NOT NULL COMMENT "卖家账号Id",
`Site` varchar(65533) NOT NULL COMMENT "站点",
`Sku` varchar(65533) NULL COMMENT "卖家自己定义的商品标识"
) ENGINE=OLAP
PRIMARY KEY(`Id`, `UserId`, `AccountId`, `Site`)
COMMENT "产品销量、库存、费用表"
PARTITION BY RANGE(`UserId`)
(PARTITION p1 VALUES [("-2147483648"), ("10")),
PARTITION p2 VALUES [("10"), ("20")),
PARTITION p3 VALUES [("20"), ("30")),
PARTITION p4 VALUES [("30"),("2147483647")))
DISTRIBUTED BY HASH(`Id`) BUCKETS 6
ORDER BY(`UserId`, `AccountId`, `Site`, `Sku`)
PROPERTIES (
"compression" = "LZ4",
"datacache.enable" = "true",
"datacache.partition_duration" = "1 months",
"enable_async_write_back" = "false",
"enable_persistent_index" = "true",
"persistent_index_type" = "LOCAL",
"replication_num" = "1",
"storage_volume" = "builtin_storage_volume"
);
DROP table t_amzdb_orders_item_analysis_info
CREATE TABLE `t_amzdb_orders_item_analysis_info` (
`Id` largeint(40) NOT NULL COMMENT "",
`BelongDay` date NOT NULL COMMENT "",
`UserId` int(11) NOT NULL COMMENT "",
`AccountId` int(11) NOT NULL COMMENT "",
`Site` varchar(65533) NOT NULL COMMENT "",
`AmazonOrderId` varchar(65533) NULL COMMENT "",
`Sku` varchar(65533) NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`Id`, `BelongDay`, `UserId`, `AccountId`, `Site`)
COMMENT "OLAP"
PARTITION BY date_trunc('month', BelongDay)
DISTRIBUTED BY HASH(`Id`)
ORDER BY(`UserId`, `AccountId`, `Site`, `BelongDay`)
PROPERTIES (
"compression" = "LZ4",
"datacache.enable" = "true",
"enable_async_write_back" = "false",
"enable_persistent_index" = "true",
"persistent_index_type" = "LOCAL",
"replication_num" = "1",
"storage_volume" = "builtin_storage_volume"
);
DROP table t_amzdb_orders_update_log
CREATE TABLE `t_amzdb_orders_update_log` (
`Id` bigint(20) NOT NULL COMMENT "主键Id",
`BelongDay` date NOT NULL COMMENT "归属日期",
`UserId` int(11) NOT NULL COMMENT "用户主键Id",
`AccountId` int(11) NOT NULL COMMENT "账号Id",
`Site` varchar(65533) NOT NULL COMMENT "站点",
`AmazonOrderId` varchar(65533) NOT NULL COMMENT "订单号"
) ENGINE=OLAP
PRIMARY KEY(`Id`, `BelongDay`, `UserId`, `AccountId`)
COMMENT "单个订单每日增量更新记录表"
PARTITION BY date_trunc('month', BelongDay)
DISTRIBUTED BY HASH(`Id`)
ORDER BY(`UserId`, `AccountId`,`Site`, `BelongDay`)
PROPERTIES (
"compression" = "LZ4",
"datacache.enable" = "true",
"enable_async_write_back" = "false",
"enable_persistent_index" = "true",
"persistent_index_type" = "LOCAL",
"replication_num" = "1",
"storage_volume" = "builtin_storage_volume"
);
INSERT INTO test.t_amzdb_orders_update_log (Id, UserId, AccountId, Site, BelongDay, AmazonOrderId) VALUES(6751344676, 5, 11400139, 'us', '2024-05-29', '114-1100145-0637010');
INSERT INTO test.t_amzdb_orders_update_log (Id, UserId, AccountId, Site, BelongDay, AmazonOrderId) VALUES(6515727486, 15, 11400139, 'us', '2024-06-04', '111-5952273-4477057');
INSERT INTO test.t_amzdb_orders_item_analysis_info (Id, UserId, AccountId, Site, BelongDay, AmazonOrderId, Sku) VALUES(6751344676, 5, 11400139, 'us', '2024-05-29', '114-1100145-0637010','S1');
INSERT INTO test.t_amzdb_orders_item_analysis_info (Id, UserId, AccountId, Site, BelongDay, AmazonOrderId, Sku) VALUES(6515727486, 15, 11400139, 'us', '2024-06-04', '111-5952273-4477057','S2');
INSERT INTO test.t_amadb_product (Id, UserId, AccountId, Site, Sku) VALUES(6751344676, 5, 11400139, 'us','S1');
INSERT INTO test.t_amadb_product (Id, UserId, AccountId, Site, Sku) VALUES(6515727486, 15, 11400139, 'us','S2');
result1:
EXPLAIN
SELECT
t0.Id AS LastId,
t1.AmazonOrderId,
t2.Sku
FROM
t_amzdb_orders_update_log t0
INNER JOIN t_amzdb_orders_item_analysis_info t1 ON
t1.UserId = t0.UserId
AND t1.AccountId = t0.AccountId
AND t1.Site = t0.Site
AND t1.BelongDay = t0.BelongDay
AND t1.AmazonOrderId = t0.AmazonOrderId
INNER JOIN t_amadb_product t2 ON
t1.UserId = t2.UserId
AND t1.Sku = t2.Sku
AND t1.AccountId = t2.AccountId
AND t1.Site = t2.Site
WHERE
t0.UserId = 5
AND t0.AccountId = 11400139
AND t0.Site = 'us'
ORDER BY
t0.Id ASC
result2:
EXPLAIN
SELECT
t0.Id AS LastId,
t1.AmazonOrderId,
t2.Sku
FROM
t_amzdb_orders_update_log t0
INNER JOIN t_amzdb_orders_item_analysis_info t1 ON
t1.UserId = t0.UserId
AND t1.AccountId = t0.AccountId
AND t1.Site = t0.Site
AND t1.BelongDay = t0.BelongDay
AND t1.AmazonOrderId = t0.AmazonOrderId
INNER JOIN t_amadb_product t2 ON
t0.UserId = t2.UserId
AND t1.Sku = t2.Sku
AND t0.AccountId = t2.AccountId
AND t0.Site = t2.Site
WHERE
t0.UserId = 5
AND t0.AccountId = 11400139
AND t0.Site = 'us'
ORDER BY
t0.Id ASC
Because the simulated data has much fewer partitions than the real data, this example is for reference only. I believe you have more abundant test cases.
In the main branch, the predicate had been pushed down. This pr is merged after branch-3.3 created, so the predicate cannot be pushed down in version-3.3.
Feature request
Is your feature request related to a problem? Please describe.
图中(EXPLAIN的结果,仅在这个EXPLAIN结果可靠且代表真实的执行过程)可以看出在join中,t1.UserId = t0.UserId, t2.UserId = t1.UserId,理论上可以推断出 t2.UserId = t0.UserId, 然后现有的SQL优化器并没有这么去推断,导致 t2 几乎扫描了所有的分区,而手动改写SQL之后, t2.UserId=t0.UserId, 只扫描了一个分区,如果能自动实现类似的推断进行查询条件的常量替换,那么对于类似的SQL查询将有很大的性能提升
Describe the solution you'd like
Describe alternatives you've considered
Additional context