MOBIN-F closed this issue.
It looks like no partition pruning is happening. Is the Paimon table a partitioned table?
@Zouxxyy Yes, it is a partitioned Paimon table:
"highestFieldId" : 56,
"partitionKeys" : [ "pt" ],
"primaryKeys" : [ "id", "pt" ],
"options" : {
  "bucket" : "1",
  "num-sorted-run.stop-trigger" : "2147483647",
  "changelog.num-retained.min" : "1",
  "changelog.num-retained.max" : "2",
  "changelog-producer" : "none",
  "snapshot.num-retained.max" : "3",
  "snapshot.num-retained.min" : "1",
  "sink.parallelism" : "5",
  "deletion-vectors.enabled" : "true",
  "compaction.optimization-interval" : "10",
  "sort-spill-threshold" : "10"
},
"timeMillis" : 1720077353013
This looks weird. What type is the pt field? Can you also provide the result of explain select * from paimon_catalog.rt_ods.paimon_xxxx_d where pt=20240530 limit 10?
pt is of STRING type. Plan:
== Physical Plan ==
CollectLimit 10
+- *(1) Project [reason#11, user_province#12, update_sys_tm_mill#13, complain_reason#14, user_address_detail#15, shop_reduce_fee#16L, user_contact#17, callback_class#18, receive_area#19, receive_phone#20, user_type#21, enable#22, receive_name#23, invoice_id#24, id#25L, delete_flag#26, biz_order_id#27, order_type#28, ext3#29, ext2#30, ext1#31, pay_ext#32, biz_status#33, visible#34, ... 33 more fields]
   +- *(1) Filter (cast(pt#67 as int) = 20240530)
      +- BatchScan[reason#11, user_province#12, update_sys_tm_mill#13, complain_reason#14, user_address_detail#15, shop_reduce_fee#16L, user_contact#17, callback_class#18, receive_area#19, receive_phone#20, user_type#21, enable#22, receive_name#23, invoice_id#24, id#25L, delete_flag#26, biz_order_id#27, order_type#28, ext3#29, ext2#30, ext1#31, pay_ext#32, biz_status#33, visible#34, ... 33 more fields] PaimonScan: [paimon_xxxx_d], PushedFilters: [IsNotNull(pt)] RuntimeFilters: []
Try comparing pt with a string literal: select * from paimon_catalog.rt_ods.paimon_xxxx_d where pt='20240530' limit 10
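To show why the plan above leads to a full scan, here is a toy Python model of partition pruning with pushed filters. It is a simplified sketch and not Paimon's implementation; the classes IsNotNull, EqualTo and the function prune_partitions are hypothetical stand-ins. With pt=20240530, only IsNotNull(pt) reaches the scan (see PushedFilters), so every partition survives. With pt='20240530', an equality on the bare column can presumably be pushed, and that filter leaves a single partition.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical, simplified filters on a bare column, in the spirit of
# the "PushedFilters: [IsNotNull(pt)]" entry in the plan.
@dataclass(frozen=True)
class IsNotNull:
    column: str

    def matches(self, value: Optional[str]) -> bool:
        return value is not None


@dataclass(frozen=True)
class EqualTo:
    column: str
    value: str

    def matches(self, value: Optional[str]) -> bool:
        return value == self.value


def prune_partitions(partitions: List[str], pushed: list) -> List[str]:
    """Keep only partitions whose 'pt' value satisfies every pushed filter on 'pt'."""
    return [
        p for p in partitions
        if all(f.matches(p) for f in pushed if f.column == "pt")
    ]


partitions = ["20240528", "20240529", "20240530", "20240531"]

# pt=20240530 (int literal): only IsNotNull(pt) is pushed, so nothing is pruned.
print(prune_partitions(partitions, [IsNotNull("pt")]))
# pt='20240530' (string literal): the equality is pushed, so one partition remains.
print(prune_partitions(partitions, [IsNotNull("pt"), EqualTo("pt", "20240530")]))
```

In the first case the scan still reads every partition. The remaining cast filter is applied only afterwards, in the Filter node above BatchScan.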
@Zouxxyy Thanks! With where pt='20240530', the performance is normal. Maybe Paimon should support this implicit conversion? This problem does not seem to occur with non-Paimon tables.
Yes. I tested it, and this implicit-conversion filter is not passed to the DataSource V2 scan's ScanBuilder, so the current interface may not be able to support this.
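A connector cannot safely rewrite cast(pt as int) = 20240530 into pt = '20240530' on its own. The reason is that a string-to-int cast is many-to-one, so the two predicates are not equivalent. The sketch below illustrates this in Python. cast_to_int is a hypothetical, simplified stand-in for a string-to-int cast and does not reproduce Spark's exact semantics. The partition values are made up. One plausible explanation for why non-Paimon tables behave differently is that Spark can evaluate the original predicate directly against each partition value. This is an assumption, not something confirmed in this thread.

```python
from typing import Optional


def cast_to_int(s: Optional[str]) -> Optional[int]:
    """Simplified string-to-int cast: returns None (like SQL NULL) when the
    string is not a valid integer. Not Spark's exact cast semantics."""
    if s is None:
        return None
    try:
        return int(s.strip())
    except ValueError:
        return None


# Made-up partition values for illustration.
partitions = ["20240530", "020240530", "2024-05-30", "20240531"]

# Predicate as it appears in the plan: cast(pt as int) = 20240530.
cast_matches = [p for p in partitions if cast_to_int(p) == 20240530]
# Naive rewrite a connector might be tempted to push: pt = '20240530'.
string_matches = [p for p in partitions if p == "20240530"]

print(cast_matches)
print(string_matches)
```

Because '020240530' satisfies the cast predicate but not the string equality, pushing the rewritten filter could drop matching rows. Writing the literal with the column's own type (pt='20240530') avoids the problem.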
OK, thanks!
Search before asking
Paimon version
paimon-spark-3.3-0.8
Compute Engine
Spark 3.3.2
Minimal reproduce step
none
What doesn't meet your expectations?
We have a Paimon primary-key table and a non-Paimon table with the same data. For the query [where pt=20240530 limit 10], the Paimon primary-key table is much slower than the non-Paimon table.
Paimon PK table: Paimon TBLPROPERTIES
[select * from paimon_catalog.rt_ods.paimon_xxxx_d where pt=20240530 limit 10]
count(1) where pt=20240530
Non-Paimon table (Parquet format): [select * from dw_ods.tdb_xxxx_d where pt=20240530 limit 10]
The file sizes and numbers of entries of the two tables are similar, yet the LIMIT query on the Paimon table seems slower than on the non-Paimon table. It is as if LIMIT has no effect.
Anything else?
No response
Are you willing to submit a PR?