apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0
2.13k stars 842 forks source link

[Spark] Partition limit query comparison of paimon primary key table and non-Paimon table #3676

Closed MOBIN-F closed 2 days ago

MOBIN-F commented 2 days ago

Search before asking

Paimon version

paimon-spark-3.3-0.8

Compute Engine

Spark 3.3.2

Minimal reproduce step

none

What doesn't meet your expectations?

We have a Paimon primary key table and a non-Paimon table with the same data. We found that in the query [where pt=20240530 limit 10], the Paimon primary key table is much slower than the non-Paimon table. paimon-pk table: paimon TBLPROPERTIES

  "highestFieldId" : 56,
  "partitionKeys" : [ "pt" ],
  "primaryKeys" : [ "id", "pt" ],
  "options" : {
    "bucket" : "1",
    "num-sorted-run.stop-trigger" : "2147483647",
    "changelog.num-retained.min" : "1",
    "changelog.num-retained.max" : "2",
    "changelog-producer" : "none",
    "snapshot.num-retained.max" : "3",
    "snapshot.num-retained.min" : "1",
    "sink.parallelism" : "5",
    "deletion-vectors.enabled" : "true",
    "compaction.optimization-interval" : "10",
    "sort-spill-threshold" : "10"
  },
  "timeMillis" : 1720077353013

[select * from paimon_catalog.rt_ods.paimon_xxxx_d where pt=20240530 limit 10] image image image

$hadoop fs -ls -t -r /warehouse/rt_ods/paimon_uoc_order_main_d/pt=20240530/*
Found 5 items
-rwxrwx--x+  2 hive supergroup  134829818 2024-07-04 23:53 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-c7aa5c7d-0f77-4567-abc3-066bdbd677b1-2.orc
-rwxrwx--x+  2 hive supergroup  135744147 2024-07-05 08:03 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-c7aa5c7d-0f77-4567-abc3-066bdbd677b1-11.orc
-rwxrwx--x+  2 hive supergroup  134621349 2024-07-05 08:50 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-c7aa5c7d-0f77-4567-abc3-066bdbd677b1-12.orc
-rwxrwx--x+  2 hive supergroup   49324398 2024-07-05 08:50 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-c7aa5c7d-0f77-4567-abc3-066bdbd677b1-13.orc
-rwxrwx--x+  2 hive supergroup       8976 2024-07-05 09:26 /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/bucket-0/data-39bcc9ec-bdac-4e1a-96bb-1e8a2ec96b3d-10.orc
$hadoop fs -du -s -h /warehouse/rt_ods/paimon_xxxx_d/pt=20240530/
433.5 M  866.9 M  /warehouse/rt_ods/paimon_xxxx_d/pt=20240530

count(1) where pt=20240530 image

non-Paimon table (parquet format): [select * from dw_ods.tdb_xxxx_d where pt=20240530 limit 10] image image

$hadoop fs -du -s -h /ods/tdb_xxxx_d/pt=20240530
846.4 M  1.7 G  /ods/tdb_xxxx_d/pt=20240530

When the file size and number of entries are similar, the limit query performance of paimon seems to be lower than that of non-paimon tables, as if limit does not work?

Anything else?

No response

Are you willing to submit a PR?

Zouxxyy commented 2 days ago

It looks like no partition pruning, is the paimon table a partition table?

MOBIN-F commented 2 days ago

It looks like no partition pruning, is the paimon table a partition table?

@Zouxxyy yes, it is a paimon partition table

  "highestFieldId" : 56,
  "partitionKeys" : [ "pt" ],
  "primaryKeys" : [ "id", "pt" ],
  "options" : {
    "bucket" : "1",
    "num-sorted-run.stop-trigger" : "2147483647",
    "changelog.num-retained.min" : "1",
    "changelog.num-retained.max" : "2",
    "changelog-producer" : "none",
    "snapshot.num-retained.max" : "3",
    "snapshot.num-retained.min" : "1",
    "sink.parallelism" : "5",
    "deletion-vectors.enabled" : "true",
    "compaction.optimization-interval" : "10",
    "sort-spill-threshold" : "10"
  },
  "timeMillis" : 1720077353013
Zouxxyy commented 2 days ago

This looks wired, what type is the pt field, and can you provide the result of explain select * from paimon_catalog.rt_ods.paimon_xxxx_d where pt=20240530 limit 10?

MOBIN-F commented 2 days ago

pt is STRING type plan :

== Physical Plan ==
CollectLimit 10
+- *(1) Project [reason#11, user_province#12, update_sys_tm_mill#13, complain_reason#14, user_address_detail#15, shop_reduce_fee#16L, user_contact#17, callback_class#18, receive_area#19, receive_phone#20, user_type#21, enable#22, receive_name#23, invoice_id#24, id#25L, delete_flag#26, biz_order_id#27, order_type#28, ext3#29, ext2#30, ext1#31, pay_ext#32, biz_status#33, visible#34, ... 33 more fields]
   +- *(1) Filter (cast(pt#67 as int) = 20240530)
      +- BatchScan[reason#11, user_province#12, update_sys_tm_mill#13, complain_reason#14, user_address_detail#15, shop_reduce_fee#16L, user_contact#17, callback_class#18, receive_area#19, receive_phone#20, user_type#21, enable#22, receive_name#23, invoice_id#24, id#25L, delete_flag#26, biz_order_id#27, order_type#28, ext3#29, ext2#30, ext1#31, pay_ext#32, biz_status#33, visible#34, ... 33 more fields] PaimonScan: [paimon_xxxx_d], PushedFilters: [IsNotNull(pt)] RuntimeFilters: []
Zouxxyy commented 2 days ago

try select * from paimon_catalog.rt_ods.paimon_xxxx_d where pt='20240530' limit 10

MOBIN-F commented 2 days ago

@Zouxxyy tks, Using where pt='20240530' its performance is normal Maybe paimon should support this implicit conversion? I observed that this problem does not seem to exist for non-paimon tables

Zouxxyy commented 2 days ago

@Zouxxyy tks, Using where pt='20240530' its performance is normal Maybe paimon should support this implicit conversion? I observed that this problem does not seem to exist for non-paimon tables

Yes, I tested that this implicit conversion filter will not be passed into the DS V2 Scan's ScanBuilder, maybe the current interface cannot implement this ability.

MOBIN-F commented 2 days ago

@Zouxxyy tks, Using where pt='20240530' its performance is normal Maybe paimon should support this implicit conversion? I observed that this problem does not seem to exist for non-paimon tables

Yes, I tested that this implicit conversion filter will not be passed into the DS V2 Scan's ScanBuilder, maybe the current interface cannot implement this ability.

ok, tks !