apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core] Read deletion indexes at once to reduce file IO in splits generation #3646

Closed. Zouxxyy closed this pull request 6 days ago.

Zouxxyy commented 6 days ago

Purpose

Read the deletion indexes all at once, instead of one at a time, to reduce file I/O during split generation.
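The idea can be illustrated with a minimal sketch. It assumes each data file's deletion vector lives at an offset inside a shared index file, and that the cost being cut is the number of times an index file is opened. `IndexEntry`, `CountingFs`, `readOneByOne` and `readAtOnce` are hypothetical names, not Paimon classes, and the PR's actual change may be structured differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: batch deletion-index reads so each index file is opened once. */
public class BatchedDeletionIndexRead {

    /** Hypothetical location of one data file's deletion vector inside an index file. */
    record IndexEntry(String indexFile, String dataFile, long offset) {}

    /** Hypothetical file system stand-in that only counts open() calls. */
    static final class CountingFs {
        int opens = 0;

        /** Counts the open and returns the path as a stand-in file handle. */
        String open(String path) {
            opens++;
            return path;
        }
    }

    /** One-at-a-time approach: opens the index file once per data file. */
    static Map<String, String> readOneByOne(List<IndexEntry> entries, CountingFs fs) {
        Map<String, String> result = new LinkedHashMap<>();
        for (IndexEntry e : entries) {
            String handle = fs.open(e.indexFile());
            result.put(e.dataFile(), handle + "@" + e.offset());
        }
        return result;
    }

    /** Batched approach: group entries by index file, then open each file once. */
    static Map<String, String> readAtOnce(List<IndexEntry> entries, CountingFs fs) {
        Map<String, List<IndexEntry>> byFile = new LinkedHashMap<>();
        for (IndexEntry e : entries) {
            byFile.computeIfAbsent(e.indexFile(), k -> new ArrayList<>()).add(e);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<IndexEntry>> group : byFile.entrySet()) {
            String handle = fs.open(group.getKey());
            // Read every requested deletion vector from the already-open file.
            for (IndexEntry e : group.getValue()) {
                result.put(e.dataFile(), handle + "@" + e.offset());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<IndexEntry> entries = List.of(
                new IndexEntry("index-0", "data-a", 0),
                new IndexEntry("index-0", "data-b", 16),
                new IndexEntry("index-0", "data-c", 32),
                new IndexEntry("index-1", "data-d", 0));
        CountingFs perEntry = new CountingFs();
        CountingFs batched = new CountingFs();
        Map<String, String> a = readOneByOne(entries, perEntry);
        Map<String, String> b = readAtOnce(entries, batched);
        System.out.println(perEntry.opens + " " + batched.opens + " " + a.equals(b));
    }
}
```

Running `main` prints `4 2 true`: four opens when reading one entry at a time versus two when entries are grouped by index file, with identical results. The saving grows with the number of data files that share an index file.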

Tests

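Tested on TPC-DS query q23a. Each command counts the FsStats lines in the Spark Thrift Server log that mention web_sales; the first number printed by wc is the line count (followed by word and byte counts).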

before

grep FsStats spark-thrift-server.log | grep web_sales | wc
   7437   81807 1770826

after

grep FsStats spark-thrift-server.log | grep web_sales | wc
     67     737   16257

The number of matching FsStats lines drops from 7437 to 67, a 111x reduction.

Documentation