StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

Optimize Iceberg Pos Delete Building Process #52212

Open DorianZheng opened 1 month ago

DorianZheng commented 1 month ago

Enhancement

https://iceberg.apache.org/spec/#position-delete-files

According to the Iceberg spec, rows in a position delete file must be sorted by file_path, then by pos. This means we don't have to read the entire delete file. Instead, we can seek to the first page that contains the referenced data file path and stop as soon as we read a different data file path.
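
To make the idea concrete, here is a minimal sketch in Python, not StarRocks code. It assumes the delete rows are already available as an in-memory list of (file_path, pos) tuples sorted as the spec requires. The function name `positions_for_file` and the sample paths are hypothetical. A real reader would do the seek at the Parquet page or row-group level rather than on a list, but the logic is the same: binary-search to the first matching row, then stop at the first row with a different path.

```python
from bisect import bisect_left
from typing import List, Tuple


def positions_for_file(delete_rows: List[Tuple[str, int]], data_file: str) -> List[int]:
    """Collect deleted positions for one data file from sorted delete rows.

    Assumes delete_rows is sorted by (file_path, pos), as the Iceberg spec
    requires for position delete files. The list is a stand-in for the
    pages of a real delete file.
    """
    # Positions are non-negative, so (data_file, -1) sorts just before the
    # first row for data_file; this is the "seek" step.
    i = bisect_left(delete_rows, (data_file, -1))
    positions = []
    while i < len(delete_rows):
        file_path, pos = delete_rows[i]
        if file_path != data_file:
            # Sorted order guarantees no later row refers to data_file.
            break
        positions.append(pos)
        i += 1
    return positions


# Hypothetical sample data, sorted by file_path then pos.
rows = [
    ("a.parquet", 0),
    ("a.parquet", 5),
    ("b.parquet", 1),
    ("b.parquet", 2),
    ("c.parquet", 7),
]
deleted_in_b = positions_for_file(rows, "b.parquet")
```

With this approach the scan touches only the rows for the requested data file plus one extra row, instead of the whole delete file.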

danielhumanmod commented 2 weeks ago

Hi @DorianZheng, I’m interested in this task and would appreciate some clarification. Is the focus here on building an optimizer rule for position delete files, similar to what we did with IcebergEqualityDeleteRewriteRule?