apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Improve remove_orphan_files performance by using "inventory listing" #10426

Open ajantha-bhat opened 4 months ago

ajantha-bhat commented 4 months ago

Feature Request / Improvement

Compared to the listFiles API, an inventory listing can be a more cost-efficient way for remove_orphan_files to discover the files under a table location. We could therefore enhance the procedure/action to accept inventory information as input.

Reference: https://delta.io/blog/efficient-delta-vacuum/
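
To illustrate the idea, here is a minimal sketch (plain Python, not Iceberg code) of how orphan candidates could be derived from an inventory report instead of a recursive listing. The `InventoryEntry` type, the `find_orphan_files` helper, and the sample paths are all hypothetical; a real implementation would read the inventory and the table's referenced files as distributed datasets and join them.

```python
from datetime import datetime, timezone
from typing import Iterable, List, NamedTuple


class InventoryEntry(NamedTuple):
    """One row of a hypothetical storage inventory report."""
    path: str
    last_modified: datetime


def find_orphan_files(
    inventory: Iterable[InventoryEntry],
    referenced_paths: Iterable[str],
    older_than: datetime,
) -> List[str]:
    """Return inventory paths that the table does not reference.

    Only files last modified before `older_than` are returned, so that
    files written by in-progress commits are not flagged as orphans.
    """
    referenced = set(referenced_paths)
    return sorted(
        entry.path
        for entry in inventory
        if entry.path not in referenced and entry.last_modified < older_than
    )


# Illustrative data only: these paths do not come from a real table.
cutoff = datetime(2024, 6, 1, tzinfo=timezone.utc)
inventory = [
    InventoryEntry("s3://bucket/tbl/data/a.parquet", datetime(2024, 5, 1, tzinfo=timezone.utc)),
    InventoryEntry("s3://bucket/tbl/data/b.parquet", datetime(2024, 5, 2, tzinfo=timezone.utc)),
    InventoryEntry("s3://bucket/tbl/data/c.parquet", datetime(2024, 6, 2, tzinfo=timezone.utc)),
]
referenced = ["s3://bucket/tbl/data/a.parquet"]

orphans = find_orphan_files(inventory, referenced, cutoff)
print(orphans)
```

In this sketch, `b.parquet` is unreferenced and older than the cutoff, so it is the only candidate; `c.parquet` is unreferenced but too recent to be considered safely.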

Query engine

Spark

anuragmantri commented 3 months ago

Hi @ajantha-bhat - Don't we already support this after https://github.com/apache/iceberg/pull/4503?

anuragmantri commented 3 months ago

@flyrain did some analysis on this internally. He may have some ideas here.