apache / paimon

Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
https://paimon.apache.org/
Apache License 2.0

[core]remove_orphan_files support dry run and optimize output #3508

Closed: MOBIN-F closed this pull request 2 weeks ago

MOBIN-F commented 1 month ago

Purpose

  1. Make remove_orphan_files support a dry run, similar to Iceberg's remove_orphan_files procedure. [screenshot]

  2. Currently, remove_orphan_files (Flink/Spark) only outputs the number of deleted files after execution. I think it should output the paths of the orphan files so the operation is more transparent to users, as follows:

remove_orphan_files procedure output in Spark/Flink: [screenshot]

remove_orphan_files action output in the Flink action jar: [screenshot]
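A minimal Java sketch of the two behaviours proposed above, under stated assumptions. This is not Paimon's implementation. The class and method names (`OrphanFileCleaner`, `removeOrphanFiles`) are hypothetical. "Orphan" is simplified to "a regular file under the table directory that is not in a given set of referenced paths and was last modified before a cutoff", and the local filesystem (`java.nio.file`) stands in for Paimon's own file abstraction. With `dryRun` set to `true` the method only collects the orphan paths. Otherwise it also deletes them. In both cases it returns the paths, so a procedure or action could print them instead of only a count.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Instant;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical sketch, not Paimon code: finds files under root that are not
 * in the referenced set and are older than the cutoff. Deletes them unless
 * dryRun is true, and always returns their paths.
 */
final class OrphanFileCleaner {

    private OrphanFileCleaner() {}

    static List<Path> removeOrphanFiles(
            Path root, Set<Path> referenced, Instant olderThan, boolean dryRun)
            throws IOException {
        // Normalize referenced paths so they compare equal to walked paths.
        Set<Path> refs = referenced.stream()
                .map(r -> r.toAbsolutePath().normalize())
                .collect(Collectors.toSet());

        List<Path> orphans;
        try (Stream<Path> files = Files.walk(root)) {
            orphans = files
                    .filter(Files::isRegularFile)
                    .filter(p -> !refs.contains(p.toAbsolutePath().normalize()))
                    .filter(p -> isOlderThan(p, olderThan))
                    .sorted()
                    .collect(Collectors.toList());
        }

        // Dry run: report only. Real run: delete each reported orphan.
        if (!dryRun) {
            for (Path orphan : orphans) {
                Files.deleteIfExists(orphan);
            }
        }
        return orphans;
    }

    // Age check guards against deleting files written by in-flight commits.
    private static boolean isOlderThan(Path p, Instant cutoff) {
        try {
            return Files.getLastModifiedTime(p).toInstant().isBefore(cutoff);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Returning the list rather than a count lets the caller still derive the count (`orphans.size()`) while showing users exactly which files were, or in a dry run would be, removed.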

Tests

API and Format

Documentation

MOBIN-F commented 2 weeks ago

Please help review this PR, thanks @JingsongLi @yuzelin