apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

`org.apache.iceberg.actions.RewriteDataFiles` implementation for Apache Flink #9306

Open lkokhreidze opened 11 months ago

lkokhreidze commented 11 months ago

Query engine

Flink

Question

Hello, is there a reason why Flink doesn't support the `RewriteDataFiles` API? I'm particularly interested in the zorder rewrite strategy, which is supported with Spark but not with Flink.

pvary commented 10 months ago

Currently, the only supported action is `RewriteDataFilesAction`: https://github.com/apache/iceberg/blob/main/flink/v1.18/flink/src/main/java/org/apache/iceberg/flink/actions/RewriteDataFilesAction.java

We are working on migrating the Flink Sink to the Flink SinkV2 API. See: #8653. After that, we plan to make it possible to add compaction to the PostCommitTopology of a Sink.

If someone could make the zorder rewrite strategy available as a Flink action, it would greatly reduce the time needed to make it available in the Sink as well.
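For readers unfamiliar with the term, a zorder strategy sorts rows by a key built by interleaving the bits of the chosen columns, so rows that are close in several columns tend to land in the same files. The following is only a minimal sketch of that idea for two small non-negative integers. It is not Iceberg's implementation, and the class and method names (`ZOrderSketch`, `spread`, `zValue`) are hypothetical.

```java
public class ZOrderSketch {
    // Spread the low 16 bits of v so that bit i moves to bit 2*i.
    static int spread(int v) {
        v &= 0xFFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    // Interleave two 16-bit values: x fills the even bits, y the odd bits.
    static int zValue(int x, int y) {
        return spread(x) | (spread(y) << 1);
    }

    // Order two (x, y) pairs by their interleaved key, treating it as unsigned.
    static int compareByZ(int x1, int y1, int x2, int y2) {
        return Integer.compareUnsigned(zValue(x1, y1), zValue(x2, y2));
    }
}
```

A real rewrite strategy would also have to handle other column types, nulls, and more than two columns, none of which this sketch attempts.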

lkokhreidze commented 10 months ago

Thanks @pvary, that looks really awesome and helpful. I'll try to look at the zorder implementation and see what we could do to contribute.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.