apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

Does the FlushOnEveryBlock feature in Avro affect Iceberg data integrity? #10142

Open GreatStone opened 6 months ago

GreatStone commented 6 months ago

Query engine

EMR

Question

I’ve observed that Iceberg uses org.apache.avro.file.DataFileWriter to write Avro files and relies on some of its default settings. One of these defaults, FlushOnEveryBlock, triggers a flush of the underlying stream after every block, so a single large update can issue many flush operations and may hurt performance. I’m curious about two aspects:
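To make the flush-count concern concrete, here is a minimal sketch using only the Java standard library. It is a simplified model, not Avro's or Iceberg's actual code: the class `FlushPerBlockSketch`, the helper `countFlushes`, and the block counts are all hypothetical names and values chosen for illustration. It counts how many times `flush()` reaches the underlying stream when a writer flushes after every block versus only once at the end. In Avro itself, the corresponding switch is `DataFileWriter.setFlushOnEveryBlock(boolean)`, which defaults to `true`.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushPerBlockSketch {

    // Wraps a stream and counts how often flush() is called on it.
    static final class FlushCountingStream extends FilterOutputStream {
        int flushCalls = 0;

        FlushCountingStream(OutputStream out) {
            super(out);
        }

        @Override
        public void flush() throws IOException {
            flushCalls++;
            super.flush();
        }
    }

    // Simplified model of a block-oriented writer (an assumption, not Avro's code):
    // writes `blocks` blocks of `blockSize` bytes, optionally flushing after each,
    // then flushes once at the end as a close() would.
    public static int countFlushes(int blocks, int blockSize, boolean flushOnEveryBlock)
            throws IOException {
        FlushCountingStream sink = new FlushCountingStream(new ByteArrayOutputStream());
        byte[] block = new byte[blockSize];
        for (int i = 0; i < blocks; i++) {
            sink.write(block, 0, block.length);
            if (flushOnEveryBlock) {
                sink.flush();
            }
        }
        sink.flush(); // final flush, independent of the per-block setting
        return sink.flushCalls;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("flush per block: " + countFlushes(1000, 64, true));
        System.out.println("flush at end:    " + countFlushes(1000, 64, false));
    }
}
```

In this model the number of flushes grows linearly with the number of blocks when per-block flushing is on, and stays constant when it is off. Whether turning it off is safe for Iceberg depends on what happens to a partially written file after a failure, which is the open question in this issue.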

github-actions[bot] commented 5 days ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.