Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0
Apache Iceberg - Update one record in the table doubles the number of files in the whole table #8378

Closed rafalmo closed 1 month ago

rafalmo commented 1 year ago

Hi, I have a table with 280 Parquet files. I want to update only 1 record in the table. When I do this, the number of files doubles to 560 (I check it with table.all_files). This happens on tables in both v1 and v2 format (with the copy-on-write option). With merge-on-read it works properly: the number of files after the change is 282. In the documentation for copy-on-write I found: "Let say you have two data files in data directory of Iceberg table data-file-1 & data-file-2. You have updated an record, which is present in data-file-2 only. Iceberg will create a new copy of data-file-2 only and apply the changes." In my case it doesn't work like that. Do I need to change something in the settings?
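One way to narrow this down (a sketch only, not something confirmed in this thread) is to compare the file counts reported by different Iceberg metadata tables. The `files` metadata table lists files referenced by the current snapshot, while `all_files` gathers files across all snapshots and, per the Iceberg documentation, may return more than one row per file. The table name `local.db.events` and the helper names below are hypothetical, and `spark` is assumed to be an existing SparkSession with an Iceberg catalog configured.

```python
def file_count_queries(table: str) -> dict[str, str]:
    """Build COUNT queries against Iceberg metadata tables for `table`.

    `table` is a fully qualified name such as "local.db.events"
    (hypothetical; substitute your own catalog, database and table).
    """
    return {
        # Files referenced by the current snapshot only.
        "files": f"SELECT count(*) AS n FROM {table}.files",
        # Rows gathered across all snapshots; may contain duplicates.
        "all_files": f"SELECT count(*) AS n FROM {table}.all_files",
        # Distinct physical file paths across all snapshots.
        "all_files_distinct": (
            f"SELECT count(DISTINCT file_path) AS n FROM {table}.all_files"
        ),
    }


def count_files(spark, table: str) -> dict[str, int]:
    """Run each query on an existing SparkSession and collect the counts."""
    return {
        name: spark.sql(sql).first()["n"]
        for name, sql in file_count_queries(table).items()
    }


# Usage, assuming `spark` is already configured for Iceberg:
# print(count_files(spark, "local.db.events"))
```

If `files` stays close to 280 after the update and only `all_files` doubles, the extra rows may come from the same files being listed under more than one snapshot rather than from data files actually being rewritten. That reading is an assumption and was not verified in this issue.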

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] commented 1 month ago

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'.