apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0

`write.parquet.compression-codec` being set even if file-format is not parquet #9490

Open oneonestar opened 10 months ago

oneonestar commented 10 months ago

Apache Iceberg version

1.4.3 (latest release)

Query engine

Trino

Please describe the bug 🐞

In Trino 436 (Iceberg 1.4.3), the `write.parquet.compression-codec` property is set even when the file format is not Parquet. (https://github.com/trinodb/trino/issues/20401)

I think the problem could be related to https://github.com/apache/iceberg/pull/8593#issuecomment-1740507634

trino> CREATE TABLE test.property_test (c1 integer) WITH (format = 'ORC');
CREATE TABLE
trino> SELECT * FROM test."property_test$properties";
               key               | value
---------------------------------+-------
 write.format.default            | ORC
 write.parquet.compression-codec | zstd
(2 rows)

trino> CREATE TABLE test.property_test (c1 integer) WITH (format = 'AVRO');
CREATE TABLE
trino> SELECT * FROM test."property_test$properties";
               key               | value
---------------------------------+-------
 write.format.default            | AVRO
 write.parquet.compression-codec | zstd
(2 rows)

findinpath commented 10 months ago

cc @aokolnychyi pls see org.apache.iceberg.TableMetadata#persistedProperties in 2e291c2b

amogh-jahagirdar commented 10 months ago

Yeah, looks like we should persist those properties conditionally, based on the table's file format (`write.format.default`).
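
To make the suggestion concrete, here is a minimal sketch of what "conditionally persist" could look like. It is not the actual `TableMetadata#persistedProperties` code: the class name `PersistedPropertiesSketch` is hypothetical, the property keys are written out as plain strings, and it assumes that a table with no explicit `write.format.default` is treated as Parquet and that `zstd` is the default codec, as in the output above.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, standalone sketch; it does not use Iceberg classes.
final class PersistedPropertiesSketch {
    static final String DEFAULT_FILE_FORMAT = "write.format.default";
    static final String PARQUET_COMPRESSION = "write.parquet.compression-codec";

    // Assumptions: an unset format means Parquet, and zstd is the new-table default.
    static final String ASSUMED_DEFAULT_FORMAT = "parquet";
    static final String ASSUMED_DEFAULT_CODEC = "zstd";

    static Map<String, String> persistedProperties(Map<String, String> rawProperties) {
        // Start from the user-supplied properties so explicit values are kept.
        Map<String, String> persisted = new HashMap<>(rawProperties);

        String format = rawProperties
            .getOrDefault(DEFAULT_FILE_FORMAT, ASSUMED_DEFAULT_FORMAT)
            .toLowerCase(Locale.ROOT);

        // Only persist the Parquet codec default when the table writes Parquet,
        // and never overwrite a codec the user set explicitly.
        if (ASSUMED_DEFAULT_FORMAT.equals(format)) {
            persisted.putIfAbsent(PARQUET_COMPRESSION, ASSUMED_DEFAULT_CODEC);
        }
        return persisted;
    }
}
```

With this shape, a table created with `format = 'ORC'` or `format = 'AVRO'` would keep only `write.format.default`, while a Parquet table (or one with no format set) would still get the codec default persisted.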

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.