Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0
Usage of Hidden Partitioning #8031

Open LevisBale0824 opened 1 year ago

LevisBale0824 commented 1 year ago

Apache Iceberg version

1.1.0

Query engine

Spark

Please describe the bug 🐞

When I create a table using the example SQL:

CREATE TABLE prod.db.sample (id bigint, data string, category string, ts timestamp) USING iceberg PARTITIONED BY (years(ts), days(ts))

it throws an exception: "Cannot add redundant partition field".

But when I create the table without a partition spec:

CREATE TABLE prod.db.sample (id bigint, data string, category string, ts timestamp) USING iceberg

and then run the following statements in order, they both succeed:

ALTER TABLE prod.db.sample ADD PARTITION FIELD years(ts)
ALTER TABLE prod.db.sample ADD PARTITION FIELD days(ts)

I think that when we add a hidden partition (year/month/day/hour) with ALTER TABLE, we should check whether the column is already partitioned by a year/month/day/hour transform, the same as we do when creating a table.

I also found that in the checkForRedundantAddedPartitions function of BaseUpdatePartitionSpec.class, the size of addedTimeFields is always 0 when we use ALTER TABLE. Does that mean the checkArgument check can never fail in this case?

nastra commented 1 year ago

I agree that this is surprising behavior from an end user's perspective, but it is expected according to https://github.com/apache/iceberg/blob/223177faf955bd8f11864477da16013cf5d7cc75/core/src/test/java/org/apache/iceberg/TestUpdatePartitionSpec.java#L204-L206

This is what makes https://iceberg.apache.org/docs/latest/evolution/#partition-evolution work properly.
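To make the difference concrete, here is a toy Python model of the behavior described in this thread. It is only a sketch and not Iceberg's code: `SpecUpdate`, `add_field`, `add_in_one_update` and `add_in_two_updates` are hypothetical names, and the assumption is that the redundancy check only sees time fields added within the same operation, which matches the report that addedTimeFields is empty on each ALTER TABLE.

```python
# Toy model only: hypothetical names, not Iceberg's actual classes.
TIME_TRANSFORMS = {"year", "month", "day", "hour"}


class SpecUpdate:
    """One pending partition-spec change, analogous to a single statement."""

    def __init__(self, existing_fields):
        # Fields carried over from the current spec are not re-checked.
        self.fields = list(existing_fields)
        # Tracks time transforms added in THIS update only, so it starts empty.
        self.added_time_fields = {}

    def add_field(self, transform, column):
        if transform in TIME_TRANSFORMS:
            if column in self.added_time_fields:
                raise ValueError(
                    f"Cannot add redundant partition field: {transform}({column})"
                )
            self.added_time_fields[column] = transform
        self.fields.append((transform, column))
        return self

    def apply(self):
        return list(self.fields)


def add_in_one_update():
    # Both time fields are checked together, as in the CREATE TABLE case.
    return SpecUpdate([]).add_field("year", "ts").add_field("day", "ts").apply()


def add_in_two_updates():
    # Each step starts with empty tracking, as with two ALTER TABLE statements.
    by_year = SpecUpdate([]).add_field("year", "ts").apply()
    return SpecUpdate(by_year).add_field("day", "ts").apply()
```

In this model `add_in_one_update()` raises, while `add_in_two_updates()` returns a spec containing both fields, because the second update never sees the time field added by the first.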

LevisBale0824 commented 1 year ago

> I agree that this is surprising behavior from an end user's perspective, but it is expected according to https://github.com/apache/iceberg/blob/223177faf955bd8f11864477da16013cf5d7cc75/core/src/test/java/org/apache/iceberg/TestUpdatePartitionSpec.java#L204-L206
>
> This is what makes https://iceberg.apache.org/docs/latest/evolution/#partition-evolution work properly.

I think this means that we can use replacePartition to change a time-based partition, and the old data will remain usable.

github-actions[bot] commented 1 week ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in the next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.