Open mosenberg opened 1 month ago
I don't know if this is really a bug, since there is no behavior change associated with it. I believe this is a leftover from V1, where we were never able to remove partition tuple elements, so every element had to be effectively nullable in case a new partition spec no longer included that element.
We could change this for V2 metadata files, though. I'm not sure it's a high priority unless something is actually breaking because of it.
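To make the difference concrete, here is a minimal sketch in Python of the two Avro shapes being discussed: a partition field written as an optional union versus the same field written as a plain required type. The record name `r102`, the `field-id` value, and the helper names `is_nullable` and `nullable_fields` are assumptions for illustration only; nothing here is copied from an actual manifest or from the Iceberg code base.

```python
import json

# Hypothetical partition-tuple schemas for a table that is identity-partitioned
# by a required string column named "group". The record name "r102" and the
# field-id value are illustrative assumptions, not taken from a real manifest.
OBSERVED = {
    "type": "record",
    "name": "r102",
    "fields": [
        {"name": "group", "type": ["null", "string"], "default": None, "field-id": 1000},
    ],
}

EXPECTED = {
    "type": "record",
    "name": "r102",
    "fields": [
        {"name": "group", "type": "string", "field-id": 1000},
    ],
}


def is_nullable(avro_type):
    """Return True if an Avro type is a union that includes "null"."""
    return isinstance(avro_type, list) and "null" in avro_type


def nullable_fields(record_schema):
    """List the names of record fields whose Avro type admits null."""
    return [f["name"] for f in record_schema["fields"] if is_nullable(f["type"])]


if __name__ == "__main__":
    # Show the observed field type, then which fields admit null in each schema.
    print(json.dumps(OBSERVED["fields"][0]["type"]))
    print(nullable_fields(OBSERVED))
    print(nullable_fields(EXPECTED))
```

A check like this could be run against the schema embedded in a manifest file's Avro header to see which partition fields are written as optional, but how that schema is extracted depends on the Avro tooling in use and is left out of the sketch.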
Apache Iceberg version
None
Query engine
Spark
Please describe the bug
The issue repros using the following SQL:
As per the above SQL, the column `group` is defined as a `NOT NULL` (i.e. `required`) column in the Iceberg metadata schema. However, in the generated Avro manifest file, the partition tuple, which stores the value of the `group` column by which the table is identity-partitioned, stores the partition value as an Avro union type `["null", "string"]`.

As per my understanding of the Iceberg spec, this is not correct: the result type of an identity partition transform is equal to the source type, in this case `STRING NOT NULL`. The section on manifest files further states:

Hence the schema of the partition tuple should be `"string"` and not `["null","string"]`.

Willingness to contribute