An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
Description
This PR proposes a fix that prevents a Delta table from being assigned duplicate column ids when its schema has nested fields that already have ids assigned.

Issue: today, when assigning column ids, we first compute the `maxId` of the existing columns and assign ids to new fields starting from `maxId + 1`. However, the existing code doesn't consider nested ids when computing `maxId`, so the same id can be assigned to different columns. This causes UniForm Iceberg conversion to fail, since Iceberg requires each column's id to be unique.

Proposed fix: add logic that considers nested fields' ids when computing `maxId`.

Does this PR introduce any user-facing changes?
No
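
To illustrate the issue described above, here is a minimal sketch of a `maxId` computation that also walks nested fields. It is not the code changed by this PR. It assumes the schema is available in its JSON form (struct, array, and map types as nested objects) and that each column's id is stored under the `delta.columnMapping.id` metadata key. The function names `find_max_id`, `max_id_in_type`, and `top_level_max_id` are hypothetical, and `top_level_max_id` is only a simplified stand-in for a computation that ignores nested ids.

```python
# Illustrative sketch only; not the code changed by this PR.
ID_KEY = "delta.columnMapping.id"  # metadata key assumed to hold a column's id


def max_id_in_type(data_type):
    """Largest column id found anywhere inside a (possibly nested) type."""
    if isinstance(data_type, dict):
        kind = data_type.get("type")
        if kind == "struct":
            return find_max_id(data_type["fields"])
        if kind == "array":
            return max_id_in_type(data_type["elementType"])
        if kind == "map":
            return max(max_id_in_type(data_type["keyType"]),
                       max_id_in_type(data_type["valueType"]))
    return 0  # primitive types such as "long" carry no nested ids


def find_max_id(fields):
    """Largest id over the given fields and everything nested beneath them."""
    result = 0
    for field in fields:
        own_id = field.get("metadata", {}).get(ID_KEY, 0)
        result = max(result, own_id, max_id_in_type(field["type"]))
    return result


def top_level_max_id(fields):
    """Simplified stand-in for a computation that ignores nested ids."""
    return max((f.get("metadata", {}).get(ID_KEY, 0) for f in fields), default=0)


# Top-level column "a" has id 1; its nested field "a.b" already has id 2.
schema_fields = [
    {"name": "a", "metadata": {ID_KEY: 1},
     "type": {"type": "struct", "fields": [
         {"name": "b", "type": "long", "metadata": {ID_KEY: 2}}]}},
]

next_id_flawed = top_level_max_id(schema_fields) + 1  # reuses the id of a.b
next_id_fixed = find_max_id(schema_fields) + 1        # skips past every nested id
```

In this example the top-level-only computation hands out id 2 for the next new field, which `a.b` already holds, while the nested-aware version starts new ids after the largest id found anywhere in the schema.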