delta-io / delta

An open-source storage framework that enables building a Lakehouse architecture with compute engines including Spark, PrestoDB, Flink, Trino, and Hive and APIs
https://delta.io
Apache License 2.0

[Delta Uniform] Compute correct MAX_ID in column mapping on a schema with nested fields that already have IDs assigned #3234

Closed ChengJi-db closed 2 weeks ago

ChengJi-db commented 2 weeks ago

Description

This PR proposes a fix that prevents a Delta table from being assigned duplicate column IDs when its schema has nested fields that already have IDs assigned.

Issue: today, when assigning column IDs, we first compute the maxId of the existing columns and then assign IDs to new fields starting from maxId + 1. However, the existing code doesn't consider nested fields' IDs when computing maxId, so the same ID can be assigned to different columns. This causes the Uniform Iceberg conversion to fail, since Iceberg requires each column's ID to be unique.

Proposed fix: add logic that also considers nested fields' IDs when computing maxId.
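
To illustrate the difference, here is a minimal Python sketch. It is not the code in this PR. It assumes a schema in the Delta JSON layout, with IDs stored under the `delta.columnMapping.id` metadata key. The helper names `top_level_max_id` and `nested_max_id` and the sample schema are hypothetical.

```python
ID_KEY = "delta.columnMapping.id"  # field metadata key holding the column mapping ID


def top_level_max_id(schema):
    """Max ID over top-level fields only (the behavior described in the issue)."""
    return max((f.get("metadata", {}).get(ID_KEY, 0) for f in schema["fields"]), default=0)


def nested_max_id(data_type):
    """Max ID over all fields, recursing into structs, arrays, and maps."""
    if not isinstance(data_type, dict):
        return 0  # primitive type name such as "integer"
    kind = data_type.get("type")
    if kind == "struct":
        best = 0
        for f in data_type["fields"]:
            own_id = f.get("metadata", {}).get(ID_KEY, 0)
            best = max(best, own_id, nested_max_id(f["type"]))
        return best
    if kind == "array":
        return nested_max_id(data_type["elementType"])
    if kind == "map":
        return max(nested_max_id(data_type["keyType"]),
                   nested_max_id(data_type["valueType"]))
    return 0


# Hypothetical schema: a (id 1), b (id 2) with nested fields c (id 3) and d (id 4).
schema = {
    "type": "struct",
    "fields": [
        {"name": "a", "type": "integer", "metadata": {ID_KEY: 1}},
        {"name": "b", "metadata": {ID_KEY: 2}, "type": {
            "type": "struct",
            "fields": [
                {"name": "c", "type": "string", "metadata": {ID_KEY: 3}},
                {"name": "d", "type": "string", "metadata": {ID_KEY: 4}},
            ],
        }},
    ],
}

print(top_level_max_id(schema) + 1)  # next ID would be 3, which collides with b.c
print(nested_max_id(schema) + 1)     # next ID would be 5, which is unused
```

With the top-level-only computation, a newly added column would receive ID 3, which is already taken by the nested field `b.c`. The nested-aware computation starts new IDs at 5.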

Does this PR introduce any user-facing changes?

No