tonycox opened this issue 1 month ago
If you already have an Iceberg table, the Iceberg table is the source of truth. The other conversions exist to generate a schema for creating the Iceberg table in the first place.
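As a sketch of what that looks like in practice, the Flink schema can be derived from the existing table instead of converting in the other direction. The Catalog handle and TableIdentifier here are assumptions; FlinkSchemaUtil.toSchema is the conversion helper in iceberg-flink:

```java
import org.apache.flink.table.api.TableSchema;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.flink.FlinkSchemaUtil;

class SchemaFromTable {
  // Load the existing table and derive the Flink schema from it, so the
  // catalog-assigned field ids remain the single source of truth.
  static TableSchema flinkSchemaFor(Catalog catalog, TableIdentifier id) {
    Table table = catalog.loadTable(id);
    Schema icebergSchema = table.schema(); // carries the authoritative ids
    return FlinkSchemaUtil.toSchema(icebergSchema);
  }
}
```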
Generating the same ids is not easily solved, because schema evolution would cause "skipped" ids.
@pvary In the example the schema is the same, but in my case I wanted "implicit" schema evolution on write. Say I add an additional field to the source event; on the deployment step, once the pipeline understands that the schema has changed, it evolves the target schema as well. Right now I skip ids in schema validation everywhere, even in unit tests, because they are inconsistent all the time, and I rely only on the ordering of the fields and their existence/absence.
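One shape that evolve-on-deploy step could take, as a sketch: union the freshly converted schema into the table schema by name and let the catalog assign ids for any new fields. The Table handle and ResolvedSchema are assumed to be in scope, and this assumes the FlinkSchemaUtil.convert(ResolvedSchema) overload from this issue; unionByNameWith is the name-based merge in core Iceberg's UpdateSchema:

```java
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.iceberg.Schema;
import org.apache.iceberg.Table;
import org.apache.iceberg.flink.FlinkSchemaUtil;

class EvolveOnDeploy {
  // Union the freshly converted source schema into the table schema by name;
  // Iceberg assigns fresh ids to any added columns, so no id bookkeeping is
  // needed on the pipeline side.
  static void evolve(Table table, ResolvedSchema resolvedSchema) {
    Schema fromFlink = FlinkSchemaUtil.convert(resolvedSchema); // ids here are synthetic
    table.updateSchema()
        .unionByNameWith(fromFlink)
        .commit();
    // After a refresh, table.schema() includes the new fields with catalog ids.
  }
}
```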
I'm facing a similar challenge. See: https://lists.apache.org/thread/vyw595d0747p33qg886b1o82mcw40523
The visitors could be used to traverse the schemas, but you need to match the fields by name. This becomes problematic when column names are reused.
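For the name matching itself, core Iceberg already has a helper that does this kind of by-name id lookup, with the same caveat about reused names. A sketch (the Schema arguments are assumptions):

```java
import org.apache.iceberg.Schema;
import org.apache.iceberg.types.TypeUtil;

class NameBasedIds {
  // Rewrite the ids in `converted` with the ids from `tableSchema`, matched
  // by field name. Reused or renamed column names are exactly where this
  // matching breaks down.
  static Schema alignIds(Schema converted, Schema tableSchema) {
    // reassignIds fails if `converted` has a field the table schema lacks;
    // TypeUtil.reassignOrRefreshIds assigns fresh ids to such fields instead.
    return TypeUtil.reassignIds(converted, tableSchema);
  }
}
```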
Apache Iceberg version
1.6.1 (latest release)
Query engine
Flink
Please describe the bug 🐞
When I try to convert a Flink ResolvedSchema to an Iceberg Schema, it returns a schema definition which, as I suppose, is not correct. My assumption comes from the fact that whenever I call

catalog.loadTable(id).schema()

it returns a schema with different field ids, and id validation will fail if, say, I try to update the table schema with the one extracted from the Flink table.
I found the lines where the ids are assigned: https://github.com/apache/iceberg/blob/799120636e8f5f19c1d7f217ab4968f524bb1246/flink/v1.20/flink/src/main/java/org/apache/iceberg/flink/FlinkTypeToType.java#L187-L189
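For reference, a minimal sketch of the mismatch. It assumes the conversion goes through FlinkSchemaUtil.convert(ResolvedSchema), and the column names are made up for illustration:

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.catalog.Column;
import org.apache.flink.table.catalog.ResolvedSchema;
import org.apache.iceberg.Schema;
import org.apache.iceberg.catalog.Catalog;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.flink.FlinkSchemaUtil;

class IdMismatchRepro {
  static void show(Catalog catalog, TableIdentifier id) {
    // A made-up two-column schema, just to have something to convert.
    ResolvedSchema resolved = ResolvedSchema.of(
        Column.physical("event_id", DataTypes.STRING()),
        Column.physical("ts", DataTypes.TIMESTAMP(6)));

    Schema converted = FlinkSchemaUtil.convert(resolved); // ids from a local counter
    Schema fromTable = catalog.loadTable(id).schema();    // ids assigned by the catalog

    System.out.println(converted.asStruct());
    System.out.println(fromTable.asStruct());
    // Any comparison that includes field ids (e.g. Schema#sameSchema) fails
    // even when the column names and types line up.
  }
}
```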
Willingness to contribute