apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Delta to Iceberg limitation: is enabling column mapping necessary? #391

Closed: coder012573 closed this issue 6 months ago

coder012573 commented 6 months ago

From the Delta section of the features and limitations page (https://xtable.apache.org/docs/features-and-limitations#delta):

When using Delta as the source for an Iceberg target, you may require field IDs set in the parquet schema. To enable that, follow the instructions for enabling column mapping here.
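For reference, Delta Lake documents enabling column mapping on an existing table by upgrading the protocol (reader version 2, writer version 5) and setting `delta.columnMapping.mode` to `name`. The sketch below only assembles that Spark SQL statement. The helper name `column_mapping_ddl` and the table name `my_table` are hypothetical, and the assumption is that you run the statement through a Delta-enabled SparkSession. Be aware that the protocol upgrade means readers without column mapping support can no longer read the table.

```python
# Hypothetical helper: builds the Spark SQL statement that enables Delta
# column mapping in "name" mode. Column mapping requires reader version 2
# and writer version 5, per the Delta Lake documentation.
COLUMN_MAPPING_PROPERTIES = {
    "delta.minReaderVersion": "2",
    "delta.minWriterVersion": "5",
    "delta.columnMapping.mode": "name",
}


def column_mapping_ddl(table_name: str) -> str:
    """Return an ALTER TABLE statement enabling column mapping on table_name."""
    props = ", ".join(
        f"'{key}' = '{value}'" for key, value in COLUMN_MAPPING_PROPERTIES.items()
    )
    return f"ALTER TABLE {table_name} SET TBLPROPERTIES ({props})"


if __name__ == "__main__":
    # With a Delta-enabled SparkSession (assumed), you would run:
    #   spark.sql(column_mapping_ddl("my_table"))
    print(column_mapping_ddl("my_table"))
```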

I ran the sync without setting field IDs in the Parquet schema, and it worked. My understanding is that enabling column mapping in Delta is meant to better support schema evolution (including dropping or renaming columns), but without it I don't see any issues with my target Iceberg table. Could you please confirm whether it is mandatory to enable column mapping on the source Delta table? cc @the-other-tim-brown for help!

Thank you

the-other-tim-brown commented 6 months ago

@coder012573 The sync will work, and some engines will be able to read the data, but other engines require the field IDs to be set in the Parquet schema. For example, you will need these values set to read the data as an Iceberg table from BigQuery or Snowflake. You will not need them to read the data from Spark, though, since a default/fallback behavior kicks in there.
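One way to check whether a source Delta table already carries column mapping IDs is to look at the `metaData` action in its transaction log. In the Delta protocol, that action's `configuration` holds `delta.columnMapping.mode`, and with column mapping enabled each schema field's metadata holds `delta.columnMapping.id`. The sketch below is a hypothetical stdlib-only helper, `column_mapping_status`, that inspects one commit file's text. It assumes the commit contains a `metaData` action, and it ignores checkpoints, later commits, and nested fields. It also reads the Delta log rather than the Parquet files themselves, so it does not prove which field IDs were physically written.

```python
import json


def column_mapping_status(commit_text: str) -> dict:
    """Inspect one Delta commit file (newline-delimited JSON actions).

    Returns the column mapping mode from the metaData action and the names of
    top-level fields whose metadata lacks 'delta.columnMapping.id'.
    """
    for line in commit_text.splitlines():
        if not line.strip():
            continue
        action = json.loads(line)
        meta = action.get("metaData")
        if meta is None:
            continue  # protocol, add, commitInfo, ... actions are skipped
        mode = meta.get("configuration", {}).get("delta.columnMapping.mode", "none")
        # schemaString is itself a JSON-encoded struct type.
        schema = json.loads(meta["schemaString"])
        missing = [
            field["name"]
            for field in schema.get("fields", [])
            if "delta.columnMapping.id" not in field.get("metadata", {})
        ]
        return {"mode": mode, "fields_without_id": missing}
    raise ValueError("no metaData action found in this commit")
```

Usage would look like `column_mapping_status(open(".../_delta_log/00000000000000000000.json").read())`. If `mode` is `none` or `fields_without_id` is non-empty, engines that rely on Parquet field IDs may not be able to read the converted Iceberg table.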