Open danielhumanmod opened 2 months ago
@danielhumanmod Can we close this issue ?
Maybe not for now. This is a problem we can’t resolve in issue #112 . After discussing it with @the-other-tim-brown , we decided to leave this issue open as a reminder if we can find a way to make it work on the Hudi or Spark side in the future.
Feature Request / Improvement
Context
In #112 , we are working on adding UUID support in XTable. Currently, only Iceberg has the concept of a UUID, which is stored as a fixed-size byte array in the underlying Parquet files. To facilitate conversion between Iceberg and formats like Delta or Hudi, we introduced a UUID internal type.
Problem Statement
While converting UUIDs from Iceberg to Hudi, we encountered an issue with Spark’s ParquetSchemaConverter. It appears that Spark cannot read FIXED_LEN_BYTE_ARRAY fields annotated with the UUID logical type in Parquet. You can refer to the relevant source code here: ParquetSchemaConverter.
As of now, we have successfully implemented UUID conversion for Iceberg to Delta. However, Iceberg to Hudi remains a limitation due to this issue, which we hope to address in the future.
Error Log
Are you willing to submit PR?
Code of Conduct