Closed danielhumanmod closed 1 month ago
Do we need to handle UUID here for Hudi partition value? https://github.com/apache/incubator-xtable/blob/main/xtable-core/src/main/java/org/apache/xtable/hudi/HudiPartitionValuesExtractor.java#L160-L162
Like:
case FIXED:
case BYTES:
case UUID:
parsedValue = valueAsString.getBytes(StandardCharsets.UTF_8);
break;
TODO: Probably we need some tests for round trip conversions, like Iceberg -> Delta Lake -> Iceberg to verify that UUIDs remain consistent throughout the conversions
Do we need to handle UUID here for Hudi partition value? https://github.com/apache/incubator-xtable/blob/main/xtable-core/src/main/java/org/apache/xtable/hudi/HudiPartitionValuesExtractor.java#L160-L162
Like:
case FIXED: case BYTES: case UUID: parsedValue = valueAsString.getBytes(StandardCharsets.UTF_8); break;
@danielhumanmod yes, let's add that
TODO: Probably we need some tests for round trip conversions, like Iceberg -> Delta Lake -> Iceberg to verify that UUIDs remain consistent throughout the conversions
@danielhumanmod we don't have any full round trip testing yet but I would like to see some added to our integration tests in the future. For this PR we can start by adding a UUID to the schema used by the Iceberg table generator that is used in ITConversionController
@danielhumanmod apologies for the delay on this review. I was out of office for the last 10 days. Can you resolve the conflict that GitHub is flagging? I am also going to spend a little time trying to resolve the hudi side of the reader today in parallel to see if we just need some property set on the reader.
@danielhumanmod apologies for the delay on this review. I was out of office for the last 10 days. Can you resolve the conflict that GitHub is flagging? I am also going to spend a little time trying to resolve the hudi side of the reader today in parallel to see if we just need some property set on the reader.
Thanks Tim! Sure, I have solved the conflict in latest commit. And appreciate your time to help on the Hudi issue!
Hi @the-other-tim-brown , I’ve addressed all the issues mentioned in the previous comments and have squashed all the commits into one.
Additionally, I’ve refined the comparison logic in checkDatasetEquivalence
to make it less invasive to the existing code. Now, the UUID-specific comparison only applies to datasets containing UUID fields, which also resolves the ordering issue.
Thank you so much for your detailed review and suggestions—they were incredibly helpful!
Thank you for your contribution!
Thank you so much for your detailed review and approval, @the-other-tim-brown ! Your feedback helped me improve the changes, and I learned a lot from the process. I’m really looking forward to contributing more to the project and continuing to collaborate with you in the future.
Important Read
Aim to address #112
What is the purpose of the pull request
Create a UUID type in OneType and handle it appropriately in all converters. Only Iceberg has a concept of a UUID and it is a fixed size byte array in the underlying parquet files. We should aim to preserve the metadata that this byte array came from a UUID for round trip conversions.
Brief change log
UUID
type in InternalTypeVerify this pull request
This pull request is already covered by existing tests, all existing tests should pass