apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

No data returned when querying hudi target table generated from iceberg source #404

Closed rahul-ghiware closed 7 months ago

rahul-ghiware commented 7 months ago

Using Spark 3.4.0, Scala 2.12 and Iceberg spark runtime 1.4.2

rghiware ~ $ cd /tmp
rghiware /tmp $ cd iceberg-warehouse/people
rghiware iceberg-warehouse/people $ ls
data        metadata
rghiware iceberg-warehouse/people $ cd data
rghiware people/data $ ls
00000-3-4117ce4f-ff56-410b-a248-c9ed512903c8-00001.parquet

sourceFormat: ICEBERG
targetFormats:
  - HUDI
  - DELTA
datasets:
  -
    tableBasePath: file:///tmp/iceberg-warehouse/people
    tableDataPath: file:///tmp/iceberg-warehouse/people/data
    tableName: people

df.show()
24/03/29 10:27:32 WARN package: Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
+---+-------+---+----+-------------------+
| id|   name|age|city|          create_ts|
+---+-------+---+----+-------------------+
|  6|Charlie| 31| DFW|2023-08-29 00:00:00|
|  1|   John| 25| NYC|2023-09-28 00:00:00|
|  4| Andrew| 40| NYC|2023-10-28 00:00:00|
|  3|Michael| 35| ORD|2023-09-28 00:00:00|
|  5|    Bob| 28| SEA|2023-09-23 00:00:00|
|  2|  Emily| 30| SFO|2023-09-28 00:00:00|
+---+-------+---+----+-------------------+

df.show()
+---+----+---+----+---------+
| id|name|age|city|create_ts|
+---+----+---+----+---------+
+---+----+---+----+---------+

I can't figure out whether I'm missing something here or whether this is an issue with the XTable jar.

the-other-tim-brown commented 7 months ago

@rahul-ghiware you will need to set the option hoodie.metadata.enable=true when reading the Hudi data. Is there some demo you were following that we need to update? I see it mentioned here, but that is not really a natural place to look for this info.
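For anyone landing on this thread, here is a minimal PySpark-style sketch of what setting that option at read time might look like. The helper name read_hudi_table is hypothetical, and the sketch assumes `spark` is an existing SparkSession started with a Hudi Spark bundle on its classpath and that the Hudi target lives at the tableBasePath from the config above. Only the hoodie.metadata.enable=true option comes from this thread.

```python
# Read option named in this thread; everything else here is illustrative.
HUDI_READ_OPTIONS = {"hoodie.metadata.enable": "true"}


def read_hudi_table(spark, base_path, options=None):
    """Load a Hudi table with the metadata table enabled for the read.

    `spark` is assumed to be an existing SparkSession with Hudi available.
    `options` lets the caller add or override reader options.
    """
    merged = dict(HUDI_READ_OPTIONS)
    merged.update(options or {})
    # The option keys contain dots, so pass them via dict unpacking.
    return spark.read.format("hudi").options(**merged).load(base_path)


# Hypothetical usage against the table from this issue:
#   df = read_hudi_table(spark, "file:///tmp/iceberg-warehouse/people")
#   df.show()
```

Keeping the option in one shared dict makes it harder to forget on later reads, which seems to be the failure mode reported here.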

rahul-ghiware commented 7 months ago

Thank you, @the-other-tim-brown, for directing me to the correct link. I followed the steps outlined at https://xtable.apache.org/docs/how-to/

the-other-tim-brown commented 7 months ago

@sagarlakshmipathy I think we should add a little "read your tables" section or something like that and we can highlight the metadata option there as well. What do you think? I can add it if you agree.

sagarlakshmipathy commented 7 months ago

agreed @the-other-tim-brown !

read/validate your tables sounds good!
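As a rough illustration of what a "validate your tables" step could check, the sketch below compares a source DataFrame with a target DataFrame by column names and row count. It is not taken from the XTable docs; compare_tables is a hypothetical helper, and skipping columns prefixed with _hoodie_ is an assumption in case the Hudi reader exposes its meta columns.

```python
def compare_tables(source_df, target_df):
    """Return a list of mismatch descriptions; an empty list means both checks passed.

    Both arguments are assumed to be Spark DataFrames (only `.columns`
    and `.count()` are used).
    """
    def data_columns(df):
        # Assumption: ignore Hudi meta columns if the reader exposes them.
        return sorted(c for c in df.columns if not c.startswith("_hoodie_"))

    problems = []
    src_cols, tgt_cols = data_columns(source_df), data_columns(target_df)
    if src_cols != tgt_cols:
        problems.append(f"column mismatch: source={src_cols} target={tgt_cols}")

    src_rows, tgt_rows = source_df.count(), target_df.count()
    if src_rows != tgt_rows:
        problems.append(f"row count mismatch: source={src_rows} target={tgt_rows}")
    return problems
```

Applied to the two reads shown earlier in this issue (six rows from one table, none from the other, same five columns), a check like this would report only a row count mismatch, which points at the read configuration rather than at the schema.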