apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

[Support] Need Help with Trino Read XTable as Delta and Iceberg #405

Closed soumilshah1995 closed 6 months ago

soumilshah1995 commented 6 months ago

Hello, I have created a table with XTable and I am able to read it as Delta and Iceberg. I also have Trino, where I am able to read the Hudi data.

Ingest

spark-submit \
    --class org.apache.hudi.utilities.streamer.HoodieStreamer \
    --packages 'org.apache.hudi:hudi-spark3.4-bundle_2.12:0.14.0,org.apache.hadoop:hadoop-aws:3.3.2' \
    --properties-file spark-config.properties \
    --master 'local[*]' \
    --executor-memory 1g \
    --jars /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-extensions-0.1.0-SNAPSHOT-bundled.jar,/Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-java-client-0.14.0.jar \
     /Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/jar/hudi-utilities-slim-bundle_2.12-0.14.0.jar \
    --table-type COPY_ON_WRITE \
    --target-base-path 's3a://huditest/hudidb/table_name=bronze_orders'  \
    --target-table bronze_orders \
    --op UPSERT \
    --enable-sync \
    --enable-hive-sync \
    --sync-tool-classes 'io.onetable.hudi.sync.OneTableSyncTool' \
    --source-limit 4000000 \
    --source-ordering-field ts \
    --source-class org.apache.hudi.utilities.sources.CsvDFSSource \
    --hoodie-conf 'hoodie.datasource.write.recordkey.field=order_id' \
    --hoodie-conf 'hoodie.datasource.write.partitionpath.field=state' \
    --hoodie-conf 'hoodie.datasource.write.precombine.field=ts' \
    --hoodie-conf 'hoodie.streamer.source.dfs.root=file://///Users/soumilshah/IdeaProjects/SparkProject/DeltaStreamer/sampledata/orders' \
    --hoodie-conf 'hoodie.deltastreamer.csv.header=true' \
    --hoodie-conf 'hoodie.deltastreamer.csv.sep=\t' \
    --hoodie-conf 'hoodie.onetable.formats.to.sync=DELTA,ICEBERG' \
    --hoodie-conf 'hoodie.onetable.target.metadata.retention.hr=168' \
    --hoodie-conf 'hoodie.metadata.index.async=true' \
    --hoodie-conf 'hoodie.metadata.enable=true' \
    --hoodie-conf 'hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor' \
    --hoodie-conf 'hoodie.datasource.hive_sync.metastore.uris=thrift://localhost:9083' \
    --hoodie-conf 'hoodie.datasource.hive_sync.mode=hms' \
    --hoodie-conf 'hoodie.datasource.hive_sync.enable=true' \
    --hoodie-conf 'hoodie.datasource.hive_sync.database=default' \
    --hoodie-conf 'hoodie.datasource.hive_sync.table=bronze_orders' \
    --hoodie-conf 'hoodie.datasource.write.hive_style_partitioning=true'

Screenshot 2024-03-31 at 7 22 50 PM

Trino

Screenshot 2024-03-31 at 7 25 54 PM

As you can see, I can read the Hudi data from Trino.

When I read the data from the Iceberg catalog, I get the following error:

Screenshot 2024-03-31 at 7 27 09 PM

Please note that reading from Spark works fine: I am able to read the table as Iceberg from Spark. Why is the Trino Iceberg catalog not able to read it?