apache / incubator-xtable

Apache XTable (incubating) is a cross-table converter for lakehouse table formats that facilitates interoperability across data processing systems and query engines.
https://xtable.apache.org/
Apache License 2.0

Wrong hoodie.table.name generated (iceberg->hudi) #494

Open alberttwong opened 4 months ago

alberttwong commented 4 months ago

Please describe the bug 🐞

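The hoodie.properties generated by XTable (the source was an Iceberg table) has hoodie.table.name set to the table base path (s3a://warehouse/taxis) instead of the configured tableName (taxis). The commands I ran, the dataset config, and the resulting hoodie.properties are below.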

export AWS_SECRET_ACCESS_KEY=password
export AWS_ACCESS_KEY_ID=admin
export ENDPOINT=http://minio:9000/
export AWS_REGION=us-east-1
cd /opt/xtable/jars/; java -jar xtable-utilities-0.1.0-SNAPSHOT-bundled.jar --datasetConfig xtable_iceberg.yaml -p core-site.xml
root@spark:/opt/xtable/jars# cat xtable_iceberg.yaml 
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
sourceFormat: ICEBERG
targetFormats:
  - HUDI
  - DELTA
datasets:
  -
    tableBasePath: s3a://warehouse/taxis
    tableName: taxis
    partitionSpec: vendor_id:VALUE
albert@Alberts-MBP Downloads % cat hoodie.properties
#Updated at 2024-07-16T17:09:04.443893Z
#Tue Jul 16 17:09:04 UTC 2024
hoodie.table.type=COPY_ON_WRITE
hoodie.table.metadata.partitions=column_stats,files
hoodie.table.partition.fields=vendor_id
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.checksum=1914023381
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.timeline.timezone=UTC
hoodie.table.recordkey.fields=
hoodie.table.name=s3a\://warehouse/taxis
hoodie.datasource.write.hive_style_partitioning=true
hoodie.table.metadata.partitions.inflight=
hoodie.populate.meta.fields=false
hoodie.table.keygenerator.class=org.apache.hudi.keygen.SimpleKeyGenerator
hoodie.table.version=6
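
A minimal sketch for checking the mismatch in a script, assuming the expectation is that hoodie.table.name should equal the tableName from the dataset config (taxis). The helpers `parse_properties` and `check_table_name` are hypothetical names for illustration and are not part of XTable or Hudi. The parser handles only simple `key=value` lines and the backslash escapes that Java's `Properties.store` writes for `:` and `=`, which is why the value appears as `s3a\://warehouse/taxis` in the file.

```python
def parse_properties(text):
    """Parse a small subset of the Java .properties format (key=value lines only)."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines ('#' or '!').
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        # Java's Properties.store escapes ':' and '=' with a backslash; undo that.
        value = value.replace("\\:", ":").replace("\\=", "=")
        props[key.strip()] = value.strip()
    return props


def check_table_name(props, expected):
    """Return (matches, actual) for hoodie.table.name against the expected name."""
    actual = props.get("hoodie.table.name")
    return actual == expected, actual


# Excerpt of the hoodie.properties shown above.
SAMPLE = r"""
#Updated at 2024-07-16T17:09:04.443893Z
hoodie.table.type=COPY_ON_WRITE
hoodie.table.partition.fields=vendor_id
hoodie.table.name=s3a\://warehouse/taxis
hoodie.table.version=6
"""

props = parse_properties(SAMPLE)
# "taxis" is the tableName from xtable_iceberg.yaml (assumed to be the expected value).
ok, actual = check_table_name(props, "taxis")
print(ok, actual)
```

With the excerpt above, the check fails because the stored name is the base path rather than the configured table name.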


vinishjail97 commented 4 months ago

@alberttwong can you remove partitionSpec from the YAML and try again? partitionSpec is only required for Hudi source tables.

alberttwong commented 4 months ago

Removing partitionSpec did not work.