apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[Streamer 0.14.0] Upgrading Deltastreamer script to 0.14.0, empty table failing to initialize metadata table #10203

Closed (rohitmittapalli closed this issue 7 months ago)

rohitmittapalli commented 7 months ago

Describe the problem you faced

Running a brand-new HoodieStreamer against an empty folder fails to create the metadata table. This is a fresh build of the hudi-utilities-bundle jar off the tip of 0.14.0.

To Reproduce

Steps to reproduce the behavior:

  1. Build the Hudi utilities bundle (a build sketch follows this list)
  2. Start with an empty source and an empty target
  3. Run the Deltastreamer script shown at the end of this report
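
For step 1, a minimal build sketch, assuming the release-0.14.0 branch is what "tip of 0.14.0" refers to (the branch name and any profile flags are assumptions; match them to your checkout and cluster):

git clone https://github.com/apache/hudi.git
cd hudi
git checkout release-0.14.0
# Build only the utilities bundle and the modules it depends on; a Spark/Scala
# profile flag may be needed to match the Spark version in the cluster image.
mvn clean package -DskipTests -pl packaging/hudi-utilities-bundle -am
ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_*.jar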

Expected behavior

The streamer should bootstrap the empty target table and initialize the metadata table without errors.

Environment Description

  Hudi version: 0.14.0 (fresh build off the tip of the release branch)
  Storage: S3 (s3a)
  Deployment: Spark on K8s

Additional context

Add any other context about the problem here.

Running on Spark on K8s

Stacktrace

2023-11-28 20:22:21,080 INFO scheduler.DAGScheduler: Job 0 failed: collect at HoodieSparkEngineContext.java:116, took 5.503507 s
2023-11-28 20:22:21,081 INFO transaction.TransactionManager: Transaction ending with transaction owner Option{val=[==>20231128202210873__commit__INFLIGHT]}
2023-11-28 20:22:21,081 INFO lock.InProcessLockProvider: Base Path s3a://simian-example-data-1-aws-output/stats/querying_14, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@4334672e[Write locks = 1, Read locks = 0], Thread pool-27-thread-1, In-process lock state RELEASING
2023-11-28 20:22:21,081 INFO lock.InProcessLockProvider: Base Path s3a://simian-example-data-1-aws-output/stats/querying_14, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@4334672e[Write locks = 0, Read locks = 0], Thread pool-27-thread-1, In-process lock state RELEASED
2023-11-28 20:22:21,081 INFO lock.InProcessLockProvider: Base Path s3a://simian-example-data-1-aws-output/stats/querying_14, Lock Instance java.util.concurrent.locks.ReentrantReadWriteLock@4334672e[Write locks = 0, Read locks = 0], Thread pool-27-thread-1, In-process lock state ALREADY_RELEASED
2023-11-28 20:22:21,081 INFO lock.LockManager: Released connection created for acquiring lock
2023-11-28 20:22:21,081 INFO transaction.TransactionManager: Transaction ended with transaction owner Option{val=[==>20231128202210873__commit__INFLIGHT]}
2023-11-28 20:22:21,082 ERROR streamer.HoodieStreamer: Shutting down delta-sync due to exception
org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata table
        at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:293)
        at org.apache.hudi.client.SparkRDDWriteClient.initMetadataTable(SparkRDDWriteClient.java:273)
        at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1256)
        at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1296)
        at org.apache.hudi.client.SparkRDDWriteClient.bulkInsert(SparkRDDWriteClient.java:223)
        at org.apache.hudi.client.SparkRDDWriteClient.bulkInsert(SparkRDDWriteClient.java:217)
        at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:782)
        at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:446)
        at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.lambda$startService$1(HoodieStreamer.java:757)
        at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3) (10.10.159.160 executor 1): java.io.EOFException
        at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
        at java.io.ObjectInputStream$BlockDataInputStream.readUnsignedShort(ObjectInputStream.java:3321)
        at java.io.ObjectInputStream$BlockDataInputStream.readUTF(ObjectInputStream.java:3377)
        at java.io.ObjectInputStream.readUTF(ObjectInputStream.java:1205)
        at org.apache.hudi.hadoop.SerializablePath.readObject(SerializablePath.java:49)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1184)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2322)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
2023-11-28 20:59:42,217 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 1: Stage finished
2023-11-28 20:59:42,217 INFO scheduler.DAGScheduler: Job 1 finished: count at HoodieJavaRDD.java:115, took 3.138478 s
2023-11-28 20:59:42,217 INFO metadata.HoodieBackedTableMetadataWriter: Initializing FILES index with 1 mappings and 1 file groups.
2023-11-28 20:59:42,249 INFO metadata.HoodieBackedTableMetadataWriter: Creating 1 file groups for partition files with base fileId files- at instant time 00000000000000010
2023-11-28 20:59:42,469 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on spark-deltastreamer-hudi-14-driver-headless:36215 in memory (size: 1358.0 B, free: 2.1 GiB)
2023-11-28 20:59:42,477 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 10.10.194.39:36129 in memory (size: 1358.0 B, free: 2.1 GiB)
2023-11-28 20:59:42,576 INFO spark.SparkContext: Starting job: foreach at HoodieSparkEngineContext.java:155
2023-11-28 20:59:42,577 INFO scheduler.DAGScheduler: Got job 2 (foreach at HoodieSparkEngineContext.java:155) with 1 output partitions
2023-11-28 20:59:42,577 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (foreach at HoodieSparkEngineContext.java:155)
2023-11-28 20:59:42,577 INFO scheduler.DAGScheduler: Parents of final stage: List()
2023-11-28 20:59:42,578 INFO scheduler.DAGScheduler: Missing parents: List()
2023-11-28 20:59:42,579 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (ParallelCollectionRDD[4] at parallelize at HoodieSparkEngineContext.java:155), which has no missing parents
2023-11-28 20:59:42,678 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 690.3 KiB, free 2.1 GiB)
2023-11-28 20:59:42,681 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 194.4 KiB, free 2.1 GiB)
2023-11-28 20:59:42,682 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on spark-deltastreamer-hudi-14-driver-headless:36215 (size: 194.4 KiB, free: 2.1 GiB)
2023-11-28 20:59:42,682 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1388
2023-11-28 20:59:42,683 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 2 (ParallelCollectionRDD[4] at parallelize at HoodieSparkEngineContext.java:155) (first 15 tasks are for partitions Vector(0))
2023-11-28 20:59:42,683 INFO scheduler.TaskSchedulerImpl: Adding task set 2.0 with 1 tasks resource profile 0
2023-11-28 20:59:42,684 INFO scheduler.FairSchedulableBuilder: Added task set TaskSet_2.0 tasks to pool default
2023-11-28 20:59:42,685 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 3) (10.10.162.226, executor 9, partition 0, PROCESS_LOCAL, 4358 bytes) taskResourceAssignments Map()
2023-11-28 20:59:42,840 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.10.162.226:43413 (size: 194.4 KiB, free: 2.1 GiB)
2023-11-28 20:59:43,116 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 2.0 (TID 3) (10.10.162.226 executor 9): java.io.InvalidClassException: org.apache.hudi.metadata.HoodieBackedTableMetadataWriter; local class incompatible: stream classdesc serialVersionUID = -2113618921263425211, local class serialVersionUID = 2706336234310710024
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:699)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:2005)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1852)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2186)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readArray(ObjectInputStream.java:2119)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1657)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2431)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2355)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2213)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1669)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:503)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:461)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:83)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
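
The java.io.InvalidClassException above (local class incompatible, mismatched serialVersionUID for HoodieBackedTableMetadataWriter) generally means the driver and the executors are deserializing against different builds of the Hudi classes, for example a freshly built bundle passed via --jars on the driver while an older Hudi jar is baked into the executor image; the earlier EOFException in SerializablePath.readObject also occurs while deserializing Hudi classes on executors, which points the same way. A quick sanity check (pod names and namespace are placeholders; the jar path is taken from the --jars argument of the script below) is to compare the bundle checksum in both containers:

# Placeholders: substitute the real namespace and pod names.
kubectl -n <namespace> exec <driver-pod> -- \
  md5sum /hudi_14_base_jars/hudi-utilities-bundle-14.jar
kubectl -n <namespace> exec <executor-pod> -- \
  md5sum /hudi_14_base_jars/hudi-utilities-bundle-14.jar
# Different checksums, or a second Hudi jar elsewhere on the executor
# classpath, would explain the serialVersionUID mismatch.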

Deltastreamer script:

/opt/spark/bin/spark-submit \
--jars /hudi_14_base_jars/hudi-utilities-bundle-14.jar,/opt/spark/jars/hadoop-aws.jar,/opt/spark/jars/aws-java-sdk.jar,/opt/spark/jars/hadoop-azure.jar,/opt/spark/jars/wildfly-openssl.jar,/opt/spark/jars/AzureTokenGen.jar,/opt/spark/jars/guava-gcp.jar,/opt/spark/jars/gcs-connector.jar \
--master ${18} \
--deploy-mode client \
--name pts-deltastreamer-k8s-14 \
--conf spark.driver.port=8090 \
--conf spark.hadoop.fs.azure.account.auth.type.${26}.dfs.core.windows.net=Custom \
--conf spark.hadoop.fs.azure.account.oauth.provider.type.${26}.dfs.core.windows.net=applied.java.AzureTokenProvider \
--conf spark.hadoop.token=${27} \
--conf spark.hadoop.expiry=${28} \
--conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
--conf spark.hadoop.fs.s3a.connection.maximum=10000 \
--conf spark.hadoop.fs.gs.impl=com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem \
--conf spark.driver.host=spark-deltastreamer-hudi-14-driver-headless \
--conf spark.scheduler.mode=FAIR \
--conf spark.kubernetes.namespace=${20} \
--conf spark.kubernetes.authenticate.submission.caCertFile=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
--conf spark.kubernetes.authenticate.submission.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-k8s-driver-svcaccount \
--conf spark.kubernetes.node.selector.purpose=spark \
--conf spark.kubernetes.executor.podNamePrefix=partitioned-pts-deltastreamer \
--conf spark.jars.ivy=/tmp/.ivy \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.container.image=quay.io/applied_dev/dp_spark_k8s:test-0.14 \
--conf spark.executor.instances=$1 \
--conf spark.driver.memory=${19} \
--conf spark.executor.memory=$2 \
--conf spark.kubernetes.driver.request.cores=${21} \
--conf spark.kubernetes.driver.limit.cores=${22} \
--conf spark.kubernetes.executor.request.cores=${23} \
--conf spark.kubernetes.executor.limit.cores=${24}  \
--conf spark.kubernetes.driver.pod.name=${25} \
--class org.apache.hudi.utilities.streamer.HoodieStreamer /hudi_14_base_jars/hudi-utilities-bundle-14.jar \
--source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
--schemaprovider-class org.apache.hudi.utilities.schema.FilebasedSchemaProvider \
--target-table per_tick_stats_14 \
--table-type COPY_ON_WRITE \
--min-sync-interval-seconds 300 \
--source-limit ${17} \
--continuous \
--source-ordering-field $6 \
--target-base-path $4 \
--hoodie-conf hoodie.clustering.async.enabled=${10} \
--hoodie-conf hoodie.clustering.plan.strategy.sort.columns=$8 \
--hoodie-conf hoodie.clustering.plan.strategy.max.bytes.per.group=${12} \
--hoodie-conf hoodie.clustering.plan.strategy.max.num.groups=${13} \
--hoodie-conf hoodie.clustering.plan.strategy.small.file.limit=${14} \
--hoodie-conf hoodie.clustering.plan.strategy.target.file.max.bytes=${15} \
--hoodie-conf hoodie.clustering.async.max.commits=${16} \
--hoodie-conf hoodie.streamer.source.dfs.root=$3 \
--hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
--hoodie-conf hoodie.datasource.write.recordkey.field=$7 \
--hoodie-conf hoodie.datasource.write.precombine.field=$6 \
--hoodie-conf hoodie.metadata.enable=true \
--hoodie-conf hoodie.metadata.index.column.stats.enable=true \
--hoodie-conf hoodie.metadata.index.column.stats.column.list=$9 \
--hoodie-conf hoodie.bulkinsert.shuffle.parallelism=${11} \
--hoodie-conf hoodie.write.markers.type=DIRECT \
--hoodie-conf hoodie.datasource.write.partitionpath.field="" \
--hoodie-conf hoodie.streamer.schemaprovider.source.schema.file=$5 \
--hoodie-conf hoodie.streamer.schemaprovider.target.schema.file=$5 \
--op BULK_INSERT
rohitmittapalli commented 7 months ago

User error.

liangchen-datanerd commented 3 months ago

Hi @rohitmittapalli, how did you solve this problem? I have come across the same issue. The log shows "Failed to instantiate Metadata table" and "FileGroup count for MDT partition files should be >0":

Exception in thread "main" org.apache.hudi.exception.HoodieException: Failed to instantiate Metadata table
        at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:293)
        at org.apache.hudi.client.SparkRDDWriteClient.initMetadataTable(SparkRDDWriteClient.java:273)
        at org.apache.hudi.client.BaseHoodieWriteClient.doInitTable(BaseHoodieWriteClient.java:1257)
        at org.apache.hudi.client.BaseHoodieWriteClient.initTable(BaseHoodieWriteClient.java:1297)
        at org.apache.hudi.client.SparkRDDWriteClient.bulkInsert(SparkRDDWriteClient.java:223)
        at org.apache.hudi.client.SparkRDDWriteClient.bulkInsert(SparkRDDWriteClient.java:217)
        at org.apache.hudi.utilities.streamer.StreamSync.writeToSink(StreamSync.java:923)
        at org.apache.hudi.utilities.streamer.StreamSync.writeToSinkAndDoMetaSync(StreamSync.java:778)
        at org.apache.hudi.utilities.streamer.StreamSync.syncOnce(StreamSync.java:450)
        at org.apache.hudi.utilities.streamer.HoodieStreamer$StreamSyncService.ingestOnce(HoodieStreamer.java:850)
        at org.apache.hudi.utilities.ingestion.HoodieIngestionService.startIngestion(HoodieIngestionService.java:72)
        at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
        at org.apache.hudi.utilities.streamer.HoodieStreamer.sync(HoodieStreamer.java:207)
        at org.apache.hudi.utilities.streamer.HoodieStreamer.main(HoodieStreamer.java:592)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1020)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: FileGroup count for MDT partition files should be >0
        at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:42)
        at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.prepRecords(HoodieBackedTableMetadataWriter.java:1154)
        at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.commitInternal(HoodieBackedTableMetadataWriter.java:1059)
        at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.bulkCommit(SparkHoodieBackedTableMetadataWriter.java:130)
        at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeFromFilesystem(HoodieBackedTableMetadataWriter.java:438)
        at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.initializeIfNeeded(HoodieBackedTableMetadataWriter.java:271)
        at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.<init>(HoodieBackedTableMetadataWriter.java:175)
        at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.<init>(SparkHoodieBackedTableMetadataWriter.java:95)
        at org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.create(SparkHoodieBackedTableMetadataWriter.java:72)
        at org.apache.hudi.client.SparkRDDWriteClient.initializeMetadataTable(SparkRDDWriteClient.java:287)
        ... 25 more
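
For the "FileGroup count for MDT partition files should be >0" failure, a workaround often suggested in similar reports (not confirmed as the fix in this thread) is to remove the partially initialized metadata table so the next write rebuilds it from the filesystem; the path below is a placeholder:

# Placeholder path; verify it points at the affected table before deleting.
# Only the metadata table is removed; table data and timeline stay intact.
hadoop fs -rm -r s3a://<bucket>/<table-base-path>/.hoodie/metadata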