arunvasudevan opened this issue 1 year ago
@arunvasudevan Thanks for raising this. If I have understood correctly, your issue is that the cleaner occasionally fails to clean up the file. You mentioned that you don't see that file in any commit on the timeline. Did you check the archived commits too?
Can you also provide your writer configurations?
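For example, something along these lines from spark-shell (a rough sketch; the S3 base path is a placeholder) can confirm whether any commits were archived:

```scala
// Rough sketch (base path is a placeholder): list the archived timeline folder
// (hoodie.archivelog.folder=archived) to see whether any commits were archived.
import org.apache.hadoop.fs.Path

val archivedPath = new Path("s3://<bucket>/performer_ride_join_table/.hoodie/archived")
val fs = archivedPath.getFileSystem(spark.sparkContext.hadoopConfiguration)

if (fs.exists(archivedPath))
  fs.listStatus(archivedPath).foreach(status => println(status.getPath))
else
  println(s"$archivedPath does not exist")
```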
Yes, I checked the archive folder and it is empty in this case.
Here are the writer configurations:
hoodie.datasource.hive_sync.database:
@ad1happy2go Let me know if you need any more info.
@arunvasudevan Are you on the Hudi Slack? If yes, can you message me there so we can have a call to understand the issue better? Thanks.
@ad1happy2go Messaged you on Hudi Slack. We can discuss this issue further there. Thanks!
cc @yihua @jonvex
Hey @ad1happy2go, any follow-ups on this?
We run inline compaction and cleaning for MoR tables on EMR 6.12 (i.e., Hudi 0.13) and are facing a NoSuchElementException on a FileID. The specific file is present in S3, but it does not appear in any commit on the timeline. This is a low-frequency table that receives an insert roughly once a month, and our cleaner and compactor are configured to run inline. However, the cleaner does not clean up the file, which causes the writer to fail. Another file with a different fileID is present on the same path and contains the updated data. To resolve this issue we deleted the orphaned stale file, but it is really not clear why this issue is occurring.
Here is the hoodie.properties file:
hoodie.compaction.payload.class=org.apache.hudi.common.model.OverwriteWithLatestAvroPayload
hoodie.table.type=MERGE_ON_READ
hoodie.table.metadata.partitions=
hoodie.table.precombine.field=source_ts_ms
hoodie.table.partition.fields=
hoodie.archivelog.folder=archived
hoodie.timeline.layout.version=1
hoodie.table.checksum=4106800621
hoodie.datasource.write.drop.partition.columns=false
hoodie.table.timeline.timezone=LOCAL
hoodie.table.name=performer_ride_join_table
hoodie.table.recordkey.fields=performer_id,ride_id
hoodie.compaction.record.merger.strategy=eeb8d96f-b1e4-49fd-bbf8-28ac514178e5
hoodie.datasource.write.hive_style_partitioning=false
hoodie.partition.metafile.use.base.format=false
hoodie.table.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator
hoodie.populate.meta.fields=true
hoodie.table.base.file.format=PARQUET
hoodie.database.name=
hoodie.datasource.write.partitionpath.urlencode=false
hoodie.table.version=5
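For additional context, the writer is configured roughly as below (a simplified sketch, not the actual job: the record key, precombine field, key generator and table name come from the hoodie.properties above, while the cleaning/compaction frequency values and the S3 path are placeholders):

```scala
// Simplified sketch of the non-partitioned MOR writer with inline compaction and
// automatic cleaning. Record key, precombine field, key generator and table name
// match hoodie.properties; retention/frequency values and the S3 path are placeholders.
import org.apache.spark.sql.SaveMode
import spark.implicits._  // spark-shell: `spark` is already in scope

val df = Seq((101L, 5001L, "2023-10-01T00:00:00Z"))
  .toDF("performer_id", "ride_id", "source_ts_ms")

df.write.format("hudi")
  .option("hoodie.table.name", "performer_ride_join_table")
  .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
  .option("hoodie.datasource.write.recordkey.field", "performer_id,ride_id")
  .option("hoodie.datasource.write.precombine.field", "source_ts_ms")
  .option("hoodie.datasource.write.keygenerator.class",
          "org.apache.hudi.keygen.NonpartitionedKeyGenerator")
  .option("hoodie.compact.inline", "true")                  // inline compaction
  .option("hoodie.compact.inline.max.delta.commits", "1")   // placeholder frequency
  .option("hoodie.clean.automatic", "true")                 // inline cleaning
  .option("hoodie.cleaner.commits.retained", "10")          // placeholder retention
  .mode(SaveMode.Append)
  .save("s3://<bucket>/performer_ride_join_table")
```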
To Reproduce
Steps to reproduce the behavior: this does not happen on every run; over the past 3 weeks it has happened twice, roughly once per week.
Expected behavior
Cleaner should have cleaned up the orphaned file.
Environment Description
Hudi version : 0.13
Spark version : 3.4.0
Hive version : 3.1.3
Hadoop version : 3.3.3
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : No
Additional context
Stacktrace
Error: 23/10/09 17:27:39 ERROR Client: Application diagnostics message: User class threw exception: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 174.0 failed 4 times, most recent failure: Lost task 0.3 in stage 174.0 (TID 30924) (ip-10-11-117-156.ec2.internal executor 28): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:336)
	at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:251)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
	at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
	at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
	at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
	at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
	at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
	at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
	at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:326)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
	at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
	at org.apache.spark.scheduler.Task.run(Task.scala:141)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1541)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.util.NoSuchElementException: FileID 62b95ad5-44df-496c-8309-4d5d270f2ef6-0 of partition path does not exist.
	at org.apache.hudi.io.HoodieMergeHandle.getLatestBaseFile(HoodieMergeHandle.java:156)
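A quick way to check whether the file group from this exception is still referenced by the latest snapshot is to filter on Hudi's metadata columns (a rough sketch; the S3 base path is a placeholder):

```scala
// Rough sketch (base path is a placeholder): look for the failing FileID in the
// latest snapshot of the table via Hudi's metadata columns.
import org.apache.spark.sql.functions.col

val fileId = "62b95ad5-44df-496c-8309-4d5d270f2ef6-0"

spark.read.format("hudi")
  .load("s3://<bucket>/performer_ride_join_table")
  .select("_hoodie_commit_time", "_hoodie_record_key", "_hoodie_file_name")
  .where(col("_hoodie_file_name").contains(fileId))
  .show(false)
```

An empty result here, while the base file still exists in S3, is consistent with the orphaned-file situation described above.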