Open LmrZER0 opened 1 month ago
Do you have multiple jobs writing to this table? With lazy cleaning, only one cleaner may run at a time, because cleaning is not currently guarded by any lock. That means you can enable cleaning for only a single job.
@LmrZER0 Also, can you provide your full writer configurations?
@LmrZER0 Will you be able to provide the required info so we can look into this further? Please let us know if it has been resolved.
Do you have Spark speculation enabled by any chance?
Even if the marker already exists, we could still proceed with the rollback; this might be a possible improvement.
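To illustrate the suggestion above, here is a minimal sketch of marker creation that tolerates an existing file instead of failing. It is not Hudi's implementation: the class name IdempotentMarkerSketch and the method createIfAbsent are hypothetical, and it uses java.nio on a local filesystem rather than the Hadoop FileSystem API that DirectWriteMarkers goes through.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; not part of Hudi.
final class IdempotentMarkerSketch {

    /**
     * Creates the marker file, treating "already exists" as success.
     * Returns true if this call created the file, false if it was already present.
     */
    static boolean createIfAbsent(Path marker) throws IOException {
        Files.createDirectories(marker.getParent());
        try {
            Files.createFile(marker);
            return true;
        } catch (FileAlreadyExistsException e) {
            // An earlier (possibly failed) rollback attempt left the marker behind.
            // Reuse it instead of aborting the rollback.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path tempDir = Files.createTempDirectory("hoodie-temp");
        Path marker = tempDir.resolve("dt=2024-08-11").resolve("example.marker.APPEND");
        System.out.println(createIfAbsent(marker)); // prints true: first attempt creates it
        System.out.println(createIfAbsent(marker)); // prints false: retry finds it and continues
    }
}
```

Whether swallowing the exception is safe in Hudi depends on who created the existing marker; with a second cleaner or a speculative task running concurrently, the marker may belong to a live writer, which is why the questions above about multiple jobs still matter.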
Describe the problem you faced
A clear and concise description of the problem.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
A clear and concise description of what you expected to happen.
2024-08-13 11:06:01.598 ERROR [pool-258-thread-1:8-thread-1] org.apache.hudi.async.HoodieAsyncService - Service shutdown with error
java.util.concurrent.ExecutionException: org.apache.hudi.exception.HoodieRollbackException: Failed to rollback hdfs://ns1200/user/test/tmp.db/app_jdr_ads_dra_edm_user_behavior_content_hudi_a_d_d commits 20240811184332421
    at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357)
    at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1895)
    at org.apache.hudi.async.HoodieAsyncService.waitForShutdown(HoodieAsyncService.java:103)
    at org.apache.hudi.async.AsyncCleanerService.waitForCompletion(AsyncCleanerService.java:75)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.asyncClean(BaseHoodieTableServiceClient.java:132)
    at org.apache.hudi.client.HoodieFlinkWriteClient.waitForCleaningFinish(HoodieFlinkWriteClient.java:344)
    at org.apache.hudi.sink.CleanFunction.lambda$notifyCheckpointComplete$1(CleanFunction.java:84)
    at org.apache.hudi.sink.utils.NonThrownExecutor.lambda$wrapAction$0(NonThrownExecutor.java:130)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hudi.exception.HoodieRollbackException: Failed to rollback hdfs://ns1200/user/test/tmp.db/app_jdr_ads_dra_edm_user_behavior_content_hudi_a_d_d commits 20240811184332421
    at org.apache.hudi.client.BaseHoodieTableServiceClient.rollback(BaseHoodieTableServiceClient.java:1061)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.rollback(BaseHoodieTableServiceClient.java:1008)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:935)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:917)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.rollbackFailedWrites(BaseHoodieTableServiceClient.java:912)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.lambda$clean$1cda88ee$1(BaseHoodieTableServiceClient.java:739)
    at org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:214)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.clean(BaseHoodieTableServiceClient.java:738)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:843)
    at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:816)
    at org.apache.hudi.async.AsyncCleanerService.lambda$startService$0(AsyncCleanerService.java:55)
    at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590)
    ... 3 common frames omitted
Caused by: org.apache.hudi.exception.HoodieException: Error occurs when executing flatMap
    at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingFlatMapWrapper$1(FunctionWrapper.java:50)
    at java.util.stream.ReferencePipeline$7$1.accept(ReferencePipeline.java:267)
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1382)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
    at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:747)
    at java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:721)
    at java.util.stream.AbstractTask.compute(AbstractTask.java:316)
    at java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:731)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinTask.doInvoke(ForkJoinTask.java:401)
    at java.util.concurrent.ForkJoinTask.invoke(ForkJoinTask.java:734)
    at java.util.stream.ReduceOps$ReduceOp.evaluateParallel(ReduceOps.java:714)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:233)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
    at org.apache.hudi.client.common.HoodieFlinkEngineContext.flatMap(HoodieFlinkEngineContext.java:141)
    at org.apache.hudi.table.action.rollback.BaseRollbackHelper.maybeDeleteAndCollectStats(BaseRollbackHelper.java:150)
    at org.apache.hudi.table.action.rollback.BaseRollbackHelper.performRollback(BaseRollbackHelper.java:115)
    at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.executeRollback(BaseRollbackActionExecutor.java:245)
    at org.apache.hudi.table.action.rollback.MergeOnReadRollbackActionExecutor.executeRollback(MergeOnReadRollbackActionExecutor.java:87)
    at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.doRollbackAndGetStats(BaseRollbackActionExecutor.java:227)
    at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:111)
    at org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:141)
    at org.apache.hudi.table.HoodieFlinkMergeOnReadTable.rollback(HoodieFlinkMergeOnReadTable.java:158)
    at org.apache.hudi.client.BaseHoodieTableServiceClient.rollback(BaseHoodieTableServiceClient.java:1044)
    ... 14 common frames omitted
Caused by: org.apache.hudi.exception.HoodieException: Failed to create marker file hdfs://ns1007/user/test/tmp.db/app_jdr_ads_dra_edm_user_behavior_content_hudi_a_d_d/.hoodie/.temp/20240811185848523/dt=2024-08-11/.00000168-778b-477d-b4ab-1417e067f08e_20240811182559380.log.1_13-64-0.marker.APPEND
    at org.apache.hudi.table.marker.DirectWriteMarkers.create(DirectWriteMarkers.java:264)
    at org.apache.hudi.table.marker.DirectWriteMarkers.createWithEarlyConflictDetection(DirectWriteMarkers.java:243)
    at org.apache.hudi.table.marker.WriteMarkers.createIfNotExists(WriteMarkers.java:135)
    at org.apache.hudi.table.action.rollback.BaseRollbackHelper$1.createAppendMarker(BaseRollbackHelper.java:251)
    at org.apache.hudi.table.action.rollback.BaseRollbackHelper$1.preLogFileOpen(BaseRollbackHelper.java:241)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.getOutputStream(HoodieLogFormatWriter.java:100)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:149)
    at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlock(HoodieLogFormatWriter.java:140)
    at org.apache.hudi.table.action.rollback.BaseRollbackHelper.lambda$maybeDeleteAndCollectStats$b2977713$1(BaseRollbackHelper.java:181)
    at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingFlatMapWrapper$1(FunctionWrapper.java:48)
    ... 38 common frames omitted
Caused by: org.apache.hadoop.fs.FileAlreadyExistsException: /user/test/tmp.db/app_jdr_ads_dra_edm_user_behavior_content_hudi_a_d_d/.hoodie/.temp/20240811185848523/dt=2024-08-11/.00000168-778b-477d-b4ab-1417e067f08e_20240811182559380.log.1_13-64-0.marker.APPEND for client 10.198.21.35 already exists
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.startFile(FSDirWriteFileOp.java:463)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2874)
    at org.apache.hadoop.hdfs.server.namenode.JDFSNamesystem.access$401(JDFSNamesystem.java:177)
    at org.apache.hadoop.hdfs.server.namenode.JDFSNamesystem$5.call(JDFSNamesystem.java:1494)
    at org.apache.hadoop.hdfs.server.namenode.JDFSNamesystem$5.call(JDFSNamesystem.java:1484)
    at org.apache.hadoop.hdfs.server.namenode.JDFSNamesystem$CoalesceWriteThread.run(JDFSNamesystem.java:1647)
Environment Description
Hudi version : 0.10.0
Spark version : no
Hive version : no
Hadoop version :
Storage (HDFS/S3/GCS..) : hdfs
Running on Docker? (yes/no) : no
Flink version : 1.14