I have this problem too, when I use the Hudi DeltaStreamer tool to extract data from MySQL to HDFS.
```
22/06/06 22:15:27 ERROR Javalin: Exception occurred while servicing http-request
java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
at org.apache.hudi.io.storage.HoodieHFileReader.close(HoodieHFileReader.java:218)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.closeReader(HoodieBackedTableMetadata.java:574)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:567)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:554)
at org.apache.hudi.metadata.HoodieMetadataFileSystemView.close(HoodieMetadataFileSystemView.java:83)
at org.apache.hudi.common.table.view.FileSystemViewManager.clearFileSystemView(FileSystemViewManager.java:86)
at org.apache.hudi.timeline.service.handlers.FileSliceHandler.refreshTable(FileSliceHandler.java:118)
at org.apache.hudi.timeline.service.RequestHandler.lambda$registerFileSlicesAPI$19(RequestHandler.java:390)
at org.apache.hudi.timeline.service.RequestHandler$ViewHandler.handle(RequestHandler.java:501)
at io.javalin.security.SecurityUtil.noopAccessManager(SecurityUtil.kt:22)
at io.javalin.Javalin.lambda$addHandler$0(Javalin.java:606)
at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:46)
at io.javalin.core.JavalinServlet$service$2$1.invoke(JavalinServlet.kt:17)
at io.javalin.core.JavalinServlet$service$1.invoke(JavalinServlet.kt:143)
at io.javalin.core.JavalinServlet$service$2.invoke(JavalinServlet.kt:41)
at io.javalin.core.JavalinServlet.service(JavalinServlet.kt:107)
at io.javalin.core.util.JettyServerUtil$initialize$httpHandler$1.doHandle(JettyServerUtil.kt:72)
at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
at org.apache.hudi.org.apache.jetty.servlet.ServletHandler.doScope(ServletHandler.java:480)
at org.apache.hudi.org.apache.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1668)
at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
at org.apache.hudi.org.apache.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1247)
at org.apache.hudi.org.apache.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
at org.apache.hudi.org.apache.jetty.server.handler.HandlerList.handle(HandlerList.java:61)
at org.apache.hudi.org.apache.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:174)
at org.apache.hudi.org.apache.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
at org.apache.hudi.org.apache.jetty.server.Server.handle(Server.java:502)
at org.apache.hudi.org.apache.jetty.server.HttpChannel.handle(HttpChannel.java:370)
at org.apache.hudi.org.apache.jetty.server.HttpConnection.onFillable(HttpConnection.java:267)
at org.apache.hudi.org.apache.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:305)
at org.apache.hudi.org.apache.jetty.io.FillInterest.fillable(FillInterest.java:103)
at org.apache.hudi.org.apache.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
at org.apache.hudi.org.apache.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
at org.apache.hudi.org.apache.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
at org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:765)
at org.apache.hudi.org.apache.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:683)
at java.lang.Thread.run(Thread.java:748)
22/06/06 22:15:27 ERROR HoodieDeltaStreamer: Got error running delta sync once. Shutting down
org.apache.hudi.exception.HoodieRemoteException: status code: 500, reason phrase: Server Error
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.refresh(RemoteHoodieTableFileSystemView.java:420)
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.sync(RemoteHoodieTableFileSystemView.java:484)
at org.apache.hudi.common.table.view.PriorityBasedFileSystemView.sync(PriorityBasedFileSystemView.java:257)
at org.apache.hudi.table.HoodieSparkTable.create(HoodieSparkTable.java:92)
at org.apache.hudi.table.HoodieSparkTable.create(HoodieSparkTable.java:67)
at org.apache.hudi.client.SparkRDDWriteClient.createTable(SparkRDDWriteClient.java:129)
at org.apache.hudi.client.BaseHoodieWriteClient.scheduleTableServiceInternal(BaseHoodieWriteClient.java:1352)
at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:864)
at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:837)
at org.apache.hudi.client.BaseHoodieWriteClient.clean(BaseHoodieWriteClient.java:891)
at org.apache.hudi.client.BaseHoodieWriteClient.autoCleanOnCommit(BaseHoodieWriteClient.java:614)
at org.apache.hudi.client.BaseHoodieWriteClient.postCommit(BaseHoodieWriteClient.java:533)
at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:236)
at org.apache.hudi.client.SparkRDDWriteClient.commit(SparkRDDWriteClient.java:122)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:622)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:331)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.lambda$sync$2(HoodieDeltaStreamer.java:200)
at org.apache.hudi.common.util.Option.ifPresent(Option.java:97)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:198)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:549)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1043)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1052)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: org.apache.http.client.HttpResponseException: status code: 500, reason phrase: Server Error
at org.apache.http.impl.client.AbstractResponseHandler.handleResponse(AbstractResponseHandler.java:70)
at org.apache.http.client.fluent.Response.handleResponse(Response.java:90)
at org.apache.http.client.fluent.Response.returnContent(Response.java:97)
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.executeRequest(RemoteHoodieTableFileSystemView.java:179)
at org.apache.hudi.common.table.view.RemoteHoodieTableFileSystemView.refresh(RemoteHoodieTableFileSystemView.java:418)
... 31 more
```
I also encountered this problem.
@sunke38 @RoderickAdriance @XuQianJin-Stars: do you happen to have any HBase jars in your classpath? If not, we should not see this issue, in my understanding. CC @yihua
@nsivabalan: I don't have any HBase dependencies in my pom; it's all Hadoop dependencies. What confuses me is that it shows the same error when I run an insert in spark-shell. I did nothing related to HBase for spark-shell, and I also didn't see any HBase jar in spark/jars.
spark-shell:

```sh
spark-shell --jars /home/qqq/module/hudi-0.11/packaging/hudi-spark-bundle/target/hudi-spark3.2-bundle_2.12-0.11.0.jar \
  --packages org.apache.spark:spark-avro_2.12:3.2.1 \
  --conf 'spark.serializer=org.apache.spark.serializer.KryoSerializer' \
  --conf 'spark.sql.extensions=org.apache.spark.sql.hudi.HoodieSparkSessionExtension' \
  --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
```
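If it helps narrow down the conflict, here is a quick check you can run inside that same spark-shell session to see which jar actually provides the HDFS client class from the stack trace, and whether any standalone HBase jars are on the classpath (a sketch; nothing here is Hudi-specific):

```scala
// Which jar provides the HDFS client class named in the NoSuchMethodError?
// (getCodeSource can be null for JDK bootstrap classes, but not for Hadoop jars.)
println(Class.forName("org.apache.hadoop.hdfs.client.HdfsDataInputStream")
  .getProtectionDomain.getCodeSource.getLocation)

// Any unshaded HBase jars, besides the copy shaded into the Hudi bundle?
println(spark.sparkContext.listJars().filter(_.toLowerCase.contains("hbase")).mkString("\n"))
```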
Dependencies for my code:
```xml
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <artifactId>commons-cli</artifactId>
      <groupId>commons-cli</groupId>
    </exclusion>
    <exclusion>
      <artifactId>commons-codec</artifactId>
      <groupId>commons-codec</groupId>
    </exclusion>
    <exclusion>
      <artifactId>jakarta.activation-api</artifactId>
      <groupId>jakarta.activation</groupId>
    </exclusion>
    <exclusion>
      <artifactId>protobuf-java</artifactId>
      <groupId>com.google.protobuf</groupId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client-api</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <artifactId>snappy-java</artifactId>
      <groupId>org.xerial.snappy</groupId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-common</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <artifactId>stax2-api</artifactId>
      <groupId>org.codehaus.woodstox</groupId>
    </exclusion>
    <exclusion>
      <artifactId>avro</artifactId>
      <groupId>org.apache.avro</groupId>
    </exclusion>
    <exclusion>
      <artifactId>snappy-java</artifactId>
      <groupId>org.xerial.snappy</groupId>
    </exclusion>
    <exclusion>
      <artifactId>audience-annotations</artifactId>
      <groupId>org.apache.yetus</groupId>
    </exclusion>
    <exclusion>
      <artifactId>commons-lang3</artifactId>
      <groupId>org.apache.commons</groupId>
    </exclusion>
    <exclusion>
      <artifactId>jackson-core-asl</artifactId>
      <groupId>org.codehaus.jackson</groupId>
    </exclusion>
    <exclusion>
      <artifactId>commons-logging</artifactId>
      <groupId>commons-logging</groupId>
    </exclusion>
    <exclusion>
      <artifactId>commons-cli</artifactId>
      <groupId>commons-cli</groupId>
    </exclusion>
    <exclusion>
      <artifactId>zookeeper</artifactId>
      <groupId>org.apache.zookeeper</groupId>
    </exclusion>
    <exclusion>
      <artifactId>commons-text</artifactId>
      <groupId>org.apache.commons</groupId>
    </exclusion>
    <exclusion>
      <artifactId>jackson-mapper-asl</artifactId>
      <groupId>org.codehaus.jackson</groupId>
    </exclusion>
    <exclusion>
      <artifactId>jaxb-api</artifactId>
      <groupId>javax.xml.bind</groupId>
    </exclusion>
    <exclusion>
      <artifactId>protobuf-java</artifactId>
      <groupId>com.google.protobuf</groupId>
    </exclusion>
    <exclusion>
      <artifactId>commons-codec</artifactId>
      <groupId>commons-codec</groupId>
    </exclusion>
    <exclusion>
      <artifactId>commons-math3</artifactId>
      <groupId>org.apache.commons</groupId>
    </exclusion>
    <exclusion>
      <artifactId>slf4j-api</artifactId>
      <groupId>org.slf4j</groupId>
    </exclusion>
    <exclusion>
      <artifactId>nimbus-jose-jwt</artifactId>
      <groupId>com.nimbusds</groupId>
    </exclusion>
    <exclusion>
      <artifactId>jakarta.activation-api</artifactId>
      <groupId>jakarta.activation</groupId>
    </exclusion>
    <exclusion>
      <artifactId>commons-io</artifactId>
      <groupId>commons-io</groupId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>${hadoop.version}</version>
  <exclusions>
    <exclusion>
      <artifactId>commons-cli</artifactId>
      <groupId>commons-cli</groupId>
    </exclusion>
    <exclusion>
      <artifactId>commons-codec</artifactId>
      <groupId>commons-codec</groupId>
    </exclusion>
    <exclusion>
      <artifactId>protobuf-java</artifactId>
      <groupId>com.google.protobuf</groupId>
    </exclusion>
  </exclusions>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>${hadoop.version}</version>
</dependency>
```
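(For what it's worth, running `mvn dependency:tree -Dincludes=org.apache.hbase` against a pom like this should confirm whether any HBase artifacts come in transitively.)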
Same issue, loading data from Spark-native Parquet into a MOR table. Five pieces (loads) completed successfully; it failed on the 6th. The data and structure are the same. Possibly some compaction/cleaning happens and invokes this problem?

```scala
incDF.filter("ins_flg='y'")
  .write
  .format("org.apache.hudi")
  .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
  .option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ")
  .option(RECORDKEY_FIELD_OPT_KEY, "rn")
  .option(PRECOMBINE_FIELD_OPT_KEY, "ss_date_time")
  .option(HIVE_SUPPORT_TIMESTAMP_TYPE.key, "true")
  .option("hoodie.index.type", "SIMPLE")
  .option(TABLE_NAME, "store_sales")
  .mode(SaveMode.Append)
  .save("/tmp/bench_hudi/store_sales")
```

UPD: yes, it looks like the compaction/cleaner runs here (I have 2 writes per cycle and `hoodie.cleaner.commits.retained` = 10 by default). I added the following options to disable the cleaner in my test cycle, but the error still appears on the 10th commit: `"hoodie.keep.min.commits" -> "40"`, `"hoodie.keep.max.commits" -> "50"`, `"hoodie.cleaner.commits.retained" -> "30"`, `"hoodie.clean.automatic" -> "false"`.
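In case it helps anyone reproduce, a minimal sketch of how I'd expect those overrides to be passed on the same writer (assuming the same `incDF`, imports, and table options as the snippet above):

```scala
incDF.filter("ins_flg='y'")
  .write
  .format("org.apache.hudi")
  // ...same table options as in the snippet above, plus the cleaner/archival overrides under test:
  .option("hoodie.clean.automatic", "false")
  .option("hoodie.cleaner.commits.retained", "30")
  .option("hoodie.keep.min.commits", "40")
  .option("hoodie.keep.max.commits", "50")
  .mode(SaveMode.Append)
  .save("/tmp/bench_hudi/store_sales")
```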
> @sunke38 @RoderickAdriance @XuQianJin-Stars: do you happen to have any HBase jars in your classpath? If not, we should not see this issue, in my understanding. CC @yihua

Hi @nsivabalan, the problem on my side has been solved; it turned out to be a jar package conflict.
> > @sunke38 @RoderickAdriance @XuQianJin-Stars: do you happen to have any HBase jars in your classpath? If not, we should not see this issue, in my understanding. CC @yihua
>
> Hi @nsivabalan, the problem on my side has been solved; it turned out to be a jar package conflict.

Hi @XuQianJin-Stars, may I ask for details on how you solved this, please? I'm confused about where the jar package conflict is located. Thank you.
This problem occurs after I execute the command many times:

```sh
bin/spark-submit --master local[2] --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  jars/hudi-utilities-bundle_2.12-0.11.0.jar --table-type COPY_ON_WRITE \
  --props /dolphinscheduler/obeiadmin/resources/Logistics_screen/data_ingestion/mysql-cnb_order-jdbc-source.properties \
  --source-class org.apache.hudi.utilities.sources.JdbcSource \
  --source-ordering-field ts \
  --target-base-path /user/hive/warehouse/bwdhmosaas.db/ods_cnb_order \
  --target-table ods_cnb_order \
  --transformer-class org.apache.hudi.utilities.transform.SqlQueryBasedTransformer
```
HBase relies on hadoop-hdfs-client 2.10; if you use Hadoop 3 (hadoop-client-api), this error occurs.
Why is there an HFile without using the HBase index? @nsivabalan
https://user-images.githubusercontent.com/1145830/174435696-e4b259b5-4ca4-4b0d-bdd1-938ab7c516df.png
> > > @sunke38 @RoderickAdriance @XuQianJin-Stars: do you happen to have any HBase jars in your classpath? If not, we should not see this issue, in my understanding. CC @yihua
> >
> > Hi @nsivabalan, the problem on my side has been solved; it turned out to be a jar package conflict.
>
> Hi @XuQianJin-Stars, may I ask for details on how you solved this, please? I'm confused about where the jar package conflict is located. Thank you.
@yihua @nsivabalan @XuQianJin-Stars Which version of Hudi is planned to fix this issue?
Should be resolved by https://github.com/apache/hudi/pull/5882. @Humphrey0822, can you try out that patch?
@codope I also have the same issue; I applied the patch mentioned above (#5882) and tested it, but the same error occurs.
- Spark version: 3.2.1
- Hadoop version: 3.2.2 (on-premise Hadoop)
- Storage (HDFS/S3/GCS..): HDFS
- Running on Docker? (yes/no): no
@melin HFile is used as the base file format in the metadata table under `<base_path>/.hoodie/metadata`. The metadata table is a MOR table, and the HFile only appears after compaction in the metadata table.
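To see those files yourself, you can list the metadata table's `files` partition from spark-shell (a sketch; the base path is a placeholder for your table's path):

```scala
import org.apache.hadoop.fs.Path

val basePath = "/tmp/hudi/my_table" // placeholder: your table's base path
val fs = new Path(basePath).getFileSystem(spark.sparkContext.hadoopConfiguration)
// After metadata-table compaction, this listing shows HFile base files
// alongside the .log files of the MOR metadata table.
fs.listStatus(new Path(basePath + "/.hoodie/metadata/files")).foreach(s => println(s.getPath))
```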
It looks like the current way of packaging HFile classes is not compatible with Hadoop 3.
I created a ticket to track the fix: HUDI-4341.
@yihua When I use Hadoop 3 with Spark 2, this problem is resolved. So I think this problem is caused by an incompatibility between the Hudi jar package and the Spark 3 package.
This problem is caused by the fact that the HBase 2.4.9 jars in the Maven repository are compiled against Hadoop 2.7. A quick fix is to compile HBase against Hadoop 3.x, `mvn install` it, and then compile Hudi.
I have met the same problem. Can anyone explain this?
I resolved this on my own, by packaging a new version of HBase 2.4.9 against our Hadoop 3 version with the following command:

```sh
mvn clean install -Denforcer.skip -DskipTests -Dhadoop.profile=3.0 -Psite-install-step
```

Then I changed `hbase.defaults.for.version` in `hudi-common/src/main/resources/hbase-site.xml`; after that, I changed `hbase.version` in Hudi's `pom.xml`, used `versions-maven-plugin` to create a new Hudi version, and packaged Hudi again.
Ran into the same issue when using DeltaStreamer to read from Kafka and write to HDFS.

- Spark: 3.2.1
- Hadoop: 3.2.2
```
Caused by: org.apache.hudi.exception.HoodieUpsertException: Failed to upsert for commit time 20220723125828383
at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:64)
at org.apache.hudi.table.action.deltacommit.SparkUpsertDeltaCommitActionExecutor.execute(SparkUpsertDeltaCommitActionExecutor.java:46)
at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:89)
at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsert(HoodieSparkMergeOnReadTable.java:76)
at org.apache.hudi.client.SparkRDDWriteClient.upsert(SparkRDDWriteClient.java:155)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:586)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:333)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.lambda$startService$0(HoodieDeltaStreamer.java:679)
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()Lorg/apache/hadoop/hdfs/DFSInputStream$ReadStatistics;
at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.updateInputStreamStatistics(FSDataInputStreamWrapper.java:249)
at org.apache.hudi.org.apache.hadoop.hbase.io.FSDataInputStreamWrapper.close(FSDataInputStreamWrapper.java:296)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileBlock$FSReaderImpl.closeStreams(HFileBlock.java:1825)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFilePreadReader.close(HFilePreadReader.java:107)
at org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFileReaderImpl.close(HFileReaderImpl.java:1421)
at org.apache.hudi.io.storage.HoodieHFileReader.close(HoodieHFileReader.java:218)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.closeReader(HoodieBackedTableMetadata.java:588)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.close(HoodieBackedTableMetadata.java:571)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$0(HoodieBackedTableMetadata.java:232)
at java.util.HashMap.forEach(HashMap.java:1289)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordsByKeys(HoodieBackedTableMetadata.java:207)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.getRecordByKey(HoodieBackedTableMetadata.java:140)
at org.apache.hudi.metadata.BaseTableMetadata.fetchAllPartitionPaths(BaseTableMetadata.java:281)
at org.apache.hudi.metadata.BaseTableMetadata.getAllPartitionPaths(BaseTableMetadata.java:111)
at org.apache.hudi.common.fs.FSUtils.getAllPartitionPaths(FSUtils.java:313)
at org.apache.hudi.index.bloom.HoodieGlobalBloomIndex.loadColumnRangesFromFiles(HoodieGlobalBloomIndex.java:62)
at org.apache.hudi.index.bloom.HoodieBloomIndex.getBloomIndexFileInfoForPartitions(HoodieBloomIndex.java:151)
at org.apache.hudi.index.bloom.HoodieBloomIndex.lookupIndex(HoodieBloomIndex.java:125)
at org.apache.hudi.index.bloom.HoodieBloomIndex.tagLocation(HoodieBloomIndex.java:91)
at org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:49)
at org.apache.hudi.table.action.commit.HoodieWriteHelper.tag(HoodieWriteHelper.java:32)
at org.apache.hudi.table.action.commit.BaseWriteHelper.write(BaseWriteHelper.java:53)
... 11 more
```
We also encountered the same problem with Hudi 0.11.1 and Spark 3.2.1; our current temporary workaround is to set `hoodie.metadata.enable=false`.
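For anyone else applying that workaround, a minimal sketch of where the flag goes on a datasource write (`df`, the table name, and the path are placeholders); for DeltaStreamer the equivalent should be `--hoodie-conf hoodie.metadata.enable=false`:

```scala
df.write.format("hudi")
  .option("hoodie.table.name", "my_table")   // placeholder
  .option("hoodie.metadata.enable", "false") // skip the metadata table, and with it the HFile reader path
  .mode("append")
  .save("/tmp/hudi/my_table")                // placeholder
```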
@yihua: do you think we can document the solution proposed by @dohongdayi above in some FAQ?
> I resolved this on my own, by packaging a new version of HBase 2.4.9 against our Hadoop 3 version with the following command:
>
> ```sh
> mvn clean install -Denforcer.skip -DskipTests -Dhadoop.profile=3.0 -Psite-install-step
> ```
>
> Then I changed `hbase.defaults.for.version` in `hudi-common/src/main/resources/hbase-site.xml`; after that, I changed `hbase.version` in Hudi's `pom.xml`, used `versions-maven-plugin` to create a new Hudi version, and packaged Hudi again.
> I created a ticket to track the fix: HUDI-4341.
Version 0.12 has been released. Is this bug fixed?
> Version 0.12 has been released. Is this bug fixed?
With Spark 3.2.2 and Hudi 0.12.0, it is not fixed.
> I created a ticket to track the fix: HUDI-4341.
Will the next version fix this problem? It bothers newbies like me.
@nsivabalan We can document the workaround, yet it's still not ideal for users relying on releases. I'll check if we can fix it via dependency management within Hudi.
Yes, I get it. Even if we put in a fix, only new versions of Hudi would include it, not the existing ones. So it's better to document it.
Closing the issue as the workaround has been suggested. Feel free to re-open or create a new one. Thanks!
The workaround to mitigate this problem is captured in the FAQ: #6756.
@nsivabalan Hello, why is this possible? After compiling, I found that the method reference to `HdfsDataInputStream.getReadStatistics` has not changed.
Check your steps carefully: recompile HBase 2.4 after modifying its pom file to update the Hadoop version from 2.7 to 3.2 or a higher version.
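For what it's worth, my understanding of why the source looks unchanged yet recompiling fixes it (worth verifying against the Hadoop sources): the JVM resolves a method by its full descriptor, which includes the return type. `getReadStatistics()` returns the inner class `DFSInputStream$ReadStatistics` on Hadoop 2 but a top-level `org.apache.hadoop.hdfs.ReadStatistics` on Hadoop 3, so HBase jars compiled against Hadoop 2 carry a reference a Hadoop 3 classpath cannot resolve, which matches the descriptor shown in the NoSuchMethodError above. A quick check from spark-shell:

```scala
val m = Class.forName("org.apache.hadoop.hdfs.client.HdfsDataInputStream").getMethod("getReadStatistics")
// Hadoop 2.x prints: org.apache.hadoop.hdfs.DFSInputStream$ReadStatistics
// Hadoop 3.x prints: org.apache.hadoop.hdfs.ReadStatistics
println(m.getReturnType.getName)
```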
I found that the problem is thrown when Hudi uses Hadoop 3.3.1 in my environment.
> Same issue, loading data from Spark-native Parquet into a MOR table. Five pieces (loads) completed successfully; it failed on the 6th. The data and structure are the same. Possibly some compaction/cleaning happens and invokes this problem?
>
> ```scala
> incDF.filter("ins_flg='y'")
>   .write
>   .format("org.apache.hudi")
>   .option(DataSourceWriteOptions.OPERATION_OPT_KEY, DataSourceWriteOptions.BULK_INSERT_OPERATION_OPT_VAL)
>   .option(TABLE_TYPE_OPT_KEY, "MERGE_ON_READ")
>   .option(RECORDKEY_FIELD_OPT_KEY, "rn")
>   .option(PRECOMBINE_FIELD_OPT_KEY, "ss_date_time")
>   .option(HIVE_SUPPORT_TIMESTAMP_TYPE.key, "true")
>   .option("hoodie.index.type", "SIMPLE")
>   .option(TABLE_NAME, "store_sales")
>   .mode(SaveMode.Append)
>   .save("/tmp/bench_hudi/store_sales")
> ```
>
> UPD: yes, it looks like the compaction/cleaner runs here (I have 2 writes per cycle and `hoodie.cleaner.commits.retained` = 10 by default). I added the following options to disable the cleaner in my test cycle, but the error still appears on the 10th commit: `"hoodie.keep.min.commits" -> "40"`, `"hoodie.keep.max.commits" -> "50"`, `"hoodie.cleaner.commits.retained" -> "30"`, `"hoodie.clean.automatic" -> "false"`.
Hi, I encountered the same issue. Did you fix this?
I solved it by using Spark 3.4.0 and Hadoop 3.2.4, but I have a new error now XD:

```
23/12/24 12:24:42 WARN DFSPropertiesConfiguration: Cannot find HUDI_CONF_DIR, please set it as the dir of hudi-defaults.conf
23/12/24 12:24:42 WARN DFSPropertiesConfiguration: Properties file file:/etc/hudi/conf/hudi-defaults.conf not found. Ignoring to load props file
23/12/24 12:24:42 WARN HoodieSparkSqlWriter$: Choosing BULK_INSERT as the operation type since auto record key generation is applicable
23/12/24 12:24:49 WARN AutoRecordKeyGenerationUtils$: Precombine field ts will be ignored with auto record key generation enabled
23/12/24 12:24:55 WARN MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-hbase.properties,hadoop-metrics2.properties
23/12/24 12:25:01 WARN WriteMarkersFactory: Timeline-server-based markers are not supported for HDFS: base path hdfs://localhost:9000/hudi/data/test5. Falling back to direct markers.
23/12/24 12:25:01 WARN WriteMarkersFactory: Timeline-server-based markers are not supported for HDFS: base path hdfs://localhost:9000/hudi/data/test5. Falling back to direct markers.
23/12/24 12:25:01 WARN WriteMarkersFactory: Timeline-server-based markers are not supported for HDFS: base path hdfs://localhost:9000/hudi/data/test5. Falling back to direct markers.
23/12/24 12:25:01 WARN WriteMarkersFactory: Timeline-server-based markers are not supported for HDFS: base path hdfs://localhost:9000/hudi/data/test5. Falling back to direct markers.
23/12/24 12:25:02 WARN WriteMarkersFactory: Timeline-server-based markers are not supported for HDFS: base path hdfs://localhost:9000/hudi/data/test5. Falling back to direct markers.
23/12/24 12:25:03 WARN WriteMarkersFactory: Timeline-server-based markers are not supported for HDFS: base path hdfs://localhost:9000/hudi/data/test5. Falling back to direct markers.
23/12/24 12:25:06 WARN DataStreamer: DataStreamer Exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
23/12/24 12:25:06 WARN DFSClient: Error while syncing
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
23/12/24 12:25:06 ERROR BaseSparkCommitActionExecutor: Error upserting bucketType UPDATE for partition :0
org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
23/12/24 12:25:06 WARN BlockManager: Putting block rdd_48_0 failed due to exception org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0.
23/12/24 12:25:06 WARN BlockManager: Block rdd_48_0 could not be removed as it was not found on disk or in memory
23/12/24 12:25:06 ERROR Executor: Exception in task 0.0 in stage 15.0 (TID 19)
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
... 29 more
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
23/12/24 12:25:06 WARN TaskSetManager: Lost task 0.0 in stage 15.0 (TID 19) (192.168.1.171 executor driver): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
... 29 more
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
23/12/24 12:25:06 ERROR TaskSetManager: Task 0 in stage 15.0 failed 1 times; aborting job
23/12/24 12:25:06 ERROR AppendDataExec: Data source write support org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite@465cf872 is aborting.
23/12/24 12:25:06 ERROR DataSourceInternalWriterHelper: Commit 20231224122443495 aborted
23/12/24 12:25:09 WARN HoodieLogFormatWriter: Remote Exception, attempting to handle or recover lease
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to APPEND_FILE /hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0 for DFSClient_NONMAPREDUCE_1489527491_15 on 127.0.0.1 because DFSClient_NONMAPREDUCE_1489527491_15 is already the current lease holder.
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.recoverLeaseInternal(FSNamesystem.java:2669)
at org.apache.hadoop.hdfs.server.namenode.FSDirAppendOp.appendFile(FSDirAppendOp.java:124)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.appendFile(FSNamesystem.java:2753)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.append(NameNodeRpcServer.java:840)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.append(ClientNamenodeProtocolServerSideTranslatorPB.java:503)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:549)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:518)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1086)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1035)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:963)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1762)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2960)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1612)
at org.apache.hadoop.ipc.Client.call(Client.java:1558)
at org.apache.hadoop.ipc.Client.call(Client.java:1455)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:242)
at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:129)
at com.sun.proxy.$Proxy39.append(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.append(ClientNamenodeProtocolTranslatorPB.java:413)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy40.append(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1382)
at org.apache.hadoop.hdfs.DFSClient.callAppend(DFSClient.java:1404)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1473)
at org.apache.hadoop.hdfs.DFSClient.append(DFSClient.java:1443)
at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:446)
at org.apache.hadoop.hdfs.DistributedFileSystem$5.doCall(DistributedFileSystem.java:442)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:454)
at org.apache.hadoop.hdfs.DistributedFileSystem.append(DistributedFileSystem.java:423)
at org.apache.hadoop.fs.FileSystem.append(FileSystem.java:1470)
at org.apache.hudi.common.fs.HoodieWrapperFileSystem.append(HoodieWrapperFileSystem.java:523)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.getOutputStream(HoodieLogFormatWriter.java:101)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlocks(HoodieLogFormatWriter.java:144)
at org.apache.hudi.common.table.log.HoodieLogFormatWriter.appendBlock(HoodieLogFormatWriter.java:135)
at org.apache.hudi.table.action.rollback.BaseRollbackHelper.lambda$maybeDeleteAndCollectStats$309309f3$1(BaseRollbackHelper.java:141)
at org.apache.hudi.client.common.HoodieSparkEngineContext.lambda$flatMap$7d470b86$1(HoodieSparkEngineContext.java:150)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$flatMap$1(JavaRDDLike.scala:125)
at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:486)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:492)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
at scala.collection.TraversableOnce.to(TraversableOnce.scala:366)
at scala.collection.TraversableOnce.to$(TraversableOnce.scala:364)
at scala.collection.AbstractIterator.to(Iterator.scala:1431)
at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:358)
at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:358)
at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1431)
at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:345)
at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:339)
at scala.collection.AbstractIterator.toArray(Iterator.scala:1431)
at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1019)
at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2303)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
23/12/24 12:25:09 WARN HoodieLogFormatWriter: Another task executor writing to the same log file(HoodieLogFile{pathStr='hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0', fileLen=0}. Rolling over
23/12/24 12:25:11 WARN DataStreamer: DataStreamer Exception
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
23/12/24 12:25:11 WARN DFSClient: Error while syncing
java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
23/12/24 12:25:11 ERROR BaseSparkCommitActionExecutor: Error upserting bucketType UPDATE for partition :0
org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.2_1-0-1
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
23/12/24 12:25:11 WARN BlockManager: Putting block rdd_85_0 failed due to exception org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0.
23/12/24 12:25:11 WARN BlockManager: Block rdd_85_0 could not be removed as it was not found on disk or in memory
23/12/24 12:25:11 ERROR Executor: Exception in task 0.0 in stage 31.0 (TID 57)
org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.2_1-0-1
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
... 29 more
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
23/12/24 12:25:11 WARN TaskSetManager: Lost task 0.0 in stage 31.0 (TID 57) (192.168.1.171 executor driver): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102)
at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905)
at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377)
at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552)
at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:326)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
at org.apache.spark.scheduler.Task.run(Task.scala:139)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.2_1-0-1
at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487)
at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450)
at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83)
at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335)
... 29 more
Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration.
at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352)
at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420)
at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547)
at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
```
23/12/24 12:25:11 ERROR TaskSetManager: Task 0 in stage 31.0 failed 1 times; aborting job
23/12/24 12:25:11 ERROR AppendDataExec: Data source write support org.apache.hudi.spark3.internal.HoodieDataSourceInternalBatchWrite@465cf872 failed to abort.
Traceback (most recent call last):
File "
Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2284) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2328) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1019) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:405) at org.apache.spark.rdd.RDD.collect(RDD.scala:1018) at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362) at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361) at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45) at org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:177) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.setCommitMetadata(BaseSparkCommitActionExecutor.java:289) at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.autoCommit(BaseCommitActionExecutor.java:197) at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.commitOnAutoCommit(BaseCommitActionExecutor.java:183) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.updateIndexAndCommitIfNeeded(BaseSparkCommitActionExecutor.java:279) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:184) at org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:44) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:126) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:88) at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:156) at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:63) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.commitInternal(HoodieBackedTableMetadataWriter.java:1132) at 
org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:117) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:855) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:1032) at org.apache.hudi.table.action.BaseActionExecutor.writeTableMetadata(BaseActionExecutor.java:105) ... 89 more Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377) at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375) at org.apache.spark.rdd.RDD.iterator(RDD.scala:326) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:139) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.2_1-0-1 at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487) at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450) at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335) ... 
29 more Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352) at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420) at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646) at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547) at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717) Caused by: org.apache.hudi.exception.HoodieException: Failed to update metadata at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:367) at org.apache.hudi.client.BaseHoodieWriteClient.commit(BaseHoodieWriteClient.java:285) at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:236) at org.apache.hudi.client.BaseHoodieWriteClient.commitStats(BaseHoodieWriteClient.java:211) at org.apache.hudi.internal.DataSourceInternalWriterHelper.commit(DataSourceInternalWriterHelper.java:89) ... 81 more Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 15.0 failed 1 times, most recent failure: Lost task 0.0 in stage 15.0 (TID 19) (192.168.1.171 executor driver): org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377) at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375) at org.apache.spark.rdd.RDD.iterator(RDD.scala:326) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at 
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:139) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:750) Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0 at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487) at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450) at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335) ... 29 more Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352) at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420) at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646) at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547) at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2785) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2721) at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2720) at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2720) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1206) at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1206) at scala.Option.foreach(Option.scala:407) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1206) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2984) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2923) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2912) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:971) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2263) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2284) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2303) at org.apache.spark.SparkContext.runJob(SparkContext.scala:2328) at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1019) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112) at org.apache.spark.rdd.RDD.withScope(RDD.scala:405) at org.apache.spark.rdd.RDD.collect(RDD.scala:1018) at org.apache.spark.api.java.JavaRDDLike.collect(JavaRDDLike.scala:362) at org.apache.spark.api.java.JavaRDDLike.collect$(JavaRDDLike.scala:361) at org.apache.spark.api.java.AbstractJavaRDDLike.collect(JavaRDDLike.scala:45) at org.apache.hudi.data.HoodieJavaRDD.collectAsList(HoodieJavaRDD.java:177) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.setCommitMetadata(BaseSparkCommitActionExecutor.java:289) at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.autoCommit(BaseCommitActionExecutor.java:197) at org.apache.hudi.table.action.commit.BaseCommitActionExecutor.commitOnAutoCommit(BaseCommitActionExecutor.java:183) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.updateIndexAndCommitIfNeeded(BaseSparkCommitActionExecutor.java:279) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.execute(BaseSparkCommitActionExecutor.java:184) at org.apache.hudi.table.action.deltacommit.SparkUpsertPreppedDeltaCommitActionExecutor.execute(SparkUpsertPreppedDeltaCommitActionExecutor.java:44) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:126) at org.apache.hudi.table.HoodieSparkMergeOnReadTable.upsertPrepped(HoodieSparkMergeOnReadTable.java:88) at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:156) at org.apache.hudi.client.SparkRDDWriteClient.upsertPreppedRecords(SparkRDDWriteClient.java:63) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.commitInternal(HoodieBackedTableMetadataWriter.java:1132) at 
org.apache.hudi.metadata.SparkHoodieBackedTableMetadataWriter.commit(SparkHoodieBackedTableMetadataWriter.java:117) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.processAndCommit(HoodieBackedTableMetadataWriter.java:855) at org.apache.hudi.metadata.HoodieBackedTableMetadataWriter.update(HoodieBackedTableMetadataWriter.java:910) at org.apache.hudi.client.BaseHoodieWriteClient.writeTableMetadata(BaseHoodieWriteClient.java:362) ... 85 more Caused by: org.apache.hudi.exception.HoodieUpsertException: Error upserting bucketType UPDATE for partition :0 at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:342) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.lambda$mapPartitionsAsRDD$a3ab3c4$1(BaseSparkCommitActionExecutor.java:257) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1(JavaRDDLike.scala:102) at org.apache.spark.api.java.JavaRDDLike.$anonfun$mapPartitionsWithIndex$1$adapted(JavaRDDLike.scala:102) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2(RDD.scala:905) at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsWithIndex$2$adapted(RDD.scala:905) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.$anonfun$getOrCompute$1(RDD.scala:377) at org.apache.spark.storage.BlockManager.$anonfun$doPutIterator$1(BlockManager.scala:1552) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$doPut(BlockManager.scala:1462) at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1526) at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:1349) at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:375) at org.apache.spark.rdd.RDD.iterator(RDD.scala:326) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92) at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) at org.apache.spark.scheduler.Task.run(Task.scala:139) at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554) at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ... 1 more Caused by: org.apache.hudi.exception.HoodieAppendException: Failed while appending records to hdfs://localhost:9000/hudi/data/test5/.hoodie/metadata/files/.files-0000-0_00000000000000010.log.1_0-0-0 at org.apache.hudi.io.HoodieAppendHandle.appendDataAndDeleteBlocks(HoodieAppendHandle.java:487) at org.apache.hudi.io.HoodieAppendHandle.doAppend(HoodieAppendHandle.java:450) at org.apache.hudi.table.action.deltacommit.BaseSparkDeltaCommitActionExecutor.handleUpdate(BaseSparkDeltaCommitActionExecutor.java:83) at org.apache.hudi.table.action.commit.BaseSparkCommitActionExecutor.handleUpsertPartition(BaseSparkCommitActionExecutor.java:335) ... 
29 more Caused by: java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]], original=[DatanodeInfoWithStorage[127.0.0.1:9866,DS-56d60cd9-579e-4d47-942a-1d1cd47448c9,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. at org.apache.hadoop.hdfs.DataStreamer.findNewDatanode(DataStreamer.java:1352) at org.apache.hadoop.hdfs.DataStreamer.addDatanode2ExistingPipeline(DataStreamer.java:1420) at org.apache.hadoop.hdfs.DataStreamer.handleDatanodeReplacement(DataStreamer.java:1646) at org.apache.hadoop.hdfs.DataStreamer.setupPipelineInternal(DataStreamer.java:1547) at org.apache.hadoop.hdfs.DataStreamer.setupPipelineForAppendOrRecovery(DataStreamer.java:1529) at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:717)
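Every cause chain above bottoms out in the same IOException, and the message itself names the relevant knob: with only one datanode in the pipeline (127.0.0.1:9866), append recovery can never find a replacement node, so it always fails under the DEFAULT policy. A hedged sketch of the client-side settings that avoid this on a single-datanode test cluster follows; the property names come straight from the error message and standard HDFS client configuration, but the values shown are only one common choice for local setups, not production advice:

// Sketch: relax datanode replacement for a single-node HDFS test cluster.
// "NEVER" means the client keeps writing to the surviving pipeline instead
// of trying to find a replacement datanode; only reasonable when the
// replication factor is 1. Equivalently, these can be passed on spark-submit
// via the spark.hadoop.* prefix.
spark.sparkContext.hadoopConfiguration
  .set("dfs.client.block.write.replace-datanode-on-failure.policy", "NEVER")
spark.sparkContext.hadoopConfiguration
  .set("dfs.client.block.write.replace-datanode-on-failure.enable", "false")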
On Spark 3.3.4 with Hudi 0.14.0 this is still not fixed; setting hoodie.metadata.enable=false can be used as a temporary workaround.
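For anyone applying that workaround outside of Spark SQL (where it is just "set hoodie.metadata.enable=false;"), a minimal sketch of the DataFrame-writer form: hoodie.metadata.enable is the standard Hudi config key, while the DataFrame (df) and the target path are illustrative, taken from the log above.

// Sketch: disable the Hudi metadata table for this write as a workaround.
df.write.format("hudi")
  .option("hoodie.metadata.enable", "false")
  .mode("append")
  .save("hdfs://localhost:9000/hudi/data/test5")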
I use Spark SQL to insert records into Hudi. It works for a short time, but after a while it throws "java.lang.NoSuchMethodError: org.apache.hadoop.hdfs.client.HdfsDataInputStream.getReadStatistics()".
Steps to reproduce the behavior:
I wrote a Scala function that builds the insert SQL.
Then I call it inside foreachRDD() of a DStream; a sketch of this pattern follows below.
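A rough sketch of that reproduction pattern, since the issue does not include the actual code: every name here (the socket source, table, columns, helper) is hypothetical and only illustrates "build an insert statement in a function, run it with spark.sql inside foreachRDD".

import org.apache.spark.sql.SparkSession
import org.apache.spark.streaming.{Seconds, StreamingContext}

val spark = SparkSession.builder().appName("hudi-sql-insert").getOrCreate()
val ssc = new StreamingContext(spark.sparkContext, Seconds(10))

// Hypothetical helper that builds the INSERT statement for one record.
def makeInsertSql(id: String, name: String): String =
  s"INSERT INTO hudi_table VALUES ($id, '$name')"

ssc.socketTextStream("localhost", 9999).foreachRDD { rdd =>
  // collect() brings the batch to the driver, where spark.sql must run
  rdd.collect().foreach { line =>
    val Array(id, name) = line.split(",")
    spark.sql(makeInsertSql(id, name))
  }
}
ssc.start()
ssc.awaitTermination()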
Expected behavior
The streaming inserts should keep succeeding instead of eventually failing with the NoSuchMethodError above.
Environment Description
Hudi version : 0.11
Spark version : 3.2.1
Hadoop version : 3.2.2
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : no
Here is my config code:
spark-submit:
Stacktrace