dyang108 opened this issue 1 year ago
Hi @dyang108, thanks for raising the issue. Could you search the Spark driver logs for any exceptions? The relevant exception may appear well before the async compaction failure. We need to understand what exactly causes the Deltastreamer to fail.
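A minimal way to run that search, assuming the driver log ends up in a file (the path below is a placeholder; on Mesos/Docker it depends on how the container captures the driver's stdout/stderr):

LOG=/path/to/spark-driver.log   # placeholder path, not a known location in this setup
# Show ERROR/Exception lines with some surrounding context, oldest first,
# so anything that failed before the compaction failure shows up earlier in the output.
grep -n -i -B 2 -A 20 -E 'error|exception' "$LOG" | less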
Hey Ethan - thanks for the response. I see this exception in the driver logs before the async compaction failure:
22/12/19 20:10:33 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 6.0 (TID 6, 10.0.22.122, executor 3): org.apache.hudi.exception.HoodieException: Exception when reading log file
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:352)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scan(AbstractHoodieLogRecordReader.java:192)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:110)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:103)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:324)
at org.apache.hudi.table.action.compact.HoodieCompactor.compact(HoodieCompactor.java:198)
at org.apache.hudi.table.action.compact.HoodieCompactor.lambda$compact$57154431$1(HoodieCompactor.java:138)
at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:1040)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at org.apache.spark.storage.memory.MemoryStore.putIterator(MemoryStore.scala:221)
at org.apache.spark.storage.memory.MemoryStore.putIteratorAsBytes(MemoryStore.scala:349)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1182)
at org.apache.spark.storage.BlockManager$$anonfun$doPutIterator$1.apply(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:1091)
at org.apache.spark.storage.BlockManager.doPutIterator(BlockManager.scala:1156)
at org.apache.spark.storage.BlockManager.getOrElseUpdate(BlockManager.scala:882)
at org.apache.spark.rdd.RDD.getOrCompute(RDD.scala:335)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:286)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalStateException: Block has already been inflated
at org.apache.hudi.common.util.ValidationUtils.checkState(ValidationUtils.java:67)
at org.apache.hudi.common.table.log.block.HoodieLogBlock.inflate(HoodieLogBlock.java:267)
at org.apache.hudi.common.table.log.block.HoodieLogBlock.inflate(HoodieLogBlock.java:278)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.readRecordsFromBlockPayload(HoodieDataBlock.java:185)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecordIterator(HoodieDataBlock.java:147)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.getRecordsIterator(AbstractHoodieLogRecordReader.java:492)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:379)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:464)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:343)
... 30 more
The "Block has already been inflated" issue is due to a bug that is fixed by #7434.
Hope I can get some help with a problem I've been seeing in Deltastreamer, running on Mesos and built on the docker.io/apachehudi/hudi-hadoop_2.8.4-hive_2.3.3-sparkadhoc_2.4.4:latest image from Docker Hub. The job runs successfully for a few hours, but then it consistently fails on delta-sync. I'm reading from an Avro Kafka topic. Is there another image I should be using to ensure that Deltastreamer functions properly? Any hints on what I might be configuring wrong? I'm new to Hudi and open to all help!
I don't see anything above the async compaction failure stacktrace to indicate that anything went wrong earlier, but I'm also not sure what to look for.
To Reproduce
Steps to reproduce the behavior:
Built a Docker image
Ran with hadoop-aws, trying to write to S3, using the spark-submit command in run.sh (the command itself isn't included here; see the illustrative sketch below)
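Since the actual run.sh command isn't shown above, here is only a sketch of what such an invocation usually looks like for an Avro Kafka source writing a MERGE_ON_READ table to S3 in continuous mode; every path, topic, bucket, master URL, and property value below is a placeholder, not the reporter's configuration:

# Illustrative sketch only -- not the actual run.sh command.
# Assumes the Hudi 0.12.1 utilities bundle (Scala 2.11, matching Spark 2.4.4)
# plus hadoop-aws / aws-java-sdk jars are already available in the image.
spark-submit \
  --master mesos://<mesos-master>:5050 \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --packages org.apache.spark:spark-avro_2.11:2.4.4 \
  --conf spark.hadoop.fs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem \
  /opt/hudi/hudi-utilities-bundle_2.11-0.12.1.jar \
  --table-type MERGE_ON_READ \
  --op UPSERT \
  --source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
  --schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
  --source-ordering-field ts \
  --target-base-path s3a://my-bucket/hudi/my_table \
  --target-table my_table \
  --props /opt/hudi/kafka-source.properties \
  --continuous

The --props file referenced there would typically carry bootstrap.servers, hoodie.deltastreamer.source.kafka.topic, the schema registry URL for the schema provider (hoodie.deltastreamer.schemaprovider.registry.url), and hoodie.datasource.write.recordkey.field / hoodie.datasource.write.partitionpath.field.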
Expected behavior
The Deltastreamer job keeps running in continuous mode (delta-sync and async compaction continue to succeed) instead of consistently failing after a few hours.
Environment Description
Hudi version : 0.12.1
Spark version : 2.4.4
Hive version : 2.3.3
Hadoop version : 2.8.4
Storage (HDFS/S3/GCS..) : S3
Running on Docker? (yes/no) : yes
Additional context
Stacktrace
Slack here: https://apache-hudi.slack.com/archives/C4D716NPQ/p1671489527580939