xccui opened this issue 1 year ago (status: Open)
You have enabled the MDT then?
Ah, yes. I forgot MDT was enabled by default in a recent change...
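For anyone who wants to confirm whether the MDT is the trigger, a rough sketch of a Spark write with the metadata table turned off is below. This is only a sketch: the app name, source/target paths, table name, and key fields are placeholders, and hoodie.metadata.enable is assumed to be the config key that controls the MDT on the write path for your Hudi version.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SaveMode;
import org.apache.spark.sql.SparkSession;

public class MdtDisabledWrite {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-mdt-disabled-write")                        // placeholder app name
        .getOrCreate();

    // Placeholder source: any DataFrame with "id" and "ts" columns works here.
    Dataset<Row> df = spark.read().format("parquet").load("s3a://bucket/source/");

    df.write()
        .format("hudi")
        .option("hoodie.table.name", "my_table")                   // placeholder table name
        .option("hoodie.datasource.write.recordkey.field", "id")   // placeholder record key
        .option("hoodie.datasource.write.precombine.field", "ts")  // placeholder precombine field
        .option("hoodie.metadata.enable", "false")                 // write without the metadata table
        .mode(SaveMode.Append)
        .save("s3a://bucket/path/my_table");                       // placeholder target path

    spark.stop();
  }
}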
I also noticed this issue with Hudi 0.11.1, Hadoop 3.3.4 and 3.3.5, and Spark 3.2.1.
It does not happen with Hadoop 3.3.1 or 3.3.3, so it looks like the problem starts with Hadoop 3.3.4.
@jfrylings-twilio Did you try the later versions of Hudi, i.e. 0.13.1 or 0.12.3? I tried Hadoop 3.3.4 with Hudi 0.13.1 and 0.12.3 and it worked well. Let us know if you still face the issue.
We will try that once Presto supports those later versions of Hudi. Thanks 👍
I used Hudi 0.14.1 on Dataproc 2.1 (Spark 3.3.2, Hadoop 3.3.6) to upsert a Bloom-indexed COW table with PartialUpdateAvroPayload and got the same error when reading the HFiles of the MDT bloom_filters partition. Am I missing some jars, and how should I handle this?
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.hudi.org.apache.hadoop.hbase.io.hfile.HFile
at org.apache.hudi.io.storage.HoodieHFileUtils.createHFileReader(HoodieHFileUtils.java:59)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getHFileReader(HoodieAvroHFileReader.java:290)
at org.apache.hudi.io.storage.HoodieAvroHFileReader.getRecordsByKeysIterator(HoodieAvroHFileReader.java:140)
at org.apache.hudi.common.table.log.block.HoodieHFileDataBlock.lookupRecords(HoodieHFileDataBlock.java:205)
at org.apache.hudi.common.table.log.block.HoodieDataBlock.getRecordIterator(HoodieDataBlock.java:154)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.getRecordsIterator(AbstractHoodieLogRecordReader.java:956)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processDataBlock(AbstractHoodieLogRecordReader.java:780)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.processQueuedBlocksForInstant(AbstractHoodieLogRecordReader.java:825)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:403)
at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220)
at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.scanByFullKeys(HoodieMergedLogRecordScanner.java:160)
at org.apache.hudi.metadata.HoodieMetadataLogRecordReader.getRecordsByKeys(HoodieMetadataLogRecordReader.java:108)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.readLogRecords(HoodieBackedTableMetadata.java:327)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lookupKeysFromFileSlice(HoodieBackedTableMetadata.java:304)
at org.apache.hudi.metadata.HoodieBackedTableMetadata.lambda$getRecordsByKeys$f9381e22$1(HoodieBackedTableMetadata.java:275)
at org.apache.hudi.common.function.FunctionWrapper.lambda$throwingMapWrapper$0(FunctionWrapper.java:38)
at java.base/java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:195)
at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1655)
at java.base/java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:484)
at java.base/java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:474)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:952)
at java.base/java.util.stream.ReduceOps$ReduceTask.doLeaf(ReduceOps.java:926)
at java.base/java.util.stream.AbstractTask.compute(AbstractTask.java:327)
at java.base/java.util.concurrent.CountedCompleter.exec(CountedCompleter.java:746)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
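If the failure is limited to the bloom_filters partition of the MDT, one possible workaround to try (a sketch, not a confirmed fix) is to keep the metadata table itself but skip its bloom filter index on the upsert path. The option keys hoodie.metadata.index.bloom.filter.enable and hoodie.bloom.index.use.metadata are assumed to behave as documented for Hudi 0.14.x; please verify them against your version before relying on this.

import java.util.HashMap;
import java.util.Map;

public class BloomMetadataOptions {
  // Writer options for an upsert that avoids the MDT bloom_filters HFiles
  // exercised in the stack trace above.
  public static Map<String, String> bloomLookupOffOptions() {
    Map<String, String> opts = new HashMap<>();
    opts.put("hoodie.datasource.write.operation", "upsert");
    opts.put("hoodie.datasource.write.payload.class",
        "org.apache.hudi.common.model.PartialUpdateAvroPayload");
    opts.put("hoodie.index.type", "BLOOM");
    // Do not build the bloom filter index inside the metadata table...
    opts.put("hoodie.metadata.index.bloom.filter.enable", "false");
    // ...and do not consult the metadata table during bloom index lookup.
    opts.put("hoodie.bloom.index.use.metadata", "false");
    return opts;
  }
}

These can be passed to the writer with df.write().format("hudi").options(BloomMetadataOptions.bloomLookupOffOptions()) together with the usual table name, key, and path options.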
I also noticed this issue with Hudi 0.14.1, Hadoop 3.2.2, Spark 3.4.2, and HBase 2.4.5. Does anyone have a solution?
We occasionally hit the following exception when running a Flink writer job. The job won't self-heal, but can be recovered by manually restarting the TaskManager.
MDT was enabled.
Environment Description
Hudi version : bdb50ddccc9631317dfb06a06abc38cbd3714ce8
Flink version : 1.16.1
Hadoop version : 3.3.4
Storage (HDFS/S3/GCS..) : S3
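Since the MDT was enabled for this Flink writer, one way to isolate it as the cause is to recreate the sink table with the metadata table disabled. The sketch below uses the Flink Table API; the schema, table name, table type, and path are placeholders, and metadata.enabled is assumed to be the connector option that controls the MDT in this Hudi build.

import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class FlinkHudiMdtOff {
  public static void main(String[] args) {
    TableEnvironment tEnv = TableEnvironment.create(
        EnvironmentSettings.newInstance().inStreamingMode().build());

    tEnv.executeSql(
        "CREATE TABLE hudi_sink (" +
        "  id STRING," +
        "  ts TIMESTAMP(3)," +
        "  PRIMARY KEY (id) NOT ENFORCED" +
        ") WITH (" +
        "  'connector' = 'hudi'," +
        "  'path' = 's3a://bucket/path/hudi_sink'," +    // placeholder bucket and path
        "  'table.type' = 'MERGE_ON_READ'," +            // placeholder table type
        "  'metadata.enabled' = 'false'" +               // write without the metadata table
        ")");
  }
}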