apache / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT]Failed to Read .log file when i using trino to select hudi table #11480

Open RYiHui opened 5 months ago

RYiHui commented 5 months ago

org.apache.hudi.exception.HoodieException: Exception when constructing record reader
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:80)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.<init>(HoodieRealtimeRecordReader.java:47)
    at org.apache.hudi.hadoop.realtime.HoodieParquetRealtimeInputFormat.getRecordReader(HoodieParquetRealtimeInputFormat.java:85)
    at io.trino.plugin.hive.util.HiveUtil.createRecordReader(HiveUtil.java:265)
    at io.trino.plugin.hive.GenericHiveRecordCursorProvider.lambda$createRecordCursor$1(GenericHiveRecordCursorProvider.java:98)
    at io.trino.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
    at io.trino.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:98)
    at io.trino.plugin.hive.GenericHiveRecordCursorProvider.createRecordCursor(GenericHiveRecordCursorProvider.java:97)
    at io.trino.plugin.hive.HivePageSourceProvider.createHivePageSource(HivePageSourceProvider.java:363)
    at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:208)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:49)
    at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:64)
    at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:308)
    at io.trino.operator.Driver.processInternal(Driver.java:387)
    at io.trino.operator.Driver.lambda$processFor$9(Driver.java:291)
    at io.trino.operator.Driver.tryWithLock(Driver.java:683)
    at io.trino.operator.Driver.processFor(Driver.java:284)
    at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1076)
    at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:488)
    at io.trino.$gen.Trino_fb5b60f_dirty____20240620_092419_2.run(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.hudi.exception.HoodieException: Exception when reading log file
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:414)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternal(AbstractHoodieLogRecordReader.java:220)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.performScan(HoodieMergedLogRecordScanner.java:201)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:117)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner.<init>(HoodieMergedLogRecordScanner.java:76)
    at org.apache.hudi.common.table.log.HoodieMergedLogRecordScanner$Builder.build(HoodieMergedLogRecordScanner.java:466)
    at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.getMergedLogRecordScanner(RealtimeCompactedRecordReader.java:101)
    at org.apache.hudi.hadoop.realtime.RealtimeCompactedRecordReader.<init>(RealtimeCompactedRecordReader.java:69)
    at org.apache.hudi.hadoop.realtime.HoodieRealtimeRecordReader.constructRecordReader(HoodieRealtimeRecordReader.java:70)
    ... 23 more
Caused by: org.apache.hudi.exception.HoodieIOException: IOException when reading logblock from log file HoodieLogFile{pathStr='hdfs://ludpupgrade2ha/apps/hive/warehouse/prd_updated.db/hudia/.0344a418-e576-497f-960e-9c8b0a7d5085-0_20240618165713231.log.1_0-6294-153405', fileLen=-1}
    at org.apache.hudi.common.table.log.HoodieLogFileReader.next(HoodieLogFileReader.java:397)
    at org.apache.hudi.common.table.log.HoodieLogFormatReader.next(HoodieLogFormatReader.java:102)
    at org.apache.hudi.common.table.log.AbstractHoodieLogRecordReader.scanInternalV1(AbstractHoodieLogRecordReader.java:254)
    ... 31 more
Caused by: java.io.IOException: Could not read metadata fields
    at org.apache.hudi.common.table.log.block.HoodieLogBlock.getLogMetadata(HoodieLogBlock.java:266)
    at org.apache.hudi.common.table.log.HoodieLogFileReader.readBlock(HoodieLogFileReader.java:184)
    at org.apache.hudi.common.table.log.HoodieLogFileReader.next(HoodieLogFileReader.java:395)
    ... 33 more
Caused by: java.io.EOFException
    at java.base/java.io.DataInputStream.readFully(DataInputStream.java:202)
    at org.apache.hudi.common.table.log.block.HoodieLogBlock.getLogMetadata(HoodieLogBlock.java:260)
    ... 35 more

RYiHui commented 5 months ago

The Hudi version I used is 0.14.1; the Trino version is 361.
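
One way to narrow this down is to read the same table with Spark and the matching Hudi 0.14.1 bundle: if a snapshot read succeeds there, the log file itself is intact and the failure is on the Trino reader side rather than data corruption. A minimal sketch, assuming the table base path from the stack trace above and the Spark session settings from the Hudi 0.14.x quickstart (adjust both for your environment):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class HudiReadCheck {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder()
        .appName("hudi-read-check")
        // Serializer and SQL extension settings as recommended by the Hudi quickstart.
        .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
        .config("spark.sql.extensions", "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
        .getOrCreate();

    // A snapshot (real-time) read merges base parquet files with the .log files
    // that Trino fails on, so it exercises the same log-block reader path.
    Dataset<Row> df = spark.read().format("hudi")
        .option("hoodie.datasource.query.type", "snapshot")
        .load("hdfs://ludpupgrade2ha/apps/hive/warehouse/prd_updated.db/hudia");

    df.show(10, false);
    spark.stop();
  }
}
```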

RYiHui commented 5 months ago

[screenshot of a configuration parameter] Is it related to this parameter?

danny0405 commented 5 months ago

This is a known issue; it should be fixed in 1.0, I think. cc @yihua, who has worked a lot on this feature.

RYiHui commented 4 months ago

@yihua Could you explain how to resolve the issue or share some links that might be useful to me?

Hfal91 commented 3 months ago

I have the exact same issue with AWS Athena: GENERIC_INTERNAL_ERROR: Exception when constructing record reader. Also on Hudi 0.14.1; Athena can't read the log files on the RT table.
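
Until a reader-side fix is available, one possible workaround (assuming the MOR table was Hive-synced with both views, so a `_ro` table exists alongside the `_rt` one) is to query the read-optimized view, which only scans base parquet files and never opens the .log files, at the cost of not seeing records still waiting for compaction. A hedged sketch over Trino's JDBC driver; the connection URL and table names are assumptions:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ReadOptimizedQuery {
  public static void main(String[] args) throws Exception {
    // Hypothetical coordinator host/port and catalog/schema; adjust to your cluster.
    String url = "jdbc:trino://trino-coordinator:8080/hive/prd_updated";
    try (Connection conn = DriverManager.getConnection(url, "etl_user", null);
         Statement stmt = conn.createStatement();
         // Assumes the table was synced as hudia_ro / hudia_rt; "_ro" skips log files.
         ResultSet rs = stmt.executeQuery("SELECT count(*) FROM hudia_ro")) {
      while (rs.next()) {
        System.out.println("row count (read-optimized view): " + rs.getLong(1));
      }
    }
  }
}
```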

danny0405 commented 3 months ago

cc @yihua and @codope for visibility.

codope commented 3 months ago

Which Hudi version was used for writing to the table? According to the error, a metadata field in the log block could not be read. Trino 361 is pretty old and may not be able to read the log file created by a recent Hudi version.
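
For reference, one way to answer that without access to the writer job is to look at `.hoodie/hoodie.properties` under the table base path; the `hoodie.table.version` recorded there reflects the writer release line (0.14.x should write table version 6, if I remember the mapping correctly). A small sketch using the Hadoop FileSystem API, with the base path taken from the stack trace:

```java
import java.io.InputStream;
import java.util.Properties;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowTableVersion {
  public static void main(String[] args) throws Exception {
    // hoodie.properties sits directly under <table base path>/.hoodie/.
    Path props = new Path(
        "hdfs://ludpupgrade2ha/apps/hive/warehouse/prd_updated.db/hudia/.hoodie/hoodie.properties");
    try (FileSystem fs = FileSystem.get(props.toUri(), new Configuration());
         InputStream in = fs.open(props)) {
      Properties p = new Properties();
      p.load(in);
      System.out.println("hoodie.table.version = " + p.getProperty("hoodie.table.version"));
      System.out.println("hoodie.table.type    = " + p.getProperty("hoodie.table.type"));
    }
  }
}
```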