apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org

[Bug] throws "InternalError: null" when query hudi MOR table after compaction #42863

Closed: cgerx closed this issue 2 weeks ago

cgerx commented 2 weeks ago

Search before asking

Version

Doris: 3.0.2 Hudi: 0.15

What's Wrong?

I am using Doris to read a Hudi MOR table. When the table has not been compacted, there are no issues. However, after compaction and the generation of Parquet files, I execute:

select * from table1;

ERROR 1105 (HY000): errCode = 2, detailMessage = (10.211.55.22)[INTERNAL_ERROR]cur path: hdfs://hadoop1:8020/user/hudi/warehouse/hudi_ods/table1/1/19fb8645-e282-4f3a-90a1-84c8b2be63a2_0-1-0_20241029060428876.parquet. InternalError: null

The error stack is:

W20241029 17:50:44.216233 28679 task_scheduler.cpp:165] Pipeline task failed. query_id: 27fd0b61a25048c1-964280001c1d0e50 reason: [INTERNAL_ERROR]cur path: hdfs://hadoop1:8020/data/hive/warehouse/hudi_ods.db/table1/1/cbe397a3-8b7e-4c13-8024-9a9a816e884b_0-1-0_20241029044250098.parquet. InternalError: null

        0#  doris::JniUtil::GetJniExceptionMsg(JNIEnv_*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
        1#  doris::vectorized::JniConnector::get_next_block(doris::vectorized::Block*, unsigned long*, bool*)
        2#  doris::vectorized::VFileScanner::_get_block_wrapped(doris::RuntimeState*, doris::vectorized::Block*, bool*)
        3#  doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*)
        4#  doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*)
        5#  doris::vectorized::ScannerScheduler::_scanner_scan(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)
        6#  std::_Function_handler<void (), doris::vectorized::ScannerScheduler::submit(std::shared_ptr<doris::vectorized::ScannerContext>, std::shared_ptr<doris::vectorized::ScanTask>)::$_1::operator()() const::{lambda()#1}>::_M_invoke(std::_Any_data const&)
        7#  doris::ThreadPool::dispatch_thread()
        8#  doris::Thread::supervise_thread(void*)
        9#  start_thread
        10# thread_start
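
For reference, a minimal sketch of how the query reaches the table through the Hudi catalog; the catalog, database, and table names are inferred from the catalog and HDFS paths shown in this issue, so they may differ from the actual environment:

-- switch to the external catalog and database, then run the failing query
SWITCH hudi;
USE hudi_ods;
SELECT * FROM table1;

-- or fully qualified:
SELECT * FROM hudi.hudi_ods.table1;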

What You Expected?

What's wrong? Does Doris not support Hudi 0.15? Thanks a lot.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

Code of Conduct

cgerx commented 2 weeks ago

Same error on Hudi 0.14.

My table DDL is:

CREATE TABLE table1(
  uuid VARCHAR(20) PRIMARY KEY NOT ENFORCED,
  `partition` VARCHAR(20)
)
PARTITIONED BY (`partition`)
WITH (
  'connector' = 'hudi',
  'path' = 'hdfs://hadoop1:8020/data/hive/warehouse/hudi_ods.db/table1',
  'table.type' = 'MERGE_ON_READ',
  'hive_sync.enable' = 'true',
  'hive_sync.table' = 'table1',
  'hive_sync.db' = 'hive_ods',
  'hive_sync.mode' = 'hms',
  'hive_sync.metastore.uris' = 'thrift://hadoop1:9083'
);
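
For context, a minimal sketch of a Flink SQL write against this table that would eventually trigger MOR compaction and produce a Parquet base file. The row values and the compaction options are illustrative assumptions, not taken from the actual job ('compaction.async.enabled' defaults to true in the Hudi Flink connector, and 'compaction.delta_commits' defaults to 5):

-- Hypothetical write; the OPTIONS hint values below are illustrative only.
INSERT INTO table1 /*+ OPTIONS('compaction.async.enabled' = 'true', 'compaction.delta_commits' = '1') */
VALUES ('uuid-001', 'p1');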

My catalog:

MySQL [(none)]> show catalog hudi;
+--------------------------------------------+-------------------------------+
| Key                                        | Value                         |
+--------------------------------------------+-------------------------------+
| use_meta_cache                             | true                          |
| type                                       | hms                           |
| ipc.client.fallback-to-simple-auth-allowed | true                          |
| hive.metastore.uris                        | thrift://hadoop1:9083         |
| hadoop.username                            | root                          |
| create_time                                | 2024-10-29 13:45:30.494582511 |
+--------------------------------------------+-------------------------------+
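
For reference, a catalog with these properties would have been created with something like the statement below; this is a sketch reconstructed from the SHOW CATALOG output above, since the original CREATE CATALOG statement was not posted:

-- Reconstructed from the property list above; not the original statement.
CREATE CATALOG hudi PROPERTIES (
    'type' = 'hms',
    'hive.metastore.uris' = 'thrift://hadoop1:9083',
    'hadoop.username' = 'root',
    'ipc.client.fallback-to-simple-auth-allowed' = 'true',
    'use_meta_cache' = 'true'
);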
cgerx commented 2 weeks ago

The problem has been resolved. I found another error in be.INFO, which turned out to be the cause of the problem I mentioned earlier.

I20241029 17:50:44.121132 27945 pipeline_fragment_context.cpp:259] PipelineFragmentContext::prepare|query_id=27fd0b61a25048c1-964280001c1d0e50|fragment_id=0|pthread_id=540549818400
W20241029 17:50:44.215984 28732 jni-util.cpp:312] java.lang.InternalError
        at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.init(Native Method)
        at org.apache.hadoop.io.compress.zlib.ZlibDecompressor.<init>(ZlibDecompressor.java:114)
        at org.apache.hadoop.io.compress.GzipCodec$GzipZlibDecompressor.<init>(GzipCodec.java:229)
        at org.apache.hadoop.io.compress.GzipCodec.createDecompressor(GzipCodec.java:188)
        at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:183)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetCodecFactory$HeapBytesDecompressor.<init>(ParquetCodecFactory.java:58)
        at org.apache.spark.sql.execution.datasources.parquet.ParquetCodecFactory.createDecompressor(ParquetCodecFactory.java:110)
        at org.apache.parquet.hadoop.CodecFactory.getDecompressor(CodecFactory.java:212)
        at org.apache.parquet.hadoop.CodecFactory.getDecompressor(CodecFactory.java:43)
        at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:1664)
        at org.apache.parquet.hadoop.ParquetFileReader$Chunk.readAllPages(ParquetFileReader.java:1547)
        at org.apache.parquet.hadoop.ParquetFileReader.readChunkPages(ParquetFileReader.java:1157)
        at org.apache.parquet.hadoop.ParquetFileReader.internalReadRowGroup(ParquetFileReader.java:993)
        at org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:940)
        at org.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:1082)
        at org.apache.spark.sql.execution.datasources.parquet.SpecificParquetRecordReaderBase$ParquetRowGroupReaderImpl.readNextRowGroup(SpecificParquetRecordReaderBase.java:274)

I fixed it according to the lakehouse FAQ: https://doris.apache.org/docs/3.0/faq/lakehouse-faq/

(screenshot of the relevant FAQ entry)