apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0
12.74k stars 3.28k forks source link

[Bug] doris query hudi hudi_table_name_rt .NoSuchMethodError #37833

Open xiaofan2022 opened 4 months ago

xiaofan2022 commented 4 months ago

Search before asking

Version

doris 2.0.9 hudi 0.14.0

What's Wrong?

W0712 14:09:56.125739 303537 scanner_scheduler.cpp:396] Scan thread read VScanner failed: [INTERNAL_ERROR]failed to init reader for file hdfs://nameservice1/warehouse/tablespace/external/hive/test001/las_sap_samplereg_1/7287f40d-6782-4d37-a926-3ba72ee661f2-0_0-1-0_20240625103555000.parquet, err: [INTERNAL_ERROR]IOException: java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SQLConf.parquetFilterPushDownStringStartWith()Z CAUSED BY: ExecutionException: java.lang.NoSuchMethodError: org.apache.spark.sql.internal.SQLConf.parquetFilterPushDownStringStartWith()Z CAUSED BY: NoSuchMethodError: org.apache.spark.sql.internal.SQLConf.parquetFilterPushDownStringStartWith()Z

    0#  doris::JniUtil::GetJniExceptionMsg(JNIEnv_*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /home/zcp/repo_center/doris_release/doris/be/src/util/jni-util.h:117
    1#  doris::vectorized::JniConnector::open(doris::RuntimeState*, doris::RuntimeProfile*) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
    2#  doris::vectorized::HudiJniReader::init_reader(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::variant<doris::ColumnValueRange<(doris::PrimitiveType)3>, doris::ColumnValueRange<(doris::PrimitiveType)4>, doris::ColumnValueRange<(doris::PrimitiveType)5>, doris::ColumnValueRange<(doris::PrimitiveType)6>, doris::ColumnValueRange<(doris::PrimitiveType)7>, doris::ColumnValueRange<(doris::PrimitiveType)15>, doris::ColumnValueRange<(doris::PrimitiveType)10>, doris::ColumnValueRange<(doris::PrimitiveType)23>, doris::ColumnValueRange<(doris::PrimitiveType)11>, doris::ColumnValueRange<(doris::PrimitiveType)25>, doris::ColumnValueRange<(doris::PrimitiveType)12>, doris::ColumnValueRange<(doris::PrimitiveType)26>, doris::ColumnValueRange<(doris::PrimitiveType)20>, doris::ColumnValueRange<(doris::PrimitiveType)2>, doris::ColumnValueRange<(doris::PrimitiveType)19>, doris::ColumnValueRange<(doris::PrimitiveType)28>, doris::ColumnValueRange<(doris::PrimitiveType)29>, doris::ColumnValueRange<(doris::PrimitiveType)30> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::variant<doris::ColumnValueRange<(doris::PrimitiveType)3>, doris::ColumnValueRange<(doris::PrimitiveType)4>, doris::ColumnValueRange<(doris::PrimitiveType)5>, doris::ColumnValueRange<(doris::PrimitiveType)6>, doris::ColumnValueRange<(doris::PrimitiveType)7>, doris::ColumnValueRange<(doris::PrimitiveType)15>, doris::ColumnValueRange<(doris::PrimitiveType)10>, doris::ColumnValueRange<(doris::PrimitiveType)23>, doris::ColumnValueRange<(doris::PrimitiveType)11>, doris::ColumnValueRange<(doris::PrimitiveType)25>, doris::ColumnValueRange<(doris::PrimitiveType)12>, doris::ColumnValueRange<(doris::PrimitiveType)26>, doris::ColumnValueRange<(doris::PrimitiveType)20>, doris::ColumnValueRange<(doris::PrimitiveType)2>, doris::ColumnValueRange<(doris::PrimitiveType)19>, doris::ColumnValueRange<(doris::PrimitiveType)28>, doris::ColumnValueRange<(doris::PrimitiveType)29>, doris::ColumnValueRange<(doris::PrimitiveType)30> > > > >*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/format/table/hudi_jni_reader.cpp:103
    3#  doris::vectorized::VFileScanner::_get_next_reader() at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:357
    4#  doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:446
    5#  doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/scan/vscanner.cpp:0
    6#  doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:357
    7#  std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
    8#  doris::WorkThreadPool<true>::work_thread(int) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
    9#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
    10# start_thread
    11# clone

    0#  doris::vectorized::VFileScanner::_get_next_reader() at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
    1#  doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:446
    2#  doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /home/zcp/repo_center/doris_release/doris/be/src/vec/exec/scan/vscanner.cpp:0
    3#  doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) at /home/zcp/repo_center/doris_release/doris/be/src/common/status.h:357
    4#  std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
    5#  doris::WorkThreadPool<true>::work_thread(int) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
    6#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
    7#  start_thread
    8#  clone

What You Expected?

All hudi tables can be queried

How to Reproduce?

select count(*) from test001.hudi_rt ;

Anything Else?

No response

Are you willing to submit PR?

Code of Conduct

xiaofan2022 commented 4 months ago

The reason for the problem is: Be through org.apache.doris.com jni_connector call mon. Jni. JniScanner the open method and JniScanner in hudi - scanner modle of a spark-catalyst in version 3.2, However, the public dependency version of Spark-Catalyst provided by the preload-extensions module is 3.5.1. Do you want to change the spark version to 3.4.1?