apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Bug] Doris cannot read Paimon tables (data on OBS): obs is not supported with a filesystem Paimon catalog #29029

Open zck573693104 opened 6 months ago

zck573693104 commented 6 months ago

Search before asking

Version

doris-2.0.3 paimon-0.5

What's Wrong?

Reading fails: the obs scheme is not supported. The Paimon catalog is a filesystem catalog.

What You Expected?

2023-12-26 10:28:03,915 WARN (thrift-server-pool-86|3331) [Coordinator.updateFragmentExecStatus():2223] one instance report fail, query_id=9ab3891365f0459b-976f9e4418e307da instance_id=9ab3891365f0459b-976f9e4418e30814, error message: (172.27.0.150)[CANCELLED][INTERNAL_ERROR]failed to init reader for file dummyPath, err: [INTERNAL_ERROR]UncheckedExecutionException: java.io.UncheckedIOException: org.apache.paimon.fs.UnsupportedSchemeException: Could not find a file io implementation for scheme 'obs' in the classpath. Hadoop FileSystem also cannot access this path 'obs://obs-path/tmp/test'. CAUSED BY: UncheckedIOException: org.apache.paimon.fs.UnsupportedSchemeException: Could not find a file io implementation for scheme 'obs' in the classpath. Hadoop FileSystem also cannot access this path 'obs://obs-path/tmp/test'. CAUSED BY: UnsupportedSchemeException: Could not find a file io implementation for scheme 'obs' in the classpath. Hadoop FileSystem also cannot access this path 'obs://obs-path/tmp/test'.

0#  doris::JniUtil::GetJniExceptionMsg(JNIEnv_*, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at /root/src/doris-2.0/be/src/util/jni-util.h:110
1#  doris::vectorized::JniConnector::open(doris::RuntimeState*, doris::RuntimeProfile*) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
2#  doris::vectorized::PaimonJniReader::init_reader(std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::variant<doris::ColumnValueRange<(doris::PrimitiveType)3>, doris::ColumnValueRange<(doris::PrimitiveType)4>, doris::ColumnValueRange<(doris::PrimitiveType)5>, doris::ColumnValueRange<(doris::PrimitiveType)6>, doris::ColumnValueRange<(doris::PrimitiveType)7>, doris::ColumnValueRange<(doris::PrimitiveType)15>, doris::ColumnValueRange<(doris::PrimitiveType)10>, doris::ColumnValueRange<(doris::PrimitiveType)23>, doris::ColumnValueRange<(doris::PrimitiveType)11>, doris::ColumnValueRange<(doris::PrimitiveType)25>, doris::ColumnValueRange<(doris::PrimitiveType)12>, doris::ColumnValueRange<(doris::PrimitiveType)26>, doris::ColumnValueRange<(doris::PrimitiveType)20>, doris::ColumnValueRange<(doris::PrimitiveType)2>, doris::ColumnValueRange<(doris::PrimitiveType)19>, doris::ColumnValueRange<(doris::PrimitiveType)28>, doris::ColumnValueRange<(doris::PrimitiveType)29>, doris::ColumnValueRange<(doris::PrimitiveType)30> >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::variant<doris::ColumnValueRange<(doris::PrimitiveType)3>, doris::ColumnValueRange<(doris::PrimitiveType)4>, doris::ColumnValueRange<(doris::PrimitiveType)5>, doris::ColumnValueRange<(doris::PrimitiveType)6>, doris::ColumnValueRange<(doris::PrimitiveType)7>, doris::ColumnValueRange<(doris::PrimitiveType)15>, doris::ColumnValueRange<(doris::PrimitiveType)10>, doris::ColumnValueRange<(doris::PrimitiveType)23>, doris::ColumnValueRange<(doris::PrimitiveType)11>, doris::ColumnValueRange<(doris::PrimitiveType)25>, doris::ColumnValueRange<(doris::PrimitiveType)12>, 
doris::ColumnValueRange<(doris::PrimitiveType)26>, doris::ColumnValueRange<(doris::PrimitiveType)20>, doris::ColumnValueRange<(doris::PrimitiveType)2>, doris::ColumnValueRange<(doris::PrimitiveType)19>, doris::ColumnValueRange<(doris::PrimitiveType)28>, doris::ColumnValueRange<(doris::PrimitiveType)29>, doris::ColumnValueRange<(doris::PrimitiveType)30> > > > >*) at /root/src/doris-2.0/be/src/vec/exec/format/table/paimon_reader.cpp:90
3#  doris::vectorized::VFileScanner::_get_next_reader() at /root/src/doris-2.0/be/src/common/status.h:354
4#  doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/src/doris-2.0/be/src/common/status.h:442
5#  doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/src/doris-2.0/be/src/vec/exec/scan/vscanner.cpp:0
6#  doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) at /root/src/doris-2.0/be/src/common/status.h:354
7#  std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
8#  doris::WorkThreadPool<true>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
9#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
10# start_thread
11# __clone

0#  doris::vectorized::VFileScanner::_get_next_reader() at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/basic_string.h:187
1#  doris::vectorized::VFileScanner::_get_block_impl(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/src/doris-2.0/be/src/common/status.h:442
2#  doris::vectorized::VScanner::get_block(doris::RuntimeState*, doris::vectorized::Block*, bool*) at /root/src/doris-2.0/be/src/vec/exec/scan/vscanner.cpp:0
3#  doris::vectorized::ScannerScheduler::_scanner_scan(doris::vectorized::ScannerScheduler*, doris::vectorized::ScannerContext*, std::shared_ptr<doris::vectorized::VScanner>) at /root/src/doris-2.0/be/src/common/status.h:354
4#  std::_Function_handler<void (), doris::vectorized::ScannerScheduler::_schedule_scanners(doris::vectorized::ScannerContext*)::$_1::operator()() const::{lambda()#4}>::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/shared_ptr_base.h:701
5#  doris::WorkThreadPool<true>::work_thread(int) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/atomic_base.h:646
6#  execute_native_thread_routine at /data/gcc-11.1.0/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/unique_ptr.h:85
7#  start_thread
8#  __clone

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

Code of Conduct

DongLiang-0 commented 6 months ago

Currently Paimon only supports the s3 and oss protocols. Can you change the obs protocol to the s3 protocol?

zck573693104 commented 6 months ago

OBS is Huawei Cloud's object storage, and it speaks the S3 protocol.

zck573693104 commented 6 months ago

Doris can read Hive tables whose data is on OBS, but it cannot read Paimon tables whose data is on OBS.

DongLiang-0 commented 6 months ago

What I mean is that Paimon itself currently does not support obs. You can explicitly specify obs as s3. Please refer to https://paimon.apache.org/docs/master/filesystems/overview and https://github.com/apache/incubator-paimon/issues/2335
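
For illustration, specifying obs as s3 in a Doris catalog definition might look like the sketch below. It assumes the Doris Paimon catalog properties type, warehouse, s3.endpoint, s3.access_key and s3.secret_key. The catalog name paimon_obs_as_s3, the endpoint and the credentials are placeholders, and the bucket path is the one from the error log above. Whether this works against OBS is not confirmed in this thread.

```sql
-- Hypothetical catalog name; endpoint and keys are placeholders, not real values.
CREATE CATALOG `paimon_obs_as_s3` PROPERTIES (
    "type" = "paimon",
    -- Same bucket and path as in the error log, with the scheme written as s3 instead of obs.
    "warehouse" = "s3://obs-path/tmp/test",
    -- Assumed to be the S3-compatible endpoint of the OBS region in use.
    "s3.endpoint" = "<obs-endpoint>",
    "s3.access_key" = "<access-key>",
    "s3.secret_key" = "<secret-key>"
);

-- Check that the catalog can list tables.
SHOW TABLES FROM `paimon_obs_as_s3`.`default`;
```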

zck573693104 commented 6 months ago

Paimon can write data to OBS. When I created the catalog in Doris with the warehouse specified using the S3 protocol, show tables from 'paimon_oss'.'default'; returned no result. It worked once I changed S3 to obs.