apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

[Bug] [Critical bug] Writing data crashes the BE and it cannot recover #35483

Open zhangpenggh opened 1 month ago

zhangpenggh commented 1 month ago

Search before asking

Version

2.1.3

What's Wrong?

The table creation statement is as follows:

create table test_table (
    c1 date not null comment '',
    c2 int not null comment '',
    c3 varchar(1024) NOT NULL comment '',
    c4 array<varchar(15)> comment '',
    c5 array<varchar(1024)> comment '',
    c6 array<varchar(1024)> comment '',
    c7 varchar(4096) comment '',
    c8 varchar(2048) comment '',
    c9 array<varchar(1024)> comment '',
    c10 array<varchar(1024)> comment '',
    INDEX index_domain_dns_c4 (c4) USING INVERTED COMMENT '',
    INDEX index_domain_dns_c5 (c5) USING INVERTED PROPERTIES("parser" = "english") COMMENT '',
)
UNIQUE KEY(c1, c2, c3)
auto partition by list(c1,c2)()
DISTRIBUTED BY HASH(c3) BUCKETS 5
PROPERTIES (
    "replication_num" = "3",
    "store_row_column" = "true",
    "enable_unique_key_merge_on_write" = "true",
    "bloom_filter_columns" = "c3"
)

After creating the table, I imported data through stream load repeatedly with a concurrency of 3. After a while, all BEs went down. Restarting the BE service does not bring it back to a usable state. Checking the logs, I found the following abnormal entries:

E20240528 10:09:38.822259 24529 variable.cpp:179] Already exposed `doris_cache_inverted_index_query_cache_persecond' whose value is `0'
I20240528 10:09:38.822265 24529 exec_env_init.cpp:497] Inverted index query match cache memory limit: 2.65 GB, origin config value: 10%
E20240528 10:09:38.822379 24529 variable.cpp:179] Already exposed `doris_cache_last_success_channel_cache' whose value is `0'
E20240528 10:09:38.822389 24529 variable.cpp:179] Already exposed `doris_cache_last_success_channel_cache_persecond' whose value is `0'
I20240528 10:09:38.822434 24529 wal_manager.cpp:117] wal_dir:/data/doris/storage/wal, tmp_dir:/data/doris/storage/wal/tmp
E20240528 10:09:38.822610 24529 variable.cpp:179] Already exposed `doris_cache_tablet_schema_cache' whose value is `0'
E20240528 10:09:38.822633 24529 variable.cpp:179] Already exposed `doris_cache_tablet_schema_cache_persecond' whose value is `0'
I20240528 10:09:38.822685 25541 wal_manager.cpp:481] Scheduled(every 10s) WAL info: [/data/doris/storage/wal: limit 127713958297 Bytes, used 0 Bytes, estimated wal bytes 0 Bytes, available 127713958297 Bytes.];
E20240528 10:09:38.823782 24529 variable.cpp:179] Already exposed `doris_cache_mow_tablet_version_cache' whose value is `0'
E20240528 10:09:38.823801 24529 variable.cpp:179] Already exposed `doris_cache_mow_tablet_version_cache_persecond' whose value is `0'
E20240528 10:09:38.823820 24529 variable.cpp:179] Already exposed `doris_cache_create_tablet_rridx_cache' whose value is `0'
E20240528 10:09:38.823827 24529 variable.cpp:179] Already exposed `doris_cache_create_tablet_rridx_cache_persecond' whose value is `0'

What You Expected?

At a minimum, service stability should be guaranteed.

How to Reproduce?

No response

Anything Else?

No response

Are you willing to submit PR?

Code of Conduct

zhangpenggh commented 1 month ago

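This is most likely caused by the inverted index: with the same table but without the inverted indexes, data imports without any problem.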
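For reference, a sketch of what the table without inverted indexes might look like, assuming it is the original statement with only the two INDEX clauses removed. The table name test_table_no_index is hypothetical; every column and property is copied unchanged from the report.

```sql
-- Same schema as test_table in the report, minus the two INVERTED indexes.
-- The name test_table_no_index is made up for illustration.
create table test_table_no_index (
    c1 date not null comment '',
    c2 int not null comment '',
    c3 varchar(1024) NOT NULL comment '',
    c4 array<varchar(15)> comment '',
    c5 array<varchar(1024)> comment '',
    c6 array<varchar(1024)> comment '',
    c7 varchar(4096) comment '',
    c8 varchar(2048) comment '',
    c9 array<varchar(1024)> comment '',
    c10 array<varchar(1024)> comment ''
)
UNIQUE KEY(c1, c2, c3)
auto partition by list(c1,c2)()
DISTRIBUTED BY HASH(c3) BUCKETS 5
PROPERTIES (
    "replication_num" = "3",
    "store_row_column" = "true",
    "enable_unique_key_merge_on_write" = "true",
    "bloom_filter_columns" = "c3"
);
```

Running the same stream load against both tables would help confirm whether the inverted indexes on the array columns c4 and c5 are the trigger.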

dataroaring commented 1 month ago

Could you provide stack in be.out?

zhangpenggh commented 1 month ago

Could you provide stack in be.out?

start time: Tue May 28 11:06:20 CST 2024
INFO: java_cmd /usr/soft/jdk//bin/java
INFO: jdk_version 8
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/soft/doris/apache-doris-2.1.3-bin-x64/be/lib/java_extensions/preload-extensions/preload-extensions-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/soft/doris/apache-doris-2.1.3-bin-x64/be/lib/java_extensions/java-udf/java-udf-jar-with-dependencies.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/soft/doris/apache-doris-2.1.3-bin-x64/be/lib/hadoop_hdfs/common/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Reload4jLoggerFactory]
[WARNING!] /sys/kernel/mm/transparent_hugepage/enabled: [always] madvise never, Doris not recommend turning on THP, which may cause the BE process to use more memory and cannot be freed in time. Turn off THP: echo madvise | sudo tee /sys/kernel/mm/transparent_hugepage/enabled
Query id: 0-0
is nereids: 0
tablet id: 10617
Aborted at 1716865583 (unix time) try "date -d @1716865583" if you are using GNU date
Current BE git commitID: 2dc65ce356
SIGSEGV unknown detail explain (@0x0) received by PID 15136 (TID 16282 OR 0x7f1bc5a71700) from PID 0; stack trace:
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t, void) at /home/zcp/repo_center/doris_release/doris/be/src/common/signal_handler.h:421
1# os::Linux::chained_handler(int, siginfo, void) in /usr/soft/jdk/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /usr/soft/jdk/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo, void) in /usr/soft/jdk/jre/lib/amd64/server/libjvm.so
4# 0x00007F1E0DD75340 in /usr/lib64/libc.so.6
5# lucene::analysis::SimpleTokenizer::next(lucene::analysis::Token) at /home/zcp/repo_center/doris_release/doris/be/src/clucene/src/core/CLucene/analysis/Analyzers.cpp:89
6# lucene::index::SDocumentsWriter::ThreadState::FieldData::invertField(lucene::document::Field, lucene::analysis::Analyzer, int) at /home/zcp/repo_center/doris_release/doris/be/src/clucene/src/core/CLucene/index/SDocumentWriter.cpp:667
7# lucene::index::SDocumentsWriter::ThreadState::FieldData::processField(lucene::analysis::Analyzer) at /home/zcp/repo_center/doris_release/doris/be/src/clucene/src/core/CLucene/index/SDocumentWriter.cpp:344
8# lucene::index::SDocumentsWriter::ThreadState::processDocument(lucene::analysis::Analyzer) at /home/zcp/repo_center/doris_release/doris/be/src/clucene/src/core/CLucene/index/SDocumentWriter.cpp:688
9# lucene::index::SDocumentsWriter::updateDocument(lucene::document::Document, lucene::analysis::Analyzer) at /home/zcp/repo_center/doris_release/doris/be/src/clucene/src/core/CLucene/index/SDocumentWriter.cpp:868
10# lucene::index::IndexWriter::addDocument(lucene::document::Document, lucene::analysis::Analyzer) at /home/zcp/repo_center/doris_release/doris/be/src/clucene/src/core/CLucene/index/IndexWriter.cpp:729
11# doris::segment_v2::InvertedIndexColumnWriterImpl<(doris::FieldType)17>::add_document() at /home/zcp/repo_center/doris_release/doris/be/src/olap/rowset/segment_v2/inverted_index_writer.cpp:275
12# doris::segment_v2::InvertedIndexColumnWriterImpl<(doris::FieldType)17>::add_array_values(unsigned long, void const, unsigned char const, unsigned char const, unsigned long) at /home/zcp/repo_center/doris_release/doris/be/src/olap/rowset/segment_v2/inverted_index_writer.cpp:421
13# doris::segment_v2::ArrayColumnWriter::append_data(unsigned char const*, unsigned long) at /home/zcp/repo_center/doris_release/doris/be/src/olap/rowset/segment_v2/column_writer.cpp:992
14# doris::segment_v2::ArrayColumnWriter::append_nullable(unsigned char const, unsigned char const*, unsigned long) at /home/zcp/repo_center/doris_release/doris/be/src/olap/rowset/segment_v2/column_writer.cpp:1010
15# doris::segment_v2::ColumnWriter::append(unsigned char const, void const, unsigned long) in /usr/soft/doris/apache-doris-2.1.3-bin-x64/be/lib/doris_be
16# doris::segment_v2::SegmentWriter::append_block(doris::vectorized::Block const, unsigned long, unsigned long) in /usr/soft/doris/apache-doris-2.1.3-bin-x64/be/lib/doris_be
17# doris::VerticalBetaRowsetWriter::add_columns(doris::vectorized::Block const, std::vector<unsigned int, std::allocator > const&, bool, unsigned int) at /home/zcp/repo_center/doris_release/doris/be/src/olap/rowset/vertical_beta_rowset_writer.cpp:125
18# doris::Merger::vertical_compact_one_group(std::shared_ptr, doris::ReaderType, std::shared_ptr, bool, std::vector<unsigned int, std::allocator > const&, doris::vectorized::RowSourcesBuffer, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, doris::RowsetWriter, long, doris::Merger::Statistics, std::vector<unsigned int, std::allocator >) in /usr/soft/doris/apache-doris-2.1.3-bin-x64/be/lib/doris_be
19# doris::Merger::vertical_merge_rowsets(std::shared_ptr, doris::ReaderType, std::shared_ptr, std::vector<std::shared_ptr, std::allocator<std::shared_ptr > > const&, doris::RowsetWriter, long, doris::Merger::Statistics) at /home/zcp/repo_center/doris_release/doris/be/src/olap/merger.cpp:383
20# doris::Compaction::do_compaction_impl(long) at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:371
21# doris::Compaction::do_compaction(long) at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:136
22# doris::CumulativeCompaction::execute_compact_impl() at /home/zcp/repo_center/doris_release/doris/be/src/olap/cumulative_compaction.cpp:79
23# doris::Compaction::execute_compact() at /home/zcp/repo_center/doris_release/doris/be/src/olap/compaction.cpp:118
24# doris::Tablet::execute_compaction(doris::Compaction&) at /home/zcp/repo_center/doris_release/doris/be/src/olap/tablet.cpp:1947
25# std::_Function_handler<void (), doris::StorageEngine::_submit_compaction_task(std::shared_ptr, doris::CompactionType, bool)::$_1>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
26# doris::ThreadPool::dispatch_thread() in /usr/soft/doris/apache-doris-2.1.3-bin-x64/be/lib/doris_be
27# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_release/doris/be/src/util/thread.cpp:499
28# start_thread in /usr/lib64/libpthread.so.0
29# __clone in /usr/lib64/libc.so.6