apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

BE crashes when streaming load into a table with bitmap column #2413

Closed. gaodayue closed this issue 4 years ago

gaodayue commented 4 years ago

Describe the bug

BE crashes with the following stack trace when loading into a table with a bitmap column.

*** SIGSEGV (@0xd6d6244a00) received by PID 16251 (TID 0x7f3f319e1700) from PID 18446744073007286784; stack trace: ***
    @     0x7f3f426575f0 (unknown)
    @          0x33ba750 roaring_bitmap_run_optimize
    @          0x16fd216 Roaring::runOptimize()
    @          0x16fd5e5 doris::RoaringBitmap::size()
    @          0x18899e7 doris::AggregateFuncTraits<>::finalize()
    @          0x1869846 doris::AggregateInfo::finalize()
    @          0x18698b1 doris::Field::agg_finalize()
    @          0x18c29a2 doris::agg_finalize_row<>()
    @          0x18bf49d doris::MemTable::flush()
    @          0x183bbed doris::MemTableFlushExecutor::_flush_memtable()
    @          0x1847b4c boost::_mfi::mf1<>::operator()()
    @          0x1847434 boost::_bi::list2<>::operator()<>()
    @          0x18468a9 boost::_bi::bind_t<>::operator()()
    @          0x1845b40 boost::detail::function::void_function_obj_invoker0<>::invoke()
    @          0x1840fc8 boost::function0<>::operator()()
    @          0x184034a doris::ThreadPool::work_thread()
    @          0x18484de boost::_mfi::mf1<>::operator()()
    @          0x1848436 boost::_bi::list2<>::operator()<>()
    @          0x1848311 boost::_bi::bind_t<>::operator()()
    @          0x18481f8 boost::detail::thread_data<>::run()
    @          0x2547add thread_proxy
    @     0x7f3f4264fe65 start_thread
    @     0x7f3f41a5488d __clone

To Reproduce

  1. create an aggregate table with a bitmap column
  2. change the BE config load_process_max_memory_limit_bytes to a smaller value, e.g., 1073741824
  3. keep several streaming load jobs running against that table so that load memory consumption exceeds the memory limit from time to time

Note: the bug can't be reproduced on a small dataset. I was able to reproduce it only after loading billions of rows for several hours.

Root cause analysis

The bug is caused by the race described below:

  1. Thread T1 has closed TabletsChannel x in LoadChannel::add_batch and is waiting on the LoadChannel lock to remove x from _tablets_channels. At this point, all delta writers of x have already been closed.
  2. Thread T2 has acquired the lock in LoadChannel::handle_mem_exceed_limit and asks x to reduce_mem_usage(). This calls DeltaWriter::flush_memtable_and_wait on an already closed delta writer, which causes the last memtable to be flushed twice; a minimal sketch of this interleaving follows the list.
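
To make the schedule concrete, here is a minimal, self-contained sketch. FakeDeltaWriter, load_channel_lock, and writer_closed are hypothetical stand-ins for DeltaWriter, the LoadChannel lock, and the close-before-flush ordering; this is not the actual Doris code, only the interleaving that produces the double flush.

    #include <atomic>
    #include <iostream>
    #include <mutex>
    #include <thread>

    // Hypothetical stand-in for DeltaWriter: close() flushes the current memtable
    // but keeps it around, so a later flush submits the same memtable again.
    struct FakeDeltaWriter {
        bool closed = false;
        int flush_count = 0;
        void flush_memtable() { ++flush_count; }
        void close() { flush_memtable(); closed = true; }
    };

    int main() {
        FakeDeltaWriter writer;
        std::mutex load_channel_lock;             // stand-in for the LoadChannel lock
        std::atomic<bool> writer_closed{false};

        // T2 (this thread) already holds the LoadChannel lock inside
        // handle_mem_exceed_limit when the scenario starts.
        std::unique_lock<std::mutex> t2_lock(load_channel_lock);

        // T1: add_batch has finished closing the TabletsChannel (all delta writers
        // closed) and now blocks on the LoadChannel lock to remove the channel.
        std::thread t1([&] {
            writer.close();                                       // memtable flushed once
            writer_closed.store(true, std::memory_order_release);
            std::lock_guard<std::mutex> guard(load_channel_lock); // blocks until T2 is done
            // would erase the TabletsChannel from _tablets_channels here
        });

        while (!writer_closed.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }

        // T2: reduce_mem_usage still finds the channel and flushes its delta
        // writers, including the one T1 already closed.
        writer.flush_memtable();                                  // same memtable flushed again
        t2_lock.unlock();
        t1.join();

        std::cout << "memtable flushed " << writer.flush_count << " times\n"; // prints 2
    }

Run as written it always prints 2, because the second flush goes through the writer that T1 already closed.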

The race can be verified from the core dump:

Thread T1:
#7  0x00000000017f2803 in std::lock_guard<std::mutex>::lock_guard (this=0x7fda2bb3d220, __m=...) at /usr/local/include/c++/7.2.0/bits/std_mutex.h:162
#8  0x0000000002257302 in doris::LoadChannel::add_batch (this=0x610002104f40, request=..., tablet_vec=0x6070006f8fc8) at /home/gaodayue/doris-git/be/src/runtime/load_channel.cpp:100
#9  0x00000000022494a7 in doris::LoadChannelMgr::add_batch (this=0x60c0000a9900, request=..., tablet_vec=0x6070006f8fc8, wait_lock_time_ns=0x7fda2bb3dc40)

Thread T2:
#4  0x0000000001d6841b in doris::CounterCondVariable::block_wait (this=0x60e000f60dd8) at /home/gaodayue/doris-git/be/src/util/counter_cond_variable.hpp:83
#5  0x0000000001d5a470 in doris::FlushHandler::wait (this=0x60e000f60dc0) at /home/gaodayue/doris-git/be/src/olap/memtable_flush_executor.cpp:41
#6  0x00000000030108d6 in doris::DeltaWriter::flush_memtable_and_wait (this=0x61100134f380) at /home/gaodayue/doris-git/be/src/olap/delta_writer.cpp:193
#7  0x0000000002262012 in doris::TabletsChannel::reduce_mem_usage (this=0x6120003463c0) at /home/gaodayue/doris-git/be/src/runtime/tablets_channel.cpp:177
#8  0x0000000002257a94 in doris::LoadChannel::handle_mem_exceed_limit (this=0x610002104f40, force=true) at /home/gaodayue/doris-git/be/src/runtime/load_channel.cpp:120
#9  0x000000000224a2dc in doris::LoadChannelMgr::_handle_mem_exceed_limit (this=0x60c0000a9900) at /home/gaodayue/doris-git/be/src/runtime/load_channel_mgr.cpp:164
#10 0x0000000002249440 in doris::LoadChannelMgr::add_batch (this=0x60c0000a9900, request=..., tablet_vec=0x607000a5bf98, wait_lock_time_ns=0x7fda2c35cc40)
    at /home/gaodayue/doris-git/be/src/runtime/load_channel_mgr.cpp:120

Why does calling DeltaWriter::flush_memtable_and_wait after DeltaWriter::close_wait cause the last memtable to be flushed twice? Because _mem_table is not reset in DeltaWriter::close.
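
A simplified view of that state problem, using hypothetical names (MiniDeltaWriter, _flush_queue) rather than the real DeltaWriter / MemTableFlushExecutor code; the commented-out reset shows one possible guard, not necessarily the actual fix:

    #include <iostream>
    #include <memory>
    #include <vector>

    struct MemTable {};  // placeholder for the real memtable

    struct MiniDeltaWriter {
        std::shared_ptr<MemTable> _mem_table = std::make_shared<MemTable>();
        std::vector<std::shared_ptr<MemTable>> _flush_queue;  // stand-in for the flush executor

        void _flush_memtable_async() { _flush_queue.push_back(_mem_table); }

        void close() {
            _flush_memtable_async();
            // The issue described above: _mem_table still points at the flushed
            // memtable here. A guard such as `_mem_table.reset();` (one possible
            // fix, not necessarily the actual PR) would make later flushes no-ops.
        }

        void flush_memtable_and_wait() {
            if (_mem_table != nullptr) {
                _flush_memtable_async();  // re-submits the already-flushed memtable
            }
        }
    };

    int main() {
        MiniDeltaWriter writer;
        writer.close();                    // TabletsChannel::close path (thread T1)
        writer.flush_memtable_and_wait();  // reduce_mem_usage path after close (thread T2)
        std::cout << "flush submissions: " << writer._flush_queue.size()          // prints 2
                  << ", same memtable: "
                  << (writer._flush_queue[0] == writer._flush_queue[1]) << "\n";  // prints 1
    }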

Why does flushing the memtable twice cause the segmentation fault in roaring_bitmap_run_optimize? Because after the first flush, slice->data of the bitmap value points to the serialized bitmap, which can't be cast back to a RoaringBitmap* in the second flush.

    // The RoaringBitmap object memory will be released by ObjectPool
    static void finalize(RowCursorCell* src, MemPool* mem_pool) {
        // Assumes slice->data still holds a pointer to the in-memory RoaringBitmap.
        // After the first finalize, it points to serialized bytes instead, so a
        // second finalize of the same cell reinterprets garbage as a bitmap.
        auto *slice = reinterpret_cast<Slice*>(src->mutable_cell_ptr());
        auto *bitmap = reinterpret_cast<RoaringBitmap*>(slice->data);

        // size() run-optimizes the bitmap (roaring_bitmap_run_optimize in the
        // stack trace above), then the serialized form replaces the pointer.
        slice->size = bitmap->size();
        slice->data = (char*)mem_pool->allocate(slice->size);
        bitmap->serialize(slice->data);
    }
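
The mechanism can be reproduced in isolation with simplified stand-ins: FakeBitmap, Slice, and the arena vector below are hypothetical replacements for RoaringBitmap, the Doris Slice, and MemPool. This is only a sketch of the failure mode, not library code.

    #include <cstring>
    #include <iostream>
    #include <vector>

    // Hypothetical stand-ins for RoaringBitmap, Slice and MemPool.
    struct FakeBitmap {
        std::vector<char> bytes{1, 2, 3, 4};                     // pretend serialized form
        size_t size() const { return bytes.size(); }
        void serialize(char* dst) const { std::memcpy(dst, bytes.data(), bytes.size()); }
    };

    struct Slice { char* data; size_t size; };

    std::vector<std::vector<char>> arena;                        // stands in for MemPool

    void finalize(Slice* slice) {
        // Valid only if slice->data really holds a FakeBitmap*.
        auto* bitmap = reinterpret_cast<FakeBitmap*>(slice->data);
        slice->size = bitmap->size();
        arena.emplace_back(slice->size);
        slice->data = arena.back().data();                       // now points at serialized bytes
        bitmap->serialize(slice->data);
    }

    int main() {
        FakeBitmap bitmap;
        Slice slice{reinterpret_cast<char*>(&bitmap), sizeof(void*)};

        finalize(&slice);  // first flush: fine, slice->data becomes serialized bytes

        // A second finalize(&slice) would reinterpret those serialized bytes as a
        // bitmap object; in Doris the bogus object reaches runOptimize(), which is
        // the roaring_bitmap_run_optimize SIGSEGV in the stack trace above.
        // finalize(&slice);  // deliberately left commented out
        std::cout << "after first finalize: " << slice.size << " serialized bytes\n";
    }
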
imay commented 4 years ago

@morningman

gaodayue commented 4 years ago

PR is on the way.