apache / doris

Apache Doris is an easy-to-use, high performance and unified analytics database.
https://doris.apache.org
Apache License 2.0

BE produces a coredump #3695

Open mdianjun opened 4 years ago

mdianjun commented 4 years ago

When the coredump happens, be.out almost always contains one of the following two stack traces:

*** Aborted at 1590347829 (unix time) try "date -d @1590347829" if you are using GNU date ***
PC: @           0xf4e487 std::__push_heap<>()
*** SIGSEGV (@0x8) received by PID 233 (TID 0x7f3656ece700) from PID 8; stack trace: ***
    @     0x7f36f0a572f0 (unknown)
    @           0xf4e487 std::__push_heap<>()
    @           0xf47c43 doris::CollectIterator::add_child()
    @           0xf484e2 doris::Reader::_capture_rs_readers()
    @           0xf4b363 doris::Reader::init()
    @          0x15f82c0 doris::EngineChecksumTask::_compute_checksum()
    @           0xea7285 doris::StorageEngine::execute_task()
    @          0x140edbf doris::TaskWorkerPool::_check_consistency_worker_thread_callback()
    @     0x7f36f080ce25 start_thread
    @     0x7f36f0b1fbad __clone
*** Aborted at 1590347830 (unix time) try "date -d @1590347830" if you are using GNU date ***
PC: @           0xf4e487 std::__push_heap<>()
*** SIGSEGV (@0x8) received by PID 239 (TID 0x7f0f5a4c1700) from PID 8; stack trace: ***
    @     0x7f0ffa7af2f0 (unknown)
    @           0xf4e487 std::__push_heap<>()
    @           0xf49c01 doris::Reader::_agg_key_next_row()
    @          0x15f94f7 doris::EngineChecksumTask::_compute_checksum()
    @           0xea7285 doris::StorageEngine::execute_task()
    @          0x140edbf doris::TaskWorkerPool::_check_consistency_worker_thread_callback()
    @     0x7f0ffa564e25 start_thread
    @     0x7f0ffa877bad __clone

After enabling core files on the system and capturing a coredump file, I inspected the call stack with gdb:

(gdb) bt
#0  cmp (this=<error reading variable: Cannot access memory at address 0x8>, right=<optimized out>, left=<optimized out>) at /root/jdolap-engine/be/src/olap/types.h:48
#1  compare_cell<doris::RowCursorCell, doris::RowCursorCell> (this=<optimized out>, rhs=..., lhs=...) at /root/jdolap-engine/be/src/olap/field.h:138
#2  compare_row<doris::RowCursor, doris::RowCursor> (rhs=..., lhs=...) at /root/jdolap-engine/be/src/olap/row.h:62
#3  operator() (this=<synthetic pointer>, b=0xfbee8a0, a=<optimized out>) at /root/jdolap-engine/be/src/olap/reader.cpp:269
#4  operator()<__gnu_cxx::__normal_iterator<doris::CollectIterator::ChildCtx**, std::vector<doris::CollectIterator::ChildCtx*> >, __gnu_cxx::__normal_iterator<doris::CollectIterator::ChildCtx**, std::vector<doris::CollectIterator::ChildCtx*> > > (__it2=..., __it1=..., this=<synthetic pointer>) at /usr/include/c++/7.3.0/bits/predefined_ops.h:143
#5  std::__adjust_heap<__gnu_cxx::__normal_iterator<doris::CollectIterator::ChildCtx**, std::vector<doris::CollectIterator::ChildCtx*, std::allocator<doris::CollectIterator::ChildCtx*> > >, long, doris::CollectIterator::ChildCtx*, __gnu_cxx::__ops::_Iter_comp_iter<doris::CollectIterator::ChildCtxComparator> > (__first=..., __holeIndex=__holeIndex@entry=0, __len=26, __value=0x328ea7e0, __comp=...)
    at /usr/include/c++/7.3.0/bits/stl_heap.h:222
#6  0x0000000000f49b0a in __pop_heap<__gnu_cxx::__normal_iterator<doris::CollectIterator::ChildCtx**, std::vector<doris::CollectIterator::ChildCtx*> >, __gnu_cxx::__ops::_Iter_comp_iter<doris::CollectIterator::ChildCtxComparator> > (__comp=<synthetic pointer>, __result=..., __last=..., __first=...) at /usr/include/c++/7.3.0/bits/stl_heap.h:253
#7  pop_heap<__gnu_cxx::__normal_iterator<doris::CollectIterator::ChildCtx**, std::vector<doris::CollectIterator::ChildCtx*> >, doris::CollectIterator::ChildCtxComparator> (__last=..., __first=...,
    __comp=...) at /usr/include/c++/7.3.0/bits/stl_heap.h:320
#8  pop (this=0xae80a48) at /usr/include/c++/7.3.0/bits/stl_queue.h:633
#9  _merge_next (delete_flag=<optimized out>, row=<optimized out>, this=0xae80a20) at /root/jdolap-engine/be/src/olap/reader.cpp:224
#10 next (delete_flag=<optimized out>, row=<optimized out>, this=0xae80a20) at /root/jdolap-engine/be/src/olap/reader.cpp:68
#11 doris::Reader::_agg_key_next_row (this=0x7f203dab94b0, row_cursor=0x7f203dab9390, mem_pool=0x138839d40, agg_pool=<optimized out>, eof=<optimized out>) at /root/jdolap-engine/be/src/olap/reader.cpp:371
#12 0x00000000015f94f7 in next_row_with_aggregation (eof=0x7f203dab9300, agg_pool=<optimized out>, mem_pool=<optimized out>, row_cursor=0x7f203dab9390, this=0x7f203dab94b0)
    at /root/jdolap-engine/be/src/olap/reader.h:130
#13 doris::EngineChecksumTask::_compute_checksum (this=<optimized out>) at /root/jdolap-engine/be/src/olap/task/engine_checksum_task.cpp:120
#14 0x0000000000ea7285 in doris::StorageEngine::execute_task (this=0x625e840, task=task@entry=0x7f203dabc1c0) at /root/jdolap-engine/be/src/olap/storage_engine.cpp:934
#15 0x000000000140edbf in doris::TaskWorkerPool::_check_consistency_worker_thread_callback (arg_this=0xb360b40) at /root/jdolap-engine/be/src/agent/task_worker_pool.cpp:1202
#16 0x00007f253a2f0dc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f253a5fbced in clone () from /lib64/libc.so.6

I don't know how to reproduce it yet.

chaoyli commented 4 years ago

It may be caused by a dirty memory address. You can use the Address Sanitizer tools incorporated in Doris to find it: http://doris.apache.org/master/zh-CN/developer-guide/debug-tool.html#%E5%86%85%E5%AD%98

You can also contact me on WeChat. WeChat: lichaoyong121
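
For readers following the traces above: both faults occur while a standard-library heap operation (std::__push_heap, std::__adjust_heap) calls the ChildCtxComparator, and the faulting address is 0x8. That pattern is consistent with, though not proof of, the comparator dereferencing a null or stale pointer held in the heap. The sketch below is only an illustration of that failure mode and of one defensive check. Row, ChildCtx, ChildCtxComparator, MergeHeap, add_child and merge_keys here are simplified, hypothetical stand-ins, not the Doris definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Hypothetical, simplified stand-ins for the types named in the trace.
struct Row {
    int key;
};

struct ChildCtx {
    std::vector<Row> rows;  // assumed to be sorted by key
    std::size_t pos = 0;    // index of the current row

    // Returns nullptr once the child has no more rows.
    const Row* current_row() const {
        return pos < rows.size() ? &rows[pos] : nullptr;
    }
};

// Min-heap ordering on the current key. It dereferences both children
// unconditionally, so a child without a valid row would fault here,
// inside std::push_heap or std::pop_heap.
struct ChildCtxComparator {
    bool operator()(const ChildCtx* a, const ChildCtx* b) const {
        return a->current_row()->key > b->current_row()->key;
    }
};

using MergeHeap =
    std::priority_queue<ChildCtx*, std::vector<ChildCtx*>, ChildCtxComparator>;

// Pushes a child only if it currently holds a row, so the comparator
// never sees a null row. Returns false when the child is rejected.
bool add_child(MergeHeap& heap, ChildCtx* child) {
    if (child == nullptr || child->current_row() == nullptr) {
        return false;
    }
    heap.push(child);
    return true;
}

// Merges the keys of all children in ascending order.
std::vector<int> merge_keys(std::vector<ChildCtx>& children) {
    MergeHeap heap;
    for (ChildCtx& child : children) {
        add_child(heap, &child);
    }
    std::vector<int> out;
    while (!heap.empty()) {
        ChildCtx* top = heap.top();
        heap.pop();  // remove before advancing, so heap order stays valid
        out.push_back(top->current_row()->key);
        ++top->pos;
        add_child(heap, top);  // re-insert only if rows remain
    }
    return out;
}
```

Calling merge_keys on a vector of children returns their keys merged in ascending order, and exhausted children are skipped instead of being compared. If the guard in add_child is removed and the program is compiled with GCC or Clang using -fsanitize=address -g, the sanitizer should report the bad access with a stack trace at the point of the fault, which is the kind of report the Address Sanitizer suggestion above is meant to produce.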