apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Rare segmentation fault on master #3318

Closed xumengpanda closed 2 years ago

xumengpanda commented 4 years ago

The master branch currently has a segmentation fault found by the nightly run. The failure rate is about 1 in ~150K.

Command to reproduce: -r simulation -f foundationdb/tests/slow/DDBalanceAndRemove.txt -b on -s 840724214

Below is a snippet of the backtrace. It points to DataDistribution code.


Deque<RelocateShard>::full() const at /home/meng_xu/fdb/foundationdb/flow/Deque.h:170
 (inlined by) RelocateShard& Deque<RelocateShard>::emplace_back<RelocateShard const&>(RelocateShard const&) at /home/meng_xu/fdb/foundationdb/flow/Deque.h:120
 (inlined by) decltype(auto) std::queue<RelocateShard, Deque<RelocateShard> >::emplace<RelocateShard const&>(RelocateShard const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_queue.h:263
 (inlined by) void NotifiedQueue<RelocateShard>::send<RelocateShard const&>(RelocateShard const&) at /home/meng_xu/fdb/foundationdb/flow/flow.h:602
 (inlined by) void NotifiedQueue<RelocateShard>::send<RelocateShard const&>(RelocateShard const&) at /home/meng_xu/fdb/foundationdb/flow/flow.h:595
 (inlined by) PromiseStream<RelocateShard>::send(RelocateShard const&) const at /home/meng_xu/fdb/foundationdb/flow/flow.h:913
 (inlined by) a_body1loopBody1 at /home/meng_xu/fdb/foundationdb/fdbserver/DataDistribution.actor.cpp:3162
(anonymous namespace)::TeamTrackerActorState<(anonymous namespace)::TeamTrackerActor>::a_body1loopBody1(int) at /home/meng_xu/fdb/foundationdb/fdbserver/DataDistribution.actor.cpp:3080
Promise<Void>::Promise(Promise<Void> const&) at /home/meng_xu/fdb/foundationdb/flow/flow.h:794
 (inlined by) void __gnu_cxx::new_allocator<Promise<Void> >::construct<Promise<Void>, Promise<Void> const&>(Promise<Void>*, Promise<Void> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/new_allocator.h:136
 (inlined by) void std::allocator_traits<std::allocator<Promise<Void> > >::construct<Promise<Void>, Promise<Void> const&>(std::allocator<Promise<Void> >&, Promise<Void>*, Promise<Void> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/alloc_traits.h:475
 (inlined by) void std::vector<Promise<Void>, std::allocator<Promise<Void> > >::_M_realloc_insert<Promise<Void> const&>(__gnu_cxx::__normal_iterator<Promise<Void>*, std::vector<Promise<Void>, std::allocator<Promise<Void> > > >, Promise<Void> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/vector.tcc:436
bool __gnu_cxx::__ops::_Iter_less_val::operator()<__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, double&) const at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/predefined_ops.h:65
 (inlined by) void std::__push_heap<__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, double, __gnu_cxx::__ops::_Iter_less_val>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, long, double, __gnu_cxx::__ops::_Iter_less_val&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_heap.h:133
 (inlined by) void std::__adjust_heap<__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, double, __gnu_cxx::__ops::_Iter_less_iter>(__gnu_cxx::__normal_iterator<double*, std::vector<double, std::allocator<double> > >, long, long, double, __gnu_cxx::__ops::_Iter_less_iter) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_heap.h:237
void SAV<DistributorExclusionSafetyCheckReply>::send<DistributorExclusionSafetyCheckReply&>(DistributorExclusionSafetyCheckReply&) at /home/meng_xu/fdb/foundationdb/flow/flow.h:446
SAV<std::vector<StorageServerInterface, std::allocator<StorageServerInterface> > >::~SAV() at /home/meng_xu/fdb/foundationdb/flow/flow.h:430
 (inlined by) SAV<std::vector<StorageServerInterface, std::allocator<StorageServerInterface> > >::destroy() at /home/meng_xu/fdb/foundationdb/flow/flow.h:539
 (inlined by) SAV<std::vector<StorageServerInterface, std::allocator<StorageServerInterface> > >::delFutureRef() at /home/meng_xu/fdb/foundationdb/flow/flow.h:532
 (inlined by) SAV<std::vector<StorageServerInterface, std::allocator<StorageServerInterface> > >::delFutureRef() at /home/meng_xu/fdb/foundationdb/flow/flow.h:527
 (inlined by) Future<std::vector<StorageServerInterface, std::allocator<StorageServerInterface> > >::~Future() at /home/meng_xu/fdb/foundationdb/flow/flow.h:712
 (inlined by) a_body1 at /home/meng_xu/fdb/foundationdb/flow/flow.h:710
 (inlined by) DdExclusionSafetyCheckActor at /home/meng_xu/fdb/build/foundationdb/linux/fdbserver/DataDistribution.actor.g.cpp:26108
 (inlined by) ddExclusionSafetyCheck(DistributorExclusionSafetyCheckRequest const&, Reference<DataDistributorData> const&, Database const&) at /home/meng_xu/fdb/foundationdb/fdbserver/DataDistribution.actor.cpp:4807
(anonymous namespace)::StorageServerFailureTrackerActorState<(anonymous namespace)::StorageServerFailureTrackerActor>::a_body1loopBody1(int) at /home/meng_xu/fdb/foundationdb/fdbserver/DataDistribution.actor.cpp:3498
waitServerListChange(DDTeamCollection* const&, FutureStream<Void> const&) at /home/meng_xu/fdb/foundationdb/fdbserver/DataDistribution.actor.cpp:3296
SerializeSource<ErrorOr<EnsureTable<StatusReply> > >::SerializeSource(ErrorOr<EnsureTable<StatusReply> > const&) at /home/meng_xu/fdb/foundationdb/flow/serialize.h:795
 (inlined by) a_body1cont2 at /home/meng_xu/fdb/foundationdb/fdbrpc/networksender.actor.h:37
SAV<std::vector<std::pair<StorageServerInterface, ProcessClass>, std::allocator<std::pair<StorageServerInterface, ProcessClass> > > >::delFutureRef() at /home/meng_xu/fdb/foundationdb/flow/flow.h:528
 (inlined by) Future<std::vector<std::pair<StorageServerInterface, ProcessClass>, std::allocator<std::pair<StorageServerInterface, ProcessClass> > > >::~Future() at /home/meng_xu/fdb/foundationdb/flow/flow.h:712
 (inlined by) a_body1loopBody1when1 at /home/meng_xu/fdb/foundationdb/fdbserver/DataDistribution.actor.cpp:3307
 (inlined by) a_body1loopBody1 at /home/meng_xu/fdb/build/foundationdb/linux/fdbserver/DataDistribution.actor.g.cpp:11463
Promise<Void>::Promise(Promise<Void> const&) at /home/meng_xu/fdb/foundationdb/flow/flow.h:794
 (inlined by) void __gnu_cxx::new_allocator<Promise<Void> >::construct<Promise<Void>, Promise<Void> const&>(Promise<Void>*, Promise<Void> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/ext/new_allocator.h:136
 (inlined by) void std::allocator_traits<std::allocator<Promise<Void> > >::construct<Promise<Void>, Promise<Void> const&>(std::allocator<Promise<Void> >&, Promise<Void>*, Promise<Void> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/alloc_traits.h:475
 (inlined by) void std::vector<Promise<Void>, std::allocator<Promise<Void> > >::_M_realloc_insert<Promise<Void> const&>(__gnu_cxx::__normal_iterator<Promise<Void>*, std::vector<Promise<Void>, std::allocator<Promise<Void> > > >, Promise<Void> const&) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/vector.tcc:436

jzhou77 commented 4 years ago

On the release-6.3 branch, there is another crash in DataDistribution as well:

Seed: -r simulation --crash -f ./foundationdb/tests/fast/AtomicBackupToDBCorrectness.txt -s 946835330 -b on
Commit: ac19ba19b8

Program received signal SIGSEGV, Segmentation fault.
std::_Rb_tree<UID, std::pair<UID const, Reference<TCServerInfo> >, std::_Select1st<std::pair<UID const, Reference<TCServerInfo> > >, std::less<UID>, std::allocator<std::pair<UID const, Reference<TCServerInfo> > > >::_M_lower_bound (this=<optimized out>, __k=..., __y=0x326edb8,
    __x=0x3865636337623034) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1904
1904            if (!_M_impl._M_key_compare(_S_key(__x), __k))  
(gdb) bt
#0  std::_Rb_tree<UID, std::pair<UID const, Reference<TCServerInfo> >, std::_Select1st<std::pair<UID const, Reference<TCServerInfo> > >, std::less<UID>, std::allocator<std::pair<UID const, Reference<TCServerInfo> > > >::_M_lower_bound (this=<optimized out>, __k=..., __y=0x326edb8,
    __x=0x3865636337623034) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1904
#1  std::_Rb_tree<UID, std::pair<UID const, Reference<TCServerInfo> >, std::_Select1st<std::pair<UID const, Reference<TCServerInfo> > >, std::less<UID>, std::allocator<std::pair<UID const, Reference<TCServerInfo> > > >::find (this=this@entry=0x326edb0, __k=...)
    at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:2552
#2  0x0000000000a7d7a9 in std::map<UID, Reference<TCServerInfo>, std::less<UID>, std::allocator<std::pair<UID const, Reference<TCServerInfo> > > >::count (__x=..., this=0x326edb0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:988
#3  (anonymous namespace)::TeamTrackerActorState<(anonymous namespace)::TeamTrackerActor>::a_body1loopBody1 (this=0x7ffff1866758, loopDepth=1)
    at /home/jingyu_zhou/fdb/foundationdb/fdbserver/DataDistribution.actor.cpp:3131

jzhou77 commented 2 years ago

Closing this since we have fixed several DD segfaults due to destruction order, and no longer see failures.
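
For readers unfamiliar with this class of bug, here is a minimal sketch of why destruction order matters in C++. It is not FoundationDB code and does not reproduce the actual fix; `Tracked`, `Collection`, `serverMap`, `actors`, and `destroyAndRecord` are hypothetical names chosen for illustration. Non-static members are destroyed in reverse order of declaration, so a member whose teardown still touches a sibling must be declared after that sibling.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; none of these names come from FoundationDB.
static std::vector<std::string> destructionLog;

// Records its name when destroyed so the order can be inspected afterwards.
struct Tracked {
    std::string name;
    explicit Tracked(std::string n) : name(std::move(n)) {}
    ~Tracked() { destructionLog.push_back(name); }
};

// Members are destroyed in reverse declaration order:
// `actors` is destroyed first, then `serverMap`.
// If tearing down `actors` needed to read `serverMap`, this order is safe.
// Swapping the two declarations would destroy `serverMap` first, and any
// access to it during the teardown of `actors` would be a use-after-destruction.
struct Collection {
    Tracked serverMap{"serverMap"};
    Tracked actors{"actors"};
};

// Builds and destroys a Collection, returning the observed destruction order.
std::vector<std::string> destroyAndRecord() {
    destructionLog.clear();
    {
        Collection c; // destroyed at the end of this scope
    }
    return destructionLog;
}
```

Calling `destroyAndRecord()` returns `{"actors", "serverMap"}`, the reverse of the declaration order.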