xumengpanda closed this 2 years ago.
On the release-6.3 branch, there is another crash in DataDistribution as well:
Seed: -r simulation --crash -f ./foundationdb/tests/fast/AtomicBackupToDBCorrectness.txt -s 946835330 -b on
Commit: ac19ba19b8
Program received signal SIGSEGV, Segmentation fault.
std::_Rb_tree<UID, std::pair<UID const, Reference<TCServerInfo> >, std::_Select1st<std::pair<UID const, Reference<TCServerInfo> > >, std::less<UID>, std::allocator<std::pair<UID const, Reference<TCServerInfo> > > >::_M_lower_bound (this=<optimized out>, __k=..., __y=0x326edb8,
__x=0x3865636337623034) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1904
1904 if (!_M_impl._M_key_compare(_S_key(__x), __k))
(gdb) bt
#0 std::_Rb_tree<UID, std::pair<UID const, Reference<TCServerInfo> >, std::_Select1st<std::pair<UID const, Reference<TCServerInfo> > >, std::less<UID>, std::allocator<std::pair<UID const, Reference<TCServerInfo> > > >::_M_lower_bound (this=<optimized out>, __k=..., __y=0x326edb8,
__x=0x3865636337623034) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:1904
#1 std::_Rb_tree<UID, std::pair<UID const, Reference<TCServerInfo> >, std::_Select1st<std::pair<UID const, Reference<TCServerInfo> > >, std::less<UID>, std::allocator<std::pair<UID const, Reference<TCServerInfo> > > >::find (this=this@entry=0x326edb0, __k=...)
at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:2552
#2 0x0000000000a7d7a9 in std::map<UID, Reference<TCServerInfo>, std::less<UID>, std::allocator<std::pair<UID const, Reference<TCServerInfo> > > >::count (__x=..., this=0x326edb0) at /opt/rh/devtoolset-8/root/usr/include/c++/8/bits/stl_tree.h:988
#3 (anonymous namespace)::TeamTrackerActorState<(anonymous namespace)::TeamTrackerActor>::a_body1loopBody1 (this=0x7ffff1866758, loopDepth=1)
at /home/jingyu_zhou/fdb/foundationdb/fdbserver/DataDistribution.actor.cpp:3131
Closing this, since we have fixed several DD (DataDistribution) segfaults caused by destruction order and no longer see failures.
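For readers unfamiliar with this class of bug: a "destruction order" segfault typically happens when a long-lived task (here, the team tracker calling `std::map::count` in the backtrace above) still dereferences a container after the object that owns it has been destroyed. The sketch below is a generic C++ illustration of one way to guard against that, not the fix that was applied in FoundationDB. It assumes `std::shared_ptr`/`std::weak_ptr` in place of FoundationDB's own `Reference<>` type, and `ServerInfo`, `ServerMap`, and `TeamTrackerSketch` are hypothetical names.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <utility>

// Hypothetical stand-ins for illustration only; these are NOT
// FoundationDB's UID, TCServerInfo, or Reference<> types.
struct ServerInfo {
    std::uint64_t id;
};
using ServerMap = std::map<std::uint64_t, std::shared_ptr<ServerInfo>>;

// Hypothetical long-lived tracker. It holds a non-owning weak_ptr to the
// server map, so it never dereferences the map after its owner destroys it.
class TeamTrackerSketch {
public:
    explicit TeamTrackerSketch(std::weak_ptr<ServerMap> servers)
        : servers_(std::move(servers)) {}

    // Returns true only if the map is still alive and contains `id`.
    bool serverExists(std::uint64_t id) const {
        // lock() yields an owning pointer that keeps the map alive for the
        // duration of this call, or an empty pointer if it was destroyed.
        if (std::shared_ptr<ServerMap> servers = servers_.lock()) {
            return servers->count(id) > 0;
        }
        // Owner is gone: skip the lookup instead of reading freed memory.
        return false;
    }

private:
    std::weak_ptr<ServerMap> servers_;
};
```

The trade-off of this pattern is that every access pays for a `lock()`, and the tracker must treat "owner already destroyed" as a normal outcome. Another common approach is to cancel the tracker before its owner's members are torn down, which avoids the check but depends on getting the teardown order right.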
The master branch currently has a segmentation fault found by the nightly tests. Failure rate: 1 out of ~150K runs.
Command to reproduce:
-r simulation -f foundationdb/tests/slow/DDBalanceAndRemove.txt -b on -s 840724214
Below is a snippet of the backtrace. It points to DataDistribution code.