GMLC-TDC / HELICS

Hierarchical Engine for Large-scale Infrastructure Co-Simulation (HELICS)
https://docs.helics.org/en/latest/
BSD 3-Clause "New" or "Revised" License

segmentation fault during federate disconnect or destructor #134

Closed: jeffdaily closed this issue 6 years ago

jeffdaily commented 6 years ago

On an Ubuntu 16 VM, using the latest develop head. I built and installed a debug static build of HELICS, then built and installed the latest ns-3 examples. I attempted to run two instances of ./ns-3-dev-git/build/contrib/helics/examples/ns3-dev-fed-sndrcv-debug, each with a unique federate name. (The examples are simple and do not take command-line parameters, so I set the federate name to argv[1] if it exists.) By default the examples provide full level 4 debugging. Their code is here.
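
The argv[1] change mentioned above might look roughly like the following. This is only a sketch: `chooseFederateName` and its fallback argument are hypothetical names for illustration, and the actual ns-3 example code is not reproduced in this thread.

```cpp
#include <string>

// Hypothetical helper (not from the ns-3 HELICS examples):
// return argv[1] as the federate name when it is present and non-empty,
// otherwise return the supplied fallback name.
std::string chooseFederateName(int argc, const char* const* argv, const std::string& fallback)
{
    if (argc > 1 && argv[1] != nullptr && argv[1][0] != '\0') {
        return argv[1];
    }
    return fallback;
}
```

Calling it from `main(int argc, char** argv)` with a fixed default keeps the original behavior when no argument is given, while letting each instance be started with its own name.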

First, I ran the broker with ../HELICS-build-Debug-install/bin/helics_broker 2 --loglevel=4.

Then I ran the two federates. Both segfault after logging 7221-5a2bb694-6247-4ffc-a1d2-27d17aeddd82::|| cmd:disconnect from 65537.

I ran one of the federates with gdb to get the stack trace:

(gdb) where
#0  0x00007ffff4971256 in std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::compare(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x000000000059fb9c in std::operator< <char, std::char_traits<char>, std::allocator<char> > (
    __lhs=<error reading variable: Cannot access memory at address 0x2d636666342d375c>, __rhs="")
    at /usr/include/c++/5/bits/basic_string.h:4989
#2  0x00000000005a0f2f in std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::operator() (
    this=0xa57f00 <AsioServiceManager::services[abi:cxx11]>, 
    __x=<error reading variable: Cannot access memory at address 0x2d636666342d375c>, __y="")
    at /usr/include/c++/5/bits/stl_function.h:387
#3  0x00000000006f07ee in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<AsioServiceManager> >, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<AsioServiceManager> > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<AsioServiceManager> > > >::_M_lower_bound (this=0xa57f00 <AsioServiceManager::services[abi:cxx11]>, __x=0x2d636666342d3734, __y=0x7fffec000a90, __k="")
    at /usr/include/c++/5/bits/stl_tree.h:1628
#4  0x00000000006efc8e in std::_Rb_tree<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<AsioServiceManager> >, std::_Select1st<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<AsioServiceManager> > >, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<AsioServiceManager> > > >::find
    (this=0xa57f00 <AsioServiceManager::services[abi:cxx11]>, __k="") at /usr/include/c++/5/bits/stl_tree.h:2295
#5  0x00000000006ef5ed in std::map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<AsioServiceManager>, std::less<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<AsioServiceManager> > > >::find (this=0xa57f00 <AsioServiceManager::services[abi:cxx11]>, __x="") at /usr/include/c++/5/bits/stl_map.h:846
#6  0x00000000006e7f78 in AsioServiceManager::haltServiceLoop (serviceName="")
    at /home/d3n000/HELICS/src/helics/common/AsioServiceManager.cpp:167
#7  0x000000000060a416 in AsioServiceManager::servicer::~servicer (this=0x7fffec000d20, __in_chrg=<optimized out>)
    at /home/d3n000/HELICS/src/helics/core/../common/AsioServiceManager.h:56
#8  0x0000000000611ad6 in std::default_delete<AsioServiceManager::servicer>::operator() (this=0x7ffff32e0560, __ptr=0x7fffec000d20)
    at /usr/include/c++/5/bits/unique_ptr.h:76
#9  0x00000000006136ff in std::unique_ptr<AsioServiceManager::servicer, std::default_delete<AsioServiceManager::servicer> >::reset (
    this=0x7ffff32e0560, __p=0x7fffec000d20) at /usr/include/c++/5/bits/unique_ptr.h:344
#10 0x000000000060f9a5 in std::unique_ptr<AsioServiceManager::servicer, std::default_delete<AsioServiceManager::servicer> >::operator=(decltype(nullptr)) (this=0x7ffff32e0560) at /usr/include/c++/5/bits/unique_ptr.h:280
#11 0x0000000000606225 in helics::BrokerBase::queueProcessingLoop (this=0xabb118)
    at /home/d3n000/HELICS/src/helics/core/BrokerBase.cpp:423
#12 0x000000000062703b in std::_Mem_fn_base<void (helics::BrokerBase::*)(), true>::operator()<, void>(helics::BrokerBase*) const (
    this=0xabbdf0, __object=0xabb118) at /usr/include/c++/5/functional:600
#13 0x0000000000626ab1 in std::_Bind_simple<std::_Mem_fn<void (helics::BrokerBase::*)()> (helics::BrokerBase*)>::_M_invoke<0ul>(std::_Index_tuple<0ul>) (this=0xabbde8) at /usr/include/c++/5/functional:1531
#14 0x0000000000625d6a in std::_Bind_simple<std::_Mem_fn<void (helics::BrokerBase::*)()> (helics::BrokerBase*)>::operator()() (
    this=0xabbde8) at /usr/include/c++/5/functional:1520
#15 0x00000000006242cc in std::thread::_Impl<std::_Bind_simple<std::_Mem_fn<void (helics::BrokerBase::*)()> (helics::BrokerBase*)> >::_M_run() (this=0xabbdd0) at /usr/include/c++/5/thread:115
#16 0x00007ffff4908c80 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x00007ffff411b6ba in start_thread (arg=0x7ffff32e3700) at pthread_create.c:333
#18 0x00007ffff3e5141d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
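
The unreadable node pointer in frame #3 (__x=0x2d636666342d3734) is worth decoding. Read as eight little-endian bytes, as they would sit in memory on x86-64, it is printable ASCII, which usually means the memory was overwritten with, or reused for, string data. The helper below is a small sketch for that conversion; `pointerBytesAsText` is a hypothetical name, not part of gdb or HELICS.

```cpp
#include <cstdint>
#include <string>

// Interpret a 64-bit value as eight bytes in little-endian memory order
// (lowest byte first) and return those bytes as text.
std::string pointerBytesAsText(std::uint64_t value)
{
    std::string text;
    for (int i = 0; i < 8; ++i) {
        text.push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
    return text;
}
```

For 0x2d636666342d3734 this yields 47-4ffc-, which matches part of the identifier 7221-5a2bb694-6247-4ffc-a1d2-27d17aeddd82 in the disconnect message. That is consistent with, though it does not prove, the AsioServiceManager::services map being walked after its storage was freed or reused.
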

phlptp commented 6 years ago

What compiler and branch?

jeffdaily commented 6 years ago

g++ (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609

On branch develop
Your branch is up-to-date with 'origin/develop'.

phlptp commented 6 years ago

Definitely something we need to fix, but probably not something easy. I have seen a few things on Linux that depend on which thread terminates first, which is challenging to deal with.
In the meantime, try adding cleanupHelicsLibrary(); after the fed->finalize() call and see if that removes the segfault.
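
To illustrate why an explicit cleanup call can sidestep thread-ordering problems at exit, here is a minimal, self-contained sketch. It is not HELICS code, and the thread does not say how cleanupHelicsLibrary is implemented; `WorkerRegistry`, `start`, and `shutdownAll` are invented names. The assumption being illustrated is that background threads are stopped and joined at a point the caller chooses, rather than whenever static objects happen to be destroyed.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <thread>

// Hypothetical stand-in for a process-wide registry of background workers.
// This is NOT HELICS code; all names are invented for illustration.
class WorkerRegistry {
  public:
    static WorkerRegistry& instance()
    {
        static WorkerRegistry registry;
        return registry;
    }

    // Start a named background worker; returns false if the name is taken.
    bool start(const std::string& name)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (workers_.count(name) != 0) {
            return false;
        }
        auto it = workers_.emplace(name, std::unique_ptr<Worker>(new Worker)).first;
        Worker* worker = it->second.get();
        worker->thread = std::thread([worker] {
            // Idle loop standing in for real work; exits once stop is set.
            while (!worker->stop.load()) {
                std::this_thread::sleep_for(std::chrono::milliseconds(1));
            }
        });
        return true;
    }

    std::size_t runningCount()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        return workers_.size();
    }

    // Explicit cleanup: stop and join every worker at a point the caller
    // chooses, before main() returns and static objects start to be destroyed.
    // Returns the number of workers shut down; calling it again is harmless.
    std::size_t shutdownAll()
    {
        std::map<std::string, std::unique_ptr<Worker>> local;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            local.swap(workers_);
        }
        for (auto& entry : local) {
            entry.second->stop.store(true);
            if (entry.second->thread.joinable()) {
                entry.second->thread.join();
            }
        }
        return local.size();
    }

    // Fallback only: if the caller forgot shutdownAll(), join here so that no
    // joinable std::thread is destroyed (which would call std::terminate).
    ~WorkerRegistry() { shutdownAll(); }

  private:
    struct Worker {
        std::atomic<bool> stop{false};
        std::thread thread;
    };

    WorkerRegistry() = default;

    std::mutex mutex_;
    std::map<std::string, std::unique_ptr<Worker>> workers_;
};
```

In this sketch the caller would invoke `WorkerRegistry::instance().shutdownAll()` as the last step of `main`, in the same position suggested above for cleanupHelicsLibrary(), so no worker is still touching shared state when static destruction begins.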

I am going to try a few things and see if I can replicate the issue.

Which Boost version did you use? How many cores are on your system?

phlptp commented 6 years ago

Do you just run two of those and the broker?

phlptp commented 6 years ago

@jeffdaily can you check this again with the latest develop branch after the merge of #144?

phlptp commented 6 years ago

@jeffdaily have you checked this recently, or can we close the issue?