gazebosim / gazebo-classic

Gazebo classic. For the latest version, see https://github.com/gazebosim/gz-sim
http://classic.gazebosim.org/
Other
1.19k stars 481 forks source link

Gazebo9 - Race condition between EventT::Signal and EventT::Connect leading to segfault during world update #2874

Open ojasjoshi opened 4 years ago

ojasjoshi commented 4 years ago

Stack trace:

#0  std::__atomic_base<bool>::load (__m=std::memory_order_seq_cst, this=0x0) at /usr/include/c++/5/bits/atomic_base.h:396
#1  std::atomic<bool>::operator bool (this=0x0) at /usr/include/c++/5/atomic:81
#2  gazebo::event::EventT<void (gazebo::common::UpdateInfo const&)>::Signal<gazebo::common::UpdateInfo>(gazebo::common::UpdateInfo const&) (
    this=0x7f60bc054940 <gazebo::event::Events::worldUpdateBegin>, _p=...) at /gazebo/gazebo/common/Event.hh:384
#3  0x00007f60ba9aee59 in gazebo::event::EventT<void (gazebo::common::UpdateInfo const&)>::operator()<gazebo::common::UpdateInfo>(gazebo::common::UpdateInfo const&) (_p=..., this=<optimized out>) at /gazebo/gazebo/common/Event.hh:221
#4  gazebo::physics::World::Update (this=this@entry=0x274a180) at /gazebo/gazebo/physics/World.cc:690
#5  0x00007f60ba9be4df in gazebo::physics::World::Step (this=this@entry=0x274a180) at /gazebo/gazebo/physics/World.cc:621
#6  0x00007f60ba9be955 in gazebo::physics::World::RunLoop (this=0x274a180) at /gazebo/gazebo/physics/World.cc:469
#7  0x00007f60b844c5d5 in ?? ()
   from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0
#8  0x00007f60bace36ba in start_thread (arg=0x7f5fb4bf4700) at pthread_create.c:333
#9  0x00007f60bb2ed41d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

A call to EventT::Connect from any subscriber creates a new EventConnection(true, _subscriber) between the event and the subscriber. A call to Event::Signal during worldUpdateBeginLoop then executes callbacks to all the EventConnection's added to this event.

A race condition occurs between EventT::Signal and EventT::Connect when the new EventConnection is not initialized within EventT::Connect but can be referenced by EventT::Signal via this->connections map, leading to a segfault.

Reference from EventT::Connect

this->connections[index].reset(new EventConnection(true, _subscriber));

A potential fix here would be to add a mutex lock guard within EventT::Connect to make it thread-safe. Such a mutex (std::lock_guard<std::mutex> lock(this->mutex)) already exists for EventT::CleanUp but is missing in all other EventT methods. The performance lag occuring from mutex locking EventT::Connect should not be significant considering that a call to Connect occurs at-most once for every subscriber and also considering the design thought behind adding a mutex lock to CleanUp (a call to CleanUp is made during every Signal loop, which should be significantly more than calls to Connect).

chapulina commented 4 years ago

A potential fix here would be to add a mutex lock guard within EventT::Connect to make it thread-safe.

Thank you for the ticket. I think the suggestion of adding a mutex to protect the connections map sounds reasonable.

The performance lag occuring from mutex locking EventT::Connect should not be significant

If we're very concerned about this, we could use the profiler recently added to compare some common scenario before and after adding the mutex.