gazebosim / gazebo-classic

Gazebo classic. For the latest version, see https://github.com/gazebosim/gz-sim
http://classic.gazebosim.org/
Other
1.17k stars 477 forks source link

Thread-safety of GetEntity from Sensor plugin OnUpdate() #2024

Open osrf-migration opened 8 years ago

osrf-migration commented 8 years ago

Original report (archived issue) by William Woodall (Bitbucket: William Woodall, GitHub: wjwwood).


We have a custom plugin, which is currently a Sensor plugin, that:

See:

This will regularly trigger an assert in the boost::shared_ptr code, like this:

gzserver: /usr/include/boost/smart_ptr/shared_ptr.hpp:653: typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = gazebo::physics::Base; typename boost::detail::sp_member_access<T>::type = gazebo::physics::Base*]: Assertion `px != 0' failed.

Program received signal SIGABRT, Aborted.
[Switching to Thread 0x7fff41658700 (LWP 5854)]
0x00007ffff68c3c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56  ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff68c3c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff68c7028 in __GI_abort () at abort.c:89
#2  0x00007ffff68bcbf6 in __assert_fail_base (fmt=0x7ffff6a0d3b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7ffff63136c0 "px != 0", file=file@entry=0x7ffff6313828 "/usr/include/boost/smart_ptr/shared_ptr.hpp", 
    line=line@entry=653, 
    function=function@entry=0x7ffff6348740 <_ZZNK5boost10shared_ptrIN6gazebo7physics4BaseEEptEvE19__PRETTY_FUNCTION__> "typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = gazebo::physics::Base; typename boost::detail::sp_member_access<T>::type = gazebo::physics::Base*]") at assert.c:92
#3  0x00007ffff68bcca2 in __GI___assert_fail (assertion=0x7ffff63136c0 "px != 0", 
    file=0x7ffff6313828 "/usr/include/boost/smart_ptr/shared_ptr.hpp", line=653, 
    function=0x7ffff6348740 <_ZZNK5boost10shared_ptrIN6gazebo7physics4BaseEEptEvE19__PRETTY_FUNCTION__> "typename boost::detail::sp_member_access<T>::type boost::shared_ptr<T>::operator->() const [with T = gazebo::physics::Base; typename boost::detail::sp_member_access<T>::type = gazebo::physics::Base*]") at assert.c:101
#4  0x00007ffff6261ae3 in boost::shared_ptr<gazebo::physics::Base>::operator-> (this=<optimized out>)
    at /usr/include/boost/smart_ptr/shared_ptr.hpp:653
#5  0x00007ffff6262848 in operator-> (this=<optimized out>) at /usr/include/c++/4.8/bits/basic_string.h:539
#6  gazebo::physics::Base::GetByName (this=0x19ec5a0, _name="drill_0::link::collision")
    at /var/lib/jenkins/workspace/gazebo7-debbuilder/build/gazebo-7.3.1/gazebo/physics/Base.cc:291
#7  0x00007ffff62ec426 in gazebo::physics::World::GetByName (this=<optimized out>, _name="drill_0::link::collision")
    at /var/lib/jenkins/workspace/gazebo7-debbuilder/build/gazebo-7.3.1/gazebo/physics/World.cc:997
#8  0x00007ffff62ec550 in gazebo::physics::World::GetEntity (this=<optimized out>, _name="drill_0::link::collision")
    at /var/lib/jenkins/workspace/gazebo7-debbuilder/build/gazebo-7.3.1/gazebo/physics/World.cc:1026
#9  0x00007fff743a7de3 in gazebo::SideContactPlugin::CalculateContactingLinks (this=0xf369f0)
    at /home/william/universal_robot_ws/src/gear/osrf_gear/src/SideContactPlugin.cc:130
#10 0x00007fff745c71ea in gazebo::ConveyorBeltPlugin::OnUpdate (this=0xf369f0)
    at /home/william/universal_robot_ws/src/gear/osrf_gear/src/ConveyorBeltPlugin.cc:92
#11 0x00007ffff5e9a6d5 in operator() (this=<optimized out>) at /usr/include/boost/function/function_template.hpp:767
#12 gazebo::event::EventT<void ()>::Signal() (this=0x357f408)
    at /var/lib/jenkins/workspace/gazebo7-debbuilder/build/gazebo-7.3.1/gazebo/common/Event.hh:370
#13 0x00007ffff5e97f7e in operator() (this=<optimized out>)
    at /var/lib/jenkins/workspace/gazebo7-debbuilder/build/gazebo-7.3.1/gazebo/common/Event.hh:214
#14 gazebo::sensors::Sensor::Update (this=0x35741c0, _force=_force@entry=false)
    at /var/lib/jenkins/workspace/gazebo7-debbuilder/build/gazebo-7.3.1/gazebo/sensors/Sensor.cc:246
#15 0x00007ffff5e9dba5 in gazebo::sensors::SensorManager::SensorContainer::Update (this=0xf15df0, _force=<optimized out>)
    at /var/lib/jenkins/workspace/gazebo7-debbuilder/build/gazebo-7.3.1/gazebo/sensors/SensorManager.cc:585
#16 0x00007ffff5ea0d88 in gazebo::sensors::SensorManager::SensorContainer::RunLoop (this=0xf15df0)
    at /var/lib/jenkins/workspace/gazebo7-debbuilder/build/gazebo-7.3.1/gazebo/sensors/SensorManager.cc:539
#17 0x00007ffff42f7a4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#18 0x00007ffff55ce184 in start_thread (arg=0x7fff41658700) at pthread_create.c:312
#19 0x00007ffff698737d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

I believe this is because the Sensor plugin is running in a separate thread from the physics loop, so when our code calls this->world->GetEntity(collision), it ends up iterating over the children of the world object and in a separate thread we have a plugin that is spawning new models periodically into the world.

My theory is that the GetEntity call ends up here:

Where it is iterating over the world object's children using iterators. If in the other thread objects are being added to this same vector, then these iterators are probably becoming invalid, which eventually results in invalid iterators being interpreted as a boost shared pointer.

I think an appropriate response to this might be "sensor plugins are not thread safe with these calls; won't fix" and we're attempting to work around this now by redesigning the plugin in question to be a model plugin in order to avoid this issue. However, at @scpeters request, I'm opening this issue to keep track of it. Either there needs to be some high level documentation that mentions this thread non-safety and explains how the architecture of the different plugins affects what can be called from where. This documentation may exist, but I looked for a while and didn't find it (please point it out if I missed it).

osrf-migration commented 8 years ago

Original comment by William Woodall (Bitbucket: William Woodall, GitHub: wjwwood).


Also, I was going to select the version in the issue description, but 7.3 is not listed there.

% gazebo --version
Gazebo multi-robot simulator, version 7.3.1
Copyright (C) 2012-2016 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org

Gazebo multi-robot simulator, version 7.3.1
Copyright (C) 2012-2016 Open Source Robotics Foundation.
Released under the Apache 2 License.
http://gazebosim.org
osrf-migration commented 8 years ago

Original comment by Nate Koenig (Bitbucket: Nathan Koenig).