Open osrf-migration opened 9 years ago
Original comment by Steve Peters (Bitbucket: Steven Peters, GitHub: scpeters).
There are other race conditions as well. I just found one in the INTEGRATION_world test (backtrace here).
The WorldTest.RemoveModelPaused test loads a world in a paused state, takes one physics step, and then calls World::RemoveModel to delete two models and verify that they are deleted. This function locks mutexes, including the physicsUpdateMutex, when the model is being deleted. There is a race condition, however, as the receiveMutex is not locked, and the following sequence can occur, which caused the seg-fault recorded here:
World::RunLoop
World::Step
World::ProcessMessages
World::ProcessRequestMsgs
World::BuildSceneMsg
The last call reads from data structures that are being modified at the same time by World::RemoveModel, which leads to the seg-fault.
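To illustrate the kind of fix this points toward, here is a minimal sketch in which the removal path and the scene-message path take the same lock. Everything in it is hypothetical: ToyWorld, sceneMutex, ModelCount and RunRemovalWhileReading are stand-ins invented for this example and are not Gazebo's actual classes or members. It only shows the pattern, not where the lock should live in World.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for the world state; not Gazebo's World class.
class ToyWorld
{
public:
  explicit ToyWorld(std::vector<std::string> names)
    : models(std::move(names)) {}

  // Plays the role of World::RemoveModel: mutates the model list.
  void RemoveModel(const std::string &name)
  {
    std::lock_guard<std::mutex> lock(this->sceneMutex);
    this->models.erase(
        std::remove(this->models.begin(), this->models.end(), name),
        this->models.end());
  }

  // Plays the role of World::BuildSceneMsg: reads the model list.
  // It takes the same mutex as RemoveModel, so it never iterates
  // over the list while an element is being erased.
  std::size_t BuildSceneMsg()
  {
    std::lock_guard<std::mutex> lock(this->sceneMutex);
    std::size_t bytes = 0;
    for (const auto &name : this->models)
      bytes += name.size();
    return bytes;
  }

  std::size_t ModelCount()
  {
    std::lock_guard<std::mutex> lock(this->sceneMutex);
    return this->models.size();
  }

private:
  std::mutex sceneMutex;  // assumed shared lock, name is illustrative
  std::vector<std::string> models;
};

// Removes two models while another thread keeps building scene messages,
// then returns how many models are left.
std::size_t RunRemovalWhileReading()
{
  ToyWorld world({"box", "sphere", "ground_plane"});
  assert(world.ModelCount() == 3);

  std::thread reader([&world]() {
    for (int i = 0; i < 1000; ++i)
      world.BuildSceneMsg();
  });

  world.RemoveModel("box");
  world.RemoveModel("sphere");
  reader.join();
  return world.ModelCount();
}
```

Calling RunRemovalWhileReading() leaves one model in the list regardless of how the two threads interleave. If the lock in BuildSceneMsg were dropped, the reader could walk the vector while erase is shifting its elements, which is the same class of bug as in the backtrace.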
Original comment by Steve Peters (Bitbucket: Steven Peters, GitHub: scpeters).
I just observed a similar backtrace (details here), so this hasn't yet been fixed.
Original report (archived issue) by Elte Hupkes (Bitbucket: ElteHupkes).
(From comment 21184816 of issue #1629, since this is a separate issue)
There is a race condition which might lead Gazebo to crash with a segmentation fault when deleting a model and this model has a sensor. At the time of the crash, a backtrace of thread 1 shows:
And thread 23:
Both threads perform Element::Reset(), one on the parent model, the other on the child link containing the sensor (both of which are called by their respective destructors). These Reset() calls zero the reference counters of the SDF elements, causing their destructors to be called. However, since both are operating partially on the same elements, it might happen that one partial Reset() call is scheduled after a destructor has already deleted the Element's dataPtr. This then results in a segfault.
Now it seems to me that SDFormat is not designed to be thread-safe, so it would be Gazebo's job to handle this. I'm not sure how though... Use a mutex somewhere that needs to be acquired before calling Element::Reset() on any Base?
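For what it's worth, the mutex idea could look roughly like the sketch below. This is only an illustration under stated assumptions: ToyElement is a made-up stand-in for an SDF element (it is not sdf::Element and does not model its reference counting), and ResetMutex, GuardedReset and ResetFromTwoThreads are hypothetical helpers that do not exist in Gazebo or SDFormat. The point is just that every Reset() on a shared element tree goes through one process-wide lock.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for an SDF element that shares children
// with other owners. Not the real sdf::Element.
struct ToyElement
{
  std::vector<std::shared_ptr<ToyElement>> children;

  // Recursively resets the children, then drops the references to them.
  void Reset()
  {
    for (auto &child : this->children)
    {
      if (child)
        child->Reset();
    }
    this->children.clear();
  }
};

// One process-wide mutex that every caller must hold while resetting.
std::mutex &ResetMutex()
{
  static std::mutex mutex;
  return mutex;
}

// The lock is taken once here; the recursion inside Reset() does not
// lock again, so a plain (non-recursive) mutex is enough.
void GuardedReset(const std::shared_ptr<ToyElement> &elem)
{
  std::lock_guard<std::mutex> lock(ResetMutex());
  if (elem)
    elem->Reset();
}

// Mimics the two destructors: one thread resets the model, the other
// resets the link that the model also references.
bool ResetFromTwoThreads()
{
  auto sensor = std::make_shared<ToyElement>();
  auto link = std::make_shared<ToyElement>();
  auto model = std::make_shared<ToyElement>();
  link->children.push_back(sensor);
  model->children.push_back(link);
  assert(model->children.size() == 1);

  std::thread modelThread([model]() { GuardedReset(model); });
  std::thread linkThread([link]() { GuardedReset(link); });
  modelThread.join();
  linkThread.join();

  return model->children.empty() && link->children.empty();
}
```

With the guard in place the two resets are serialized, so whichever thread runs second just finds an already-empty child list. The obvious downside is that a single global lock serializes all element teardown, so a finer-grained lock (for example per model) might be preferable if this turns out to be a bottleneck.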