Open scpeters opened 1 year ago
I attached gdb
to the gzclient
processes to get backtraces of the failing case to see where it is stuck. I also introspected gzserver
a bit and it didn't seem to be stuck. It appeared to keep taking physics steps, but there was no output from gz stats
. I don't have anything conclusive on what gzserver
is doing, but it didn't seem stuck like gzclient
is
(gdb) bt
#0 futex_abstimed_wait_cancelable (private=<optimized out>, abstime=0x7ffdc825f460, clockid=<optimized out>, expected=0, futex_word=0x7ffdc825f570)
at ../sysdeps/nptl/futex-internal.h:320
#1 __pthread_cond_wait_common (abstime=0x7ffdc825f460, clockid=<optimized out>, mutex=0x7ffdc825f520, cond=0x7ffdc825f548) at pthread_cond_wait.c:520
#2 __pthread_cond_timedwait (cond=0x7ffdc825f548, mutex=0x7ffdc825f520, abstime=0x7ffdc825f460) at pthread_cond_wait.c:665
#3 0x00007fe6eb242f48 in () at /lib/x86_64-linux-gnu/libOgreTerrain.so.1.9.0
#4 0x00007fe6eb231668 in Ogre::Terrain::waitForDerivedProcesses() () at /lib/x86_64-linux-gnu/libOgreTerrain.so.1.9.0
#5 0x00007fe6eb235a5e in Ogre::Terrain::~Terrain() () at /lib/x86_64-linux-gnu/libOgreTerrain.so.1.9.0
#6 0x00007fe6eb23615d in Ogre::Terrain::~Terrain() () at /lib/x86_64-linux-gnu/libOgreTerrain.so.1.9.0
#7 0x00007fe6eb247857 in Ogre::TerrainGroup::TerrainSlot::freeInstance() () at /lib/x86_64-linux-gnu/libOgreTerrain.so.1.9.0
#8 0x00007fe6eb2484bb in Ogre::TerrainGroup::handleResponse(Ogre::WorkQueue::Response const*, Ogre::WorkQueue const*) ()
at /lib/x86_64-linux-gnu/libOgreTerrain.so.1.9.0
#9 0x00007fe6f1f53b70 in Ogre::DefaultWorkQueueBase::processResponse(Ogre::WorkQueue::Response*) () at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#10 0x00007fe6f1f5739b in Ogre::DefaultWorkQueueBase::processRequestResponse(Ogre::WorkQueue::Request*, bool) ()
at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#11 0x00007fe6f1f57942 in Ogre::DefaultWorkQueueBase::addRequest(unsigned short, unsigned short, Ogre::Any const&, unsigned char, bool, bool) ()
at /lib/x86_64-linux-gnu/libOgreMain.so.1.9.0
#12 0x00007fe6eb2461c1 in Ogre::TerrainGroup::loadTerrainImpl(Ogre::TerrainGroup::TerrainSlot*, bool) () at /lib/x86_64-linux-gnu/libOgreTerrain.so.1.9.0
#13 0x00007fe6eb24624f in Ogre::TerrainGroup::loadAllTerrains(bool) () at /lib/x86_64-linux-gnu/libOgreTerrain.so.1.9.0
#14 0x00007fe6f155133d in gazebo::rendering::Heightmap::Load() () at /lib/x86_64-linux-gnu/libgazebo_rendering.so.11
#15 0x00007fe6f1552364 in gazebo::rendering::Heightmap::LoadFromMsg(boost::shared_ptr<gazebo::msgs::Visual const> const&) ()
at /lib/x86_64-linux-gnu/libgazebo_rendering.so.11
#16 0x00007fe6f15bc0a0 in gazebo::rendering::Scene::ProcessVisualMsg(boost::shared_ptr<gazebo::msgs::Visual const> const&, gazebo::rendering::Visual::VisualType) () at /lib/x86_64-linux-gnu/libgazebo_rendering.so.11
#17 0x00007fe6f15bd0e8 in gazebo::rendering::Scene::PreRender() () at /lib/x86_64-linux-gnu/libgazebo_rendering.so.11
#18 0x00007fe6f293dc02 in () at /lib/x86_64-linux-gnu/libgazebo_gui.so.11
#19 0x00007fe6f293273c in gazebo::gui::GLWidget::paintEvent(QPaintEvent*) () at /lib/x86_64-linux-gnu/libgazebo_gui.so.11
#20 0x00007fe6f0f1e2b6 in QWidget::event(QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#21 0x00007fe6f0edba66 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#22 0x00007fe6f0ee50f0 in QApplication::notify(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#23 0x00007fe6f233c80a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#24 0x00007fe6f0f16f4a in QWidgetPrivate::sendPaintEvent(QRegion const&) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#25 0x00007fe6f0f17799 in QWidgetPrivate::drawWidget(QPaintDevice*, QRegion const&, QPoint const&, int, QPainter*, QWidgetBackingStore*) ()
at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#26 0x00007fe6f0eec09d in QWidgetPrivate::repaint_sys(QRegion const&) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#27 0x00007fe6f0f06318 in QWidgetPrivate::syncBackingStore() () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#28 0x00007fe6f0f1ecac in QWidget::event(QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#29 0x00007fe6f0edba66 in QApplicationPrivate::notify_helper(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#30 0x00007fe6f0ee50f0 in QApplication::notify(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Widgets.so.5
#31 0x00007fe6f233c80a in QCoreApplication::notifyInternal2(QObject*, QEvent*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#32 0x00007fe6f233f488 in QCoreApplicationPrivate::sendPostedEvents(QObject*, int, QThreadData*) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#33 0x00007fe6f2394e37 in () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#34 0x00007fe6ed34417d in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#35 0x00007fe6ed344400 in () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
#36 0x00007fe6ed3444a3 in g_main_context_iteration () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
--Type <RET> for more, q to quit, c to continue without paging--c
#37 0x00007fe6f2394435 in QEventDispatcherGlib::processEvents(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#38 0x00007fe6f233b3ab in QEventLoop::exec(QFlags<QEventLoop::ProcessEventsFlag>) () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#39 0x00007fe6f2343116 in QCoreApplication::exec() () at /lib/x86_64-linux-gnu/libQt5Core.so.5
#40 0x00007fe6f2946a4b in gazebo::gui::run(int, char**) () at /lib/x86_64-linux-gnu/libgazebo_gui.so.11
#41 0x0000564164182c89 in ()
#42 0x00007fe6f1755083 in __libc_start_main (main=0x564164182c40, argc=4, argv=0x7ffdc82617b8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffdc82617a8) at ../csu/libc-start.c:308
#43 0x0000564164182d3e in ()
(gdb)
looking a bit more closely at the console output from the failing case:
[Msg] Loading heightmap: heightmap_bowl
[Msg] Heightmap loaded. Process took: 0.214018 seconds
[Msg] Saving heightmap cache data to /home/scpeters/.gazebo/paging/heightmap_bowl
[Msg] Loading heightmap: heightmap_bowl
[Msg] Loading heightmap cache data: /home/scpeters/.gazebo/paging/heightmap_bowl/gazebo_terrain_LOD0_00000000.dat
[Msg] Heightmap cache data saved. Process took: 0.285364 seconds.
I believe these console messages are coming from different processes (gzserver
and gzclient
), so I'll annotate them with 1
and 2
depending on which process I think they are from:
1 [Msg] Loading heightmap: heightmap_bowl
1 [Msg] Heightmap loaded. Process took: 0.214018 seconds
1 [Msg] Saving heightmap cache data to /home/scpeters/.gazebo/paging/heightmap_bowl
2 [Msg] Loading heightmap: heightmap_bowl
2 [Msg] Loading heightmap cache data: /home/scpeters/.gazebo/paging/heightmap_bowl/gazebo_terrain_LOD0_00000000.dat
1 [Msg] Heightmap cache data saved. Process took: 0.285364 seconds.
I believe the problem is that process that loads the heightmap first starts to write it but doesn't complete the write before the second process starts reading from that folder. It doesn't look like they are both trying to write to the folder, but that one process needs to wait before reading
we could use flock on Linux to lock the files, but I'm not sure if there's a cross-platform API implemented somewhere else
we could use flock on Linux to lock the files, but I'm not sure if there's a cross-platform API implemented somewhere else
we can also use a boost API, since gazebo-classic already depends on boost
#include <boost/interprocess/sync/file_lock.hpp>
we can also use a boost API, since gazebo-classic already depends on boost
- https://www.boost.org/doc/libs/1_55_0/doc/html/interprocess/synchronization_mechanisms.html
#include <boost/interprocess/sync/file_lock.hpp>
but we don't depend on the interprocess
component, so that would be a significant change. Still easiest to make a Linux-only fix with flock
Environment
Description
gazebo
when the~/.gazebo/paging
cache folder is empty with a world containing heightmaps and wide-angle camera sensors.gazebo
executable orgzserver
andgzclient
in separate processes.gzserver
andgzclient
in separate processes, but fails when using thegazebo
executable. In the failing case, it appears to deadlock, with nothing loading in thegzclient
window and no output fromgz stats
. Closing gazebo and trying again, generally works, as the~/.gazebo/paging
folder is not empty.Steps to reproduce
rm -rf ~/.gazebo/paging
wget https://github.com/gazebosim/gazebo-classic/raw/gazebo11/worlds/lensflare_wideangle_cam.world
gazebo --verbose lensflare_wideangle_cam.world
Output
Failed run
Subsequent successful run
The
Node::Advertise()
errors occur in both cases, so I think they can be ignored.