Closed doisyg closed 2 years ago
I'm not seeing anything here that makes me suspect pluginlib: http://wiki.ros.org/noetic/Migration. The melodic devel branch looks identical to the noetic devel branch (https://github.com/ros/pluginlib).
Does the rospack get plugins call detect it properly?
I'd be more than happy to release it once we work through these issues.
Yes:
$ rospack plugins --attrib=plugin costmap_2d
spatio_temporal_voxel_layer /ws/src/spatio_temporal_voxel_layer/costmap_plugins.xml
costmap_2d /opt/ros/noetic/share/costmap_2d/costmap_plugins.xml
What's strange is that when preloading jemalloc with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2, the error disappears, but then it causes move_base to crash.
I will keep you posted if I have time to investigate.
Crash log (with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 before roslaunch):
[ WARN] [/move_base]: global_costmap: Pre-Hydro parameter "static_map" unused since "plugins" is provided
[ WARN] [/move_base]: global_costmap: Pre-Hydro parameter "map_type" unused since "plugins" is provided
[ INFO] [/move_base]: global_costmap: Using plugin "static_layer"
[ INFO] [/move_base]: Requesting the map...
[ INFO] [/move_base]: Resizing costmap to 200 X 200 at 0.050000 m/pix
[ INFO] [/move_base]: Received a 200 X 200 map at 0.050000 m/pix
[ INFO] [/move_base]: global_costmap: Using plugin "obstacle_layer"
[ INFO] [/move_base]: Subscribed to Topics: laser_scan_sensor
[ INFO] [/move_base]: global_costmap: Using plugin "inflation_layer"
[ WARN] [/move_base]: local_costmap: Pre-Hydro parameter "static_map" unused since "plugins" is provided
[ WARN] [/move_base]: local_costmap: Pre-Hydro parameter "map_type" unused since "plugins" is provided
[ INFO] [/move_base]: local_costmap: Using plugin "static_layer"
[ INFO] [/move_base]: Requesting the map...
[ INFO] [/move_base]: Resizing static layer to 200 X 200 at 0.050000 m/pix
[ INFO] [/move_base]: Received a 200 X 200 map at 0.050000 m/pix
[ INFO] [/move_base]: local_costmap: Using plugin "obstacle_layer"
[ INFO] [/move_base]: Subscribed to Topics: laser_scan_sensor
[ INFO] [/move_base]: local_costmap: Using plugin "rgbd_obstacle_layer"
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer being initialized as SpatioTemporalVoxelLayer!
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer's global frame is map.
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer loaded parameters from parameter server.
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer created underlying voxel grid.
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer initialization complete!
[move_base-3] process has died [pid 402449, exit code -11, cmd /opt/ros/noetic/lib/move_base/move_base __name:=move_base __log:=/home/gd/.ros/log/50b524f2-a0bf-11ea-9306-9d8c2f1ba3e5/move_base-3.log].
log file: /home/gd/.ros/log/50b524f2-a0bf-11ea-9306-9d8c2f1ba3e5/move_base-3*.log
Running gdb to find out where the crash happens would be useful. It might even be on the move_base side if it crashes after the “complete” message.
I assume you mean the second issue; it looks like it comes from the updateFootprint function:
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer created underlying voxel grid.
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer initialization complete!
[New Thread 0x7fffe65ce700 (LWP 69227)]
--Type <RET> for more, q to quit, c to continue without paging--
Thread 11 "move_base" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe65ce700 (LWP 69227)]
0x00007ffff1eaba97 in spatio_temporal_voxel_layer::SpatioTemporalVoxelLayer::updateFootprint(double, double, double, double*, double*, double*, double*) () from /ws/devel/lib//libspatio_temporal_voxel_layer.so
However I don't know if this second issue is linked or not to the first one and its workaround (preloading jemalloc)
If started with the STVL param enabled: false or update_footprint_enabled: false (and still preloading jemalloc), it doesn't crash.
Then, at runtime (changing the dynamic parameters):
If enabled: true and update_footprint_enabled: false, no crash.
Then if enabled: true and update_footprint_enabled: true => crash.
What was the crash, if you had GDB up? What was the traceback? Maybe your footprint parameter wasn't read in correctly, so one of the double pointers was null?
It would be good to know where the error is in this function: https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/melodic-devel/src/spatio_temporal_voxel_layer.cpp#L477-L496 (since some of them are costmap calls, it may actually be an error in costmap_2d. In fact, all the functions called inside it are provided by costmap_2d).
I don't fully understand why, but because the updateFootprint function has no return value, the for loop was running well past _transformed_footprint.size() and overflowing:
https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/67a6e4d93e71168a2da068c374c96bdc83c29e4a/src/spatio_temporal_voxel_layer.cpp#L477-L496
Fixed in https://github.com/SteveMacenski/spatio_temporal_voxel_layer/pull/168
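For anyone hitting a similar crash, here is a minimal C++ sketch of the failure mode described above. It assumes (the thread does not confirm this) that the function was declared with a non-void return type but never returned a value. Flowing off the end of such a function is undefined behavior in C++, and an optimizing compiler is then free to treat the loop exit as unreachable, which would match a loop running past the vector's size. Point and updateFootprintBounds are hypothetical stand-ins, not the STVL code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical 2D point, standing in for a footprint vertex.
struct Point
{
  double x;
  double y;
};

// Expands the given bounds so that they cover every footprint vertex.
// Returns false if any output pointer is null, true otherwise.
bool updateFootprintBounds(const std::vector<Point>& footprint,
                           double* min_x, double* min_y,
                           double* max_x, double* max_y)
{
  // Guard against the "one of the double pointers was null" case.
  if (!min_x || !min_y || !max_x || !max_y) {
    return false;
  }

  // The loop is bounded by the container's own size.
  for (std::size_t i = 0; i < footprint.size(); ++i) {
    *min_x = std::min(*min_x, footprint[i].x);
    *min_y = std::min(*min_y, footprint[i].y);
    *max_x = std::max(*max_x, footprint[i].x);
    *max_y = std::max(*max_y, footprint[i].y);
  }

  // Every path of a non-void function must return a value. Omitting this
  // statement is undefined behavior, even if no caller reads the result.
  return true;
}
```

GCC and Clang report a missing return with -Wreturn-type (part of -Wall), so building with warnings enabled is a cheap way to catch this class of bug.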
Weird. Is noetic otherwise working?
Yes, provided that this line is added to .bashrc: export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2
I don't think it can be released until this issue is solved. It apparently has to do with the new jemalloc version in Focal: https://github.com/jemalloc/jemalloc/issues/1237
Is that a pluginlib issue? Is there a ticket filed for that so someone knows it's an issue?
I found nothing else. How can you tell whether it is pluginlib-related or not?
You said you're having issues loading plugins, unless you're saying that this is a unique issue to STVL. I have to assume your issues are the result of pluginlib issues.
Sorry if I was unclear, the only thing I know for sure is that when I start move_base with STVL, I get this crash:
[ INFO] [/move_base]: local_costmap: Using plugin "obstacle_layer"
[ INFO] [/move_base]: Subscribed to Topics: laser_scan_sensor
[ INFO] [/move_base]: local_costmap: Using plugin "sonar_layer"
[ INFO] [/move_base]: local_costmap/sonar_layer: ALL as input_sensor_type given
[ INFO] [/move_base]: RangeSensorLayer: subscribed to topic /sonar
[ INFO] [/move_base]: local_costmap: Using plugin "rgbd_obstacle_layer"
terminate called after throwing an instance of 'pluginlib::LibraryLoadException'
what(): Failed to load library /home/gd/elodie1_ws/devel/lib//libspatio_temporal_voxel_layer.so. Make sure that you are calling the PLUGINLIB_EXPORT_CLASS macro in the library code, and that names are consistent between this macro and your XML. Error string: Could not load library (Poco exception = /lib/x86_64-linux-gnu/libjemalloc.so.2: cannot allocate memory in static TLS block)
whereas without it, I don't have this issue (and other plugins like costmap_2d::StaticLayer, costmap_2d::ObstacleLayer, range_sensor_layer::RangeSensorLayer are loading fine).
Whether it is an STVL or a pluginlib issue, I have no clue.
I dug a bit and found out that the issue disappears by using export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 before launching move_base. That may help somebody familiar with the dynamic library and memory allocation system (which I am not) diagnose the issue.
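To separate the plugin machinery from the dynamic loader, one option is to load the library directly and print the loader's own error text. The "cannot allocate memory in static TLS block" message comes from the dynamic loader: a library built to use initial-exec thread-local storage needs room in the static TLS block, which is sized when the process starts, so loading it later can fail, while preloading it at startup reserves the space. The sketch below assumes that the plugin loader ends up calling dlopen on Linux; tryLoadLibrary is a hypothetical helper, not part of pluginlib.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: tries to load a shared library with dlopen and
// returns the dynamic loader's error text, or an empty string on success.
std::string tryLoadLibrary(const std::string& path)
{
  dlerror();  // clear any stale error message

  void* handle = dlopen(path.c_str(), RTLD_NOW);
  if (handle == nullptr) {
    const char* err = dlerror();
    return err ? std::string(err) : std::string("unknown dlopen error");
  }

  dlclose(handle);
  return std::string();
}
```

Calling it on the path from the exception (the libspatio_temporal_voxel_layer.so under devel/lib) with and without LD_PRELOAD set should show whether the failure reproduces outside of move_base. On older glibc versions the program has to be linked with -ldl.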
What's the computer you're trying to run it on? Can you verify on another hardware machine it works or doesn't? I'm not sure what to do about that myself.
I'd also be curious if you ran into issues loading NPVL or if its just STVL https://github.com/SteveMacenski/nonpersistent_voxel_layer
What's the computer you're trying to run it on? Can you verify on another hardware machine it works or doesn't? I'm not sure what to do about that myself.
Same result on my 3-year-old ASUS laptop, a NUC8 and a NUC10.
I'd also be curious if you ran into issues loading NPVL or if its just STVL https://github.com/SteveMacenski/nonpersistent_voxel_layer
I'll try
Ok, so probably not platform specific.
It may (?) be released, but rosdistro doesn't have a Focal entry for openVDB https://github.com/ros/rosdistro/blob/master/rosdep/base.yaml#L3005
No issue with NPVL
Mhm, that is interesting then. I don't have a 20.04 machine yet to try to debug this. Hopefully in the next few weeks but for the moment, there's not much I can do.
No rush for a release. I'll test it on a real robot, hopefully in the next few days, and report back if I find any other issues.
Ok, that issue is really odd though. I'd suggest filing a ticket for it on the pluginlib repo, since something broke during the last distribution update. As far as I can tell, nothing should need to be changed: http://wiki.ros.org/noetic/Migration
It doesn't seem like this issue is coming from pluginlib. The libjemalloc2 dependency comes from the Debian package libopenvdb6.2: https://packages.debian.org/sid/libopenvdb6.2 https://packages.ubuntu.com/focal/libopenvdb6.2
@doisyg pointed to an issue suggesting that jemalloc may not be built with --disable-initial-exec-tls. It looks like there is a bug about this on the Debian bug tracker, so the Ubuntu Focal version probably has the same problem: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=951704
Commenting there, or opening another bug on launchpad might be good next steps: https://launchpad.net/ubuntu/+source/jemalloc/+bugs
Would this not be something to file with the openvdb maintainers on GitHub since they just need to change some flags and rebuild debians?
I'm not very encouraged by the likelihood of this being fixed given that ticket hasn't had motion since Feb.
I filed the ticket above on OpenVDB to make them aware. Is there any other action here I can take?
Thanks for filing the ticket. I reported there that the issue disappears when installing openvdb7.0 instead of openvdb6.2 before recompiling. The problem now is that openvdb7.0 is not officially available on Focal.
OK, mhm, we'll see what they say.
@doisyg any changes? I'm in another cycle of releases and STVL is ready to go when this issue is resolved. I haven't heard anything from the openvdb side, but I didn't know whether they had updated or whether things have been OK for you since.
Still working fine for me with the LD_PRELOAD workaround
https://github.com/AcademySoftwareFoundation/openvdb/issues/732#issuecomment-661294038
Can you verify that the binaries resolve this issue for your application?
Sure, as soon as I notice an update to the Ubuntu packages, I will test and report. For now, it is still version 6.2.1-8ubuntu1 from 22 Feb 2020.
As well
Note for ROS2 users: Nav2 tutorial includes this note to export. Hopefully we get new binaries soon, but I have to say I'm really unclear who releases the binaries for openvdb and how to get them to update them
The very same issue exists in other packages:
Fun. At least we're not alone on this front. It looks like openVDB fixed things in mainline, but I don't exactly know how to get new binaries released. Until then, Focal is basically just a dead operating system to me since I can't run openVDB which STVL and much of my R&D for nav2 environmental modeling is based on...
From @doisyg's comments, it looks like it's reflected in v7.0 (is that accurate?). If so, I could update the openvdb entry on rosdistro to install 7.x if those binaries are available in Focal. Beyond that, I don't know how to continue.
Yes, I confirm that the problem disappears when manually installing libopenvdb7.0 from Groovy. But these binaries are not available on Focal and my request to backport it has been refused, see here: https://answers.launchpad.net/ubuntu/+source/openvdb/+question/691239 "Backporting 7.0 release to focal is not option."
@doisyg: there is actually a good rationale for why "a backport is not an option" (from comment nr 5 here):
The Backports Project is a means to provide new features to users. Because of the inherent stability risks in backporting packages, users do not get backported packages without some explicit action on their part. This generally makes backports an inappropriate avenue for fixing bugs
this makes sense to me.
Fun. At least we're not alone on this front.
Interesting to see that the problem affects rqt_image_view too, even though it doesn't use openvdb. This ties it to the way plugins are loaded and to libjemalloc.
Yes, I understand the rationale for not backporting, but I don't know how to trigger an SRU procedure that will use a fixed version of openvdb6.2 for Focal (where would this fixed version be compiled from?)
I don't know how to trigger an SRU procedure
According to comment nr 3 (I can't seem to link directly to these comments):
If you run:
ubuntu-bug openvdb
This will start the process for you.
We've had a similar problem (getting an SRU in) with urdfdom_headers (this one: https://github.com/ros/urdfdom_headers/issues/45).
Perhaps @j-rivero and/or @kyrofa can provide some insight into the SRU procedure.
Yes, I even reported a bug https://bugs.launchpad.net/ubuntu/+source/jemalloc/+bug/1882998 which is confirmed (and reported fixed on groovy with v7 but no backport possible). The thing is, I am not sure what sources ubuntu uses for the binaries of the v6.2, I guess these: https://github.com/AcademySoftwareFoundation/openvdb/tree/v6.2 but the problem is likely to still be present in these sources
I don't know how to trigger an SRU procedure that will use a fixed version of openvdb6.2 for focal (where this fixed version will be compiled from?)
Exactly. I don't care about 7.0 as much as I care that 6.x is fixed in the binaries. If it's just build flags, it's easy to resolve; it just needs someone to actually do it, and we need to know who that person is to ask. I emailed someone on the list and never got a response. I don't know if they'd object to an update to 6.3 or something if some small code changes were required. At the moment, I have to think those binaries are basically useless to most users.
Hello Doisy:
Yes, I understand the rationale for not backporting, but I don't know how to trigger an SRU procedure that will use a fixed version of openvdb6.2 for focal (where this fixed version will be compiled from?)
As for how to add a source code patch to a .deb package, this page can provide some hints: https://packaging.ubuntu.com/singlehtml/index.html#document-ubuntu-packaging-guide/patches-to-packages
For the whole SRU process, extensive documentation is available at: https://wiki.ubuntu.com/StableReleaseUpdates
The initial comment should look something like: https://bugs.launchpad.net/ubuntu/+source/urdfdom-headers/+bug/1817595/comments/6
Hello.
I am trying to install the spatio-temporal-voxel-layer package from source but it is giving me problems.
Is there going to be a release for Noetic, with the possibility of installing it with apt-get?
If not, how do you recommend I install it from source?
Best regards. Alessandro
Still waiting on a fix to the binaries for openVDB that this depends on.
Understood. Do you have an estimate of how long it will take?
https://github.com/SteveMacenski/spatio_temporal_voxel_layer#interesting-side-note A note was left about this for Foxy + newer and Noetic in the readme.
NOTE: If used on Ubuntu 20.04 (Foxy or Noetic), you must set your LD_PRELOAD path to include jemalloc due to a known compiler flag issue in the 20.04 binaries of OpenVDB (e.g. export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2). If you see the error: Could not load library LoadLibrary error: /usr/lib/x86_64-linux-gnu/libjemalloc.so.2: cannot allocate memory in static TLS block, this is your issue.
You could set this in the terminal, in your bashrc, or even in a launch file call if you liked. I considered just adding a system call in C++ to call this each time on startup, but I didn't know what that would do for non-Linux based OS.
This is not something in my direct control to fix. This is an issue with openVDB, and this is a reasonable single-OS solution to it. I wish they would update the binaries, but it's out of our control. Because of this, it is unlikely to be formally released for Noetic or Foxy on 20.04. But we have a workable workaround with no performance degradation.
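On the idea of handling this from C++ at startup: the dynamic loader reads LD_PRELOAD when the process starts, so setting the variable from inside an already running process would not change what that process has loaded. A startup check that warns the user is one alternative. The sketch below is only an illustration under that assumption; preloadMentions and jemallocPreloaded are hypothetical names, not part of STVL.

```cpp
#include <cstdlib>
#include <string>

// Returns true if the given LD_PRELOAD value mentions the library name.
// A null value (variable not set) counts as "not mentioned".
bool preloadMentions(const char* ld_preload, const std::string& library_name)
{
  if (ld_preload == nullptr) {
    return false;
  }
  return std::string(ld_preload).find(library_name) != std::string::npos;
}

// Hypothetical startup check: was this process started with jemalloc listed
// in LD_PRELOAD? std::getenv returns nullptr when the variable is unset.
bool jemallocPreloaded()
{
  return preloadMentions(std::getenv("LD_PRELOAD"), "libjemalloc");
}
```

A node could call jemallocPreloaded() once during initialization and log a pointer to the README note when it returns false, instead of failing later with the static TLS error.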
You could set this in the terminal, in your bashrc, or even in a launch file call if you liked.
NOTE: setting it in .bashrc caused my Google Chrome to segfault.
I think it is best to set in the launch file. A little help to others:
<launch>
<env name="LD_PRELOAD" value="/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"/>
<node pkg="move_base" type="move_base" respawn="false" name="move_base" output="screen">
...
Builds fine under Noetic from branch melodic-devel but at execution I get:
Has anybody tried this already and gotten a different result?