SteveMacenski / spatio_temporal_voxel_layer

A new voxel layer leveraging modern 3D graphics tools to modernize navigation environmental representations
http://wiki.ros.org/spatio_temporal_voxel_layer
GNU Lesser General Public License v2.1

Galactic release #167

Closed doisyg closed 2 years ago

doisyg commented 4 years ago

Builds fine under Noetic from branch melodic-devel but at execution I get:

terminate called after throwing an instance of 'pluginlib::LibraryLoadException'
  what():  Failed to load library /home/ws/devel/lib//libspatio_temporal_voxel_layer.so. Make sure that you are calling the PLUGINLIB_EXPORT_CLASS macro in the library code, and that names are consistent between this macro and your XML. Error string: Could not load library (Poco exception = /lib/x86_64-linux-gnu/libjemalloc.so.2: cannot allocate memory in static TLS block)

Has anybody tried this already and gotten a different result?

SteveMacenski commented 4 years ago

I'm not seeing anything here that makes me suspect pluginlib: http://wiki.ros.org/noetic/Migration. The melodic devel branch looks identical to the noetic devel branch (https://github.com/ros/pluginlib).

Does the rospack get plugins call detect it properly?

I'd be more than happy to release it once we work through these issues.

doisyg commented 4 years ago

Yes:

$ rospack plugins --attrib=plugin costmap_2d 
spatio_temporal_voxel_layer /ws/src/spatio_temporal_voxel_layer/costmap_plugins.xml
costmap_2d /opt/ros/noetic/share/costmap_2d/costmap_plugins.xml

What's strange is that when preloading jemalloc with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2, the error disappears, but then it causes move_base to crash. I will keep you posted if I have time to investigate. Crash log (with LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 before roslaunch):

[ WARN] [/move_base]: global_costmap: Pre-Hydro parameter "static_map" unused since "plugins" is provided
[ WARN] [/move_base]: global_costmap: Pre-Hydro parameter "map_type" unused since "plugins" is provided
[ INFO] [/move_base]: global_costmap: Using plugin "static_layer"
[ INFO] [/move_base]: Requesting the map...
[ INFO] [/move_base]: Resizing costmap to 200 X 200 at 0.050000 m/pix
[ INFO] [/move_base]: Received a 200 X 200 map at 0.050000 m/pix
[ INFO] [/move_base]: global_costmap: Using plugin "obstacle_layer"
[ INFO] [/move_base]:     Subscribed to Topics: laser_scan_sensor
[ INFO] [/move_base]: global_costmap: Using plugin "inflation_layer"
[ WARN] [/move_base]: local_costmap: Pre-Hydro parameter "static_map" unused since "plugins" is provided
[ WARN] [/move_base]: local_costmap: Pre-Hydro parameter "map_type" unused since "plugins" is provided
[ INFO] [/move_base]: local_costmap: Using plugin "static_layer"
[ INFO] [/move_base]: Requesting the map...
[ INFO] [/move_base]: Resizing static layer to 200 X 200 at 0.050000 m/pix
[ INFO] [/move_base]: Received a 200 X 200 map at 0.050000 m/pix
[ INFO] [/move_base]: local_costmap: Using plugin "obstacle_layer"
[ INFO] [/move_base]:     Subscribed to Topics: laser_scan_sensor
[ INFO] [/move_base]: local_costmap: Using plugin "rgbd_obstacle_layer"
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer being initialized as SpatioTemporalVoxelLayer!
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer's global frame is map.
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer loaded parameters from parameter server.
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer created underlying voxel grid.
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer initialization complete!
[move_base-3] process has died [pid 402449, exit code -11, cmd /opt/ros/noetic/lib/move_base/move_base __name:=move_base __log:=/home/gd/.ros/log/50b524f2-a0bf-11ea-9306-9d8c2f1ba3e5/move_base-3.log].
log file: /home/gd/.ros/log/50b524f2-a0bf-11ea-9306-9d8c2f1ba3e5/move_base-3*.log

SteveMacenski commented 4 years ago

Running gdb to find where the crash happens would be useful. It might even be on the move_base side if it crashes after the "complete" message.

doisyg commented 4 years ago

I assume you mean the second issue. It looks like it comes from the updateFootprint function:

[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer created underlying voxel grid.
[ INFO] [/move_base]: local_costmap/rgbd_obstacle_layer initialization complete!
[New Thread 0x7fffe65ce700 (LWP 69227)]
--Type <RET> for more, q to quit, c to continue without paging--

Thread 11 "move_base" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe65ce700 (LWP 69227)]
0x00007ffff1eaba97 in spatio_temporal_voxel_layer::SpatioTemporalVoxelLayer::updateFootprint(double, double, double, double*, double*, double*, double*) () from /ws/devel/lib//libspatio_temporal_voxel_layer.so

However, I don't know whether this second issue is linked to the first one and its workaround (preloading jemalloc).

doisyg commented 4 years ago

If started with the STVL param enabled: false or update_footprint_enabled: false (and still preloading jemalloc), it doesn't crash. Then, at runtime (changing the dynamic params):

If enabled: true and update_footprint_enabled: false, no crash.
Then, if enabled: true and update_footprint_enabled: true => crash.

SteveMacenski commented 4 years ago

What was the crash, if you had GDB up? What was the traceback? Maybe your footprint parameter wasn't read in correctly, so one of the double pointers was null?

SteveMacenski commented 4 years ago

It would be good to know where the error is in this function https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/melodic-devel/src/spatio_temporal_voxel_layer.cpp#L477-L496

(Since some of them are costmap calls, it may actually be an error in costmap_2d. In fact, all the functions called inside it are provided by costmap_2d.)

doisyg commented 4 years ago

I don't fully understand why, but because the updateFootprint function has no return value, the for loop was running well past _transformed_footprint.size() and overflowing: https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/67a6e4d93e71168a2da068c374c96bdc83c29e4a/src/spatio_temporal_voxel_layer.cpp#L477-L496

Fixed in https://github.com/SteveMacenski/spatio_temporal_voxel_layer/pull/168
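For readers wondering how a missing return can produce this kind of overrun: in C++, flowing off the end of a function declared with a non-void return type is undefined behavior, and an optimizing compiler may assume the end of the function is never reached, which can remove the loop's exit condition. The sketch below is a simplified stand-in, not the actual STVL code. It assumes the original function was declared with a non-void return type, and Point and expandBounds are hypothetical names. It shows the corrected shape, with an explicit return on every path.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a footprint vertex; not the costmap_2d type.
struct Point
{
  double x;
  double y;
};

// Grows the bounding box [min_x, max_x] x [min_y, max_y] so that it
// contains every vertex of the footprint.
// The function is declared bool, so every path must return a value;
// leaving out the final return is undefined behavior in C++.
bool expandBounds(const std::vector<Point>& footprint,
                  double* min_x, double* min_y,
                  double* max_x, double* max_y)
{
  if (!min_x || !min_y || !max_x || !max_y) {
    return false;  // guard against null output pointers
  }
  for (std::size_t i = 0; i < footprint.size(); ++i) {
    *min_x = std::min(*min_x, footprint[i].x);
    *min_y = std::min(*min_y, footprint[i].y);
    *max_x = std::max(*max_x, footprint[i].x);
    *max_y = std::max(*max_y, footprint[i].y);
  }
  return true;  // the statement whose absence makes the behavior undefined
}
```

GCC and Clang report a missing return with -Wreturn-type, which -Wall enables, so building with warnings on is a cheap way to catch this class of bug.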

SteveMacenski commented 4 years ago

Weird. Is noetic otherwise working?

doisyg commented 4 years ago

Yes, provided that this line is added to .bashrc:

export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2

I don't think it can be released until this issue is solved. It apparently has to do with the new jemalloc version in Focal: https://github.com/jemalloc/jemalloc/issues/1237

SteveMacenski commented 4 years ago

Is that a pluginlib issue? Is there a ticket filed for that so someone knows it's an issue?

doisyg commented 4 years ago

I found nothing else. How could you tell whether it is related to pluginlib or not?

SteveMacenski commented 4 years ago

You said you're having issues loading plugins. Unless you're saying that this issue is unique to STVL, I have to assume your issues are the result of pluginlib issues.

doisyg commented 4 years ago

Sorry if I was unclear. The only thing I know for sure is that when I start move_base with STVL, I get this crash:

[ INFO] [/move_base]: local_costmap: Using plugin "obstacle_layer"
[ INFO] [/move_base]:     Subscribed to Topics: laser_scan_sensor
[ INFO] [/move_base]: local_costmap: Using plugin "sonar_layer"
[ INFO] [/move_base]: local_costmap/sonar_layer: ALL as input_sensor_type given
[ INFO] [/move_base]: RangeSensorLayer: subscribed to topic /sonar
[ INFO] [/move_base]: local_costmap: Using plugin "rgbd_obstacle_layer"
terminate called after throwing an instance of 'pluginlib::LibraryLoadException'
  what():  Failed to load library /home/gd/elodie1_ws/devel/lib//libspatio_temporal_voxel_layer.so. Make sure that you are calling the PLUGINLIB_EXPORT_CLASS macro in the library code, and that names are consistent between this macro and your XML. Error string: Could not load library (Poco exception = /lib/x86_64-linux-gnu/libjemalloc.so.2: cannot allocate memory in static TLS block)

whereas without it, I don't have this issue (and other plugins like costmap_2d::StaticLayer, costmap_2d::ObstacleLayer, range_sensor_layer::RangeSensorLayer load fine). Now, whether it is an STVL or a pluginlib issue, I have no clue. I dug a bit and found that the issue disappears when using export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2 before launching move_base. That may help somebody familiar with the dynamic library and memory allocation system (which I am not) diagnose the issue.
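Since the pluginlib message points at the export macro and the XML first, it is easy to miss the loader error at the end of it. A minimal sketch of separating the two cases by looking for the "static TLS block" text in the message is shown below. isStaticTlsError and loadFailureHint are hypothetical helpers, not part of pluginlib or STVL, and the hint strings are illustrative only.

```cpp
#include <string>

// Returns true when an error message carries the "static TLS block" text
// seen at the end of the pluginlib::LibraryLoadException above.
bool isStaticTlsError(const std::string& message)
{
  static const std::string kSignature = "cannot allocate memory in static TLS block";
  return message.find(kSignature) != std::string::npos;
}

// Builds a short hint so the two failure modes are not confused.
std::string loadFailureHint(const std::string& message)
{
  if (isStaticTlsError(message)) {
    return "static TLS error: try preloading the library named in the message with LD_PRELOAD";
  }
  return "not a static TLS error: check the plugin XML and PLUGINLIB_EXPORT_CLASS names";
}
```

In a node, the message argument would typically be the what() string of the caught exception.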

SteveMacenski commented 4 years ago

What's the computer you're trying to run it on? Can you verify on another hardware machine it works or doesn't? I'm not sure what to do about that myself.

SteveMacenski commented 4 years ago

I'd also be curious if you ran into issues loading NPVL or if it's just STVL https://github.com/SteveMacenski/nonpersistent_voxel_layer

doisyg commented 4 years ago

What's the computer you're trying to run it on? Can you verify on another hardware machine it works or doesn't? I'm not sure what to do about that myself.

Same result on my 3-year-old ASUS laptop, a NUC8, and a NUC10.

doisyg commented 4 years ago

I'd also be curious if you ran into issues loading NPVL or if it's just STVL https://github.com/SteveMacenski/nonpersistent_voxel_layer

I'll try.

SteveMacenski commented 4 years ago

Ok, so probably not platform specific.

It may (?) be released, but rosdistro doesn't have a Focal entry for openVDB https://github.com/ros/rosdistro/blob/master/rosdep/base.yaml#L3005

doisyg commented 4 years ago

I'd also be curious if you ran into issues loading NPVL or if it's just STVL https://github.com/SteveMacenski/nonpersistent_voxel_layer

I'll try.

No issue with NPVL

SteveMacenski commented 4 years ago

Mhm, that is interesting then. I don't have a 20.04 machine yet to try to debug this. Hopefully in the next few weeks but for the moment, there's not much I can do.

doisyg commented 4 years ago

Ok, so probably not platform specific.

It may (?) be released, but rosdistro doesn't have a Focal entry for openVDB https://github.com/ros/rosdistro/blob/master/rosdep/base.yaml#L3005

https://github.com/ros/rosdistro/pull/25257

doisyg commented 4 years ago

Mhm, that is interesting then. I don't have a 20.04 machine yet to try to debug this. Hopefully in the next few weeks but for the moment, there's not much I can do.

No rush for a release. I'll test it on a real robot, hopefully in the next few days, and report back if I find any other issues.

SteveMacenski commented 4 years ago

Ok, that issue is really odd though. I'd suggest filing a ticket for it on the pluginlib repo, since something broke during the last distribution update. As far as I can tell, nothing should need to be changed: http://wiki.ros.org/noetic/Migration

sloretz commented 4 years ago

Ok, that issue is really odd though. I'd suggest filing a ticket for it on the pluginlib repo, since something broke during the last distribution update. As far as I can tell, nothing should need to be changed: http://wiki.ros.org/noetic/Migration

It doesn't seem like this issue is coming from pluginlib. The libjemalloc2 dependency comes from the Debian Package libopenvdb6.2

https://github.com/SteveMacenski/spatio_temporal_voxel_layer/blob/e077795c89a280ffffa83db61af540464e34a892/package.xml#L44-L45

https://packages.debian.org/sid/libopenvdb6.2 https://packages.ubuntu.com/focal/libopenvdb6.2

@doisyg pointed to an issue suggesting jemalloc is not being built with --disable-initial-exec-tls. It looks like there is a bug about this on the Debian bug tracker, so the Ubuntu Focal version probably has the same problem: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=951704

Commenting there, or opening another bug on launchpad might be good next steps: https://launchpad.net/ubuntu/+source/jemalloc/+bugs

SteveMacenski commented 4 years ago

Would this not be something to file with the openvdb maintainers on GitHub since they just need to change some flags and rebuild debians?

I'm not very encouraged by the likelihood of this being fixed given that ticket hasn't had motion since Feb.

SteveMacenski commented 4 years ago

I filed the ticket above on OpenVDB to make them aware. Is there any other action here I can take?

doisyg commented 4 years ago

Thanks for filing the ticket. I reported there that the issue disappears when installing openvdb7.0 instead of openvdb6.2 before recompiling. The problem now is that openvdb7.0 is not officially available on Focal.

SteveMacenski commented 4 years ago

OK, mhm, we'll see what they say.

SteveMacenski commented 4 years ago

@doisyg any changes? I'm in another cycle of releases and STVL is ready to go when this issue is resolved. I haven't heard anything from the openvdb side, but I didn't know whether they had updated the binaries or how things have been working for you since.

doisyg commented 4 years ago

Still working fine for me with the LD_PRELOAD workaround

SteveMacenski commented 4 years ago

https://github.com/AcademySoftwareFoundation/openvdb/issues/732#issuecomment-661294038

Can you verify that the binaries resolve this issue for your application?

doisyg commented 4 years ago

Sure, as soon as I notice an update to the Ubuntu packages, I will test and report. For now, it is still version 6.2.1-8ubuntu1 from 22 Feb 2020.

SteveMacenski commented 4 years ago

As well

SteveMacenski commented 4 years ago

Note for ROS2 users: the Nav2 tutorial includes this note about the export. Hopefully we get new binaries soon, but I have to say I'm really unclear on who releases the binaries for openvdb and how to get them updated.

dirk-thomas commented 4 years ago

The very same issue exists in other packages:

SteveMacenski commented 4 years ago

Fun. At least we're not alone on this front. It looks like openVDB fixed things in mainline, but I don't exactly know how to get new binaries released. Until then, Focal is basically just a dead operating system to me since I can't run openVDB which STVL and much of my R&D for nav2 environmental modeling is based on...

It looks like, from @doisyg's comments, that it's reflected in v7.0 (is that accurate)? If so, I could update the openvdb entry on rosdistro to install 7.x if those binaries are available in Focal. Beyond that, I don't know how to continue.

doisyg commented 4 years ago

It looks like, from @doisyg's comments, that it's reflected in v7.0 (is that accurate)? If so, I could update the openvdb entry on rosdistro to install 7.x if those binaries are available in Focal. Beyond that, I don't know how to continue.

Yes, I confirm that the problem disappears when manually installing libopenvdb7.0 from Groovy. But these binaries are not available on Focal, and my request to backport it has been refused, see here: https://answers.launchpad.net/ubuntu/+source/openvdb/+question/691239 "Backporting 7.0 release to focal is not option."

gavanderhoorn commented 4 years ago

@doisyg: there is actually a good rationale for why "a backport is not an option" (from comment nr 5 here):

The Backports Project is a means to provide new features to users. Because of the inherent stability risks in backporting packages, users do not get backported packages without some explicit action on their part. This generally makes backports an inappropriate avenue for fixing bugs

this makes sense to me.

doisyg commented 4 years ago

Fun. At least we're not alone on this front.

Interesting to see that the problem affects rqt_image_view too, even though it doesn't use openvdb. This ties it to the way plugins are loaded and to libjemalloc.

doisyg commented 4 years ago

@doisyg: there is actually a good rationale for why "a backport is not an option" (from comment nr 5 here):

The Backports Project is a means to provide new features to users. Because of the inherent stability risks in backporting packages, users do not get backported packages without some explicit action on their part. This generally makes backports an inappropriate avenue for fixing bugs

this makes sense to me.

Yes, I understand the rationale for not backporting, but I don't know how to trigger an SRU procedure that will use a fixed version of openvdb6.2 for focal (where this fixed version will be compiled from?)

gavanderhoorn commented 4 years ago

I don't know how to trigger an SRU procedure

According to comment nr 3 (I can't seem to directly link to these comments):

If you run: ubuntu-bug openvdb

This will start the process for you.

We've had a similar problem (getting an SRU in) with urdfdom_headers (this one: https://github.com/ros/urdfdom_headers/issues/45).

Perhaps @j-rivero and/or @kyrofa can provide some insight into the SRU procedure.

doisyg commented 4 years ago

Yes, I even reported a bug: https://bugs.launchpad.net/ubuntu/+source/jemalloc/+bug/1882998 It is confirmed (and reported fixed on Groovy with v7, but no backport is possible). The thing is, I am not sure which sources Ubuntu uses for the v6.2 binaries. I guess these: https://github.com/AcademySoftwareFoundation/openvdb/tree/v6.2 but the problem is likely still present in these sources.

SteveMacenski commented 4 years ago

I don't know how to trigger an SRU procedure that will use a fixed version of openvdb6.2 for focal (where this fixed version will be compiled from?)

Exactly. I don't care about 7.0 as much as I care that 6.x is fixed in the binaries. If it's just build flags, it's easy to resolve; it just needs someone to actually do it, and we need to know who that person is to ask. I emailed someone on the list and never got a response. I don't know if they'd object to an update to 6.3 or something if some small code changes were required. At the moment, I have to think those binaries are basically useless to most users.

j-rivero commented 4 years ago

Hello Doisy:

Yes, I understand the rationale for not backporting, but I don't know how to trigger an SRU procedure that will use a fixed version of openvdb6.2 for focal (where this fixed version will be compiled from?)

As for how to add a source code patch to a .deb package, this page can provide some hints: https://packaging.ubuntu.com/singlehtml/index.html#document-ubuntu-packaging-guide/patches-to-packages

For the whole SRU process, extensive documentation is available at: https://wiki.ubuntu.com/StableReleaseUpdates

The initial comment should look something like: https://bugs.launchpad.net/ubuntu/+source/urdfdom-headers/+bug/1817595/comments/6

AlessandroMelino commented 3 years ago

Hello.

I am trying to install the spatio-temporal-voxel-layer package from source but it is giving me problems.

Is there going to be a release for Noetic, with the possibility of installing it with apt-get?

If not, how do you recommend I install it from source?

Best regards. Alessandro

SteveMacenski commented 3 years ago

Still waiting on a fix to the binaries for openVDB that this depends on.

AlessandroMelino commented 3 years ago

Understood. Do you have an estimate of how long it would take?

SteveMacenski commented 3 years ago

https://github.com/SteveMacenski/spatio_temporal_voxel_layer#interesting-side-note A note was left about this for Foxy + newer and Noetic in the readme.

NOTE: If used on Ubuntu 20.04 (Foxy or Noetic), you must set your LD_PRELOAD path to include jemalloc due to a known compiler flag issue in the 20.04 binaries of OpenVDB (e.x. export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.2). If you see the error: Could not load library LoadLibrary error: /usr/lib/x86_64-linux-gnu/libjemalloc.so.2: cannot allocate memory in static TLS block, this is your issue.

You could set this in the terminal, in your bashrc, or even in a launch file call if you liked. I considered just adding a system call in C++ to call this each time on startup, but I didn't know what that would do for non-Linux based OS.
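On the idea of handling this from C++: the dynamic loader reads LD_PRELOAD when a process starts, so setting it from inside an already running node would not change what that node has loaded. One possible alternative is a startup check that warns when jemalloc is not preloaded. The sketch below assumes a Linux host; preloadListContains and jemallocPreloaded are hypothetical helpers, not part of STVL.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Returns true if any entry of an LD_PRELOAD-style list contains `needle`.
// Entries in LD_PRELOAD may be separated by colons or spaces.
bool preloadListContains(const std::string& preload, const std::string& needle)
{
  std::string normalized = preload;
  for (char& c : normalized) {
    if (c == ':') {
      c = ' ';  // treat both separators the same way
    }
  }
  std::istringstream entries(normalized);
  std::string entry;
  while (entries >> entry) {
    if (entry.find(needle) != std::string::npos) {
      return true;
    }
  }
  return false;
}

// Reads LD_PRELOAD from the environment of the current process and
// reports whether a jemalloc library is listed in it.
bool jemallocPreloaded()
{
  const char* preload = std::getenv("LD_PRELOAD");
  return preload != nullptr && preloadListContains(preload, "libjemalloc");
}
```

A node could call jemallocPreloaded() before the costmap plugins are loaded and log a warning that points to the export line above when it returns false.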

This is not something in my direct control to fix. This is an issue with openVDB, and this is a reasonable single-OS solution to it. I wish they would update the binaries, but it's out of our control. This is unlikely to be formally released for Noetic or Foxy on 20.04 because of this. But we have a workable workaround with no performance degradation.

cosmicog commented 3 years ago

You could set this in the terminal, in your bashrc, or even in a launch file call if you liked.

NOTE: setting it in .bashrc caused my Google Chrome to segfault. I think it is best to set it in the launch file. A little help for others:

 <launch>
    <env name="LD_PRELOAD" value="/usr/lib/x86_64-linux-gnu/libjemalloc.so.2"/>
    <node pkg="move_base" type="move_base" respawn="false" name="move_base" output="screen">
...