fzi-forschungszentrum-informatik / gpu-voxels

GPU-Voxels is a CUDA based library which allows high resolution volumetric collision detection between animated 3D models and live pointclouds from 3D sensors of all kinds.
Other
302 stars 85 forks source link

OMPL Planning Example - Segmentation Fault Error #86

Closed AlejoDiaz49 closed 5 years ago

AlejoDiaz49 commented 5 years ago

Hello,

I was trying to use the _gvl_omplplanning example, but I received a segmentation fault error. I am working without ROS and with PCL 1.8.1. The only changes I made in the code were adding the robot using the Denavit-Hartenberg option instead the URDF one and adding to the CMakeLists find_package(Eigen3 required) because I had some problems with the location of this library.

I tried to debug but I wasn't able to find the problem.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fafda93d5c4 in boost::math::lanczos::lanczos17m64::lanczos_sum<long double> (z=@0x7fff5ceea1d0: 1) at /usr/include/boost/math/special_functions/lanczos.hpp:897
897           };

The backtrace of the program is

#0  0x00007fafda93d5c4 in boost::math::lanczos::lanczos17m64::lanczos_sum<long double> (z=@0x7fff5ceea1d0: 1) at /usr/include/boost/math/special_functions/lanczos.hpp:897
#1  0x00007fafda8a9c3c in boost::math::lanczos::lanczos_initializer<boost::math::lanczos::lanczos17m64, long double>::init::init (this=<optimized out>)
    at /usr/include/boost/math/special_functions/lanczos.hpp:49
#2  __static_initialization_and_destruction_0 (__initialize_p=1, __priority=65535)
    at /usr/include/boost/math/special_functions/lanczos.hpp:64
#3  _GLOBAL__sub_I_hdl_grabber.cpp(void) ()
    at /home/alejandro/VisionRepos/pcl/io/src/hdl_grabber.cpp:664
#4  0x00007fafdeac86ba in call_init (l=<optimized out>, argc=argc@entry=1, 
    argv=argv@entry=0x7fff5ceea288, env=env@entry=0x7fff5ceea298) at dl-init.c:72
#5  0x00007fafdeac87cb in call_init (env=0x7fff5ceea298, argv=0x7fff5ceea288, 
    argc=1, l=<optimized out>) at dl-init.c:30
#6  _dl_init (main_map=0x7fafdecdf168, argc=1, argv=0x7fff5ceea288, 
    env=0x7fff5ceea298) at dl-init.c:120
#7  0x00007fafdeab8c6a in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#8  0x0000000000000001 in ?? ()
#9  0x00007fff5ceea7f7 in ?? ()
#10 0x0000000000000000 in ?? ()

I also look into the memory but there is now new information

==29139== Memcheck, a memory error detector
==29139== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==29139== Using Valgrind-3.11.0 and LibVEX; rerun with -h for copyright info
==29139== Command: ./OMPLTest
==29139== Parent PID: 5894
==29139==
==29139==
==29139== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==29139==  Bad permissions for mapped region at address 0x5BFE6C8
==29139==    at 0x8BB95C4: long double boost::math::lanczos::lanczos17m64::lanczos_sum<long double>(long double const&) (in /usr/local/lib/libpcl_io.so.1.8.1)
==29139==    by 0x8B25C3B: init (lanczos.hpp:49)
==29139==    by 0x8B25C3B: __static_initialization_and_destruction_0 (lanczos.hpp:64)
==29139==    by 0x8B25C3B: _GLOBAL__sub_I_hdl_grabber.cpp (hdl_grabber.cpp:664)
==29139==    by 0x40106B9: call_init.part.0 (dl-init.c:72)
==29139==    by 0x40107CA: call_init (dl-init.c:30)
==29139==    by 0x40107CA: _dl_init (dl-init.c:120)
==29139==    by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==29139==
==29139== HEAP SUMMARY:
==29139==     in use at exit: 509,117 bytes in 1,849 blocks
==29139==   total heap usage: 4,150 allocs, 2,301 frees, 1,283,032,509 bytes allocated
==29139==
==29139== 287,535 bytes in 1 blocks are possibly lost in loss record 977 of 977
==29139==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==29139==    by 0xAA20962: ??? (in /usr/lib/nvidia-390/libGL.so.390.87)
==29139==    by 0x1070BCD0: ??? (in /usr/lib/nvidia-390/libnvidia-glcore.so.390.87)
==29139==    by 0x103774E0: ??? (in /usr/lib/nvidia-390/libnvidia-glcore.so.390.87)
==29139==    by 0xA9F0350: ??? (in /usr/lib/nvidia-390/libGL.so.390.87)
==29139==    by 0xA9F7928: ??? (in /usr/lib/nvidia-390/libGL.so.390.87)
==29139==    by 0xA9BD5DF: ??? (in /usr/lib/nvidia-390/libGL.so.390.87)
==29139==    by 0x4010679: call_init.part.0 (dl-init.c:58)
==29139==    by 0x40107CA: call_init (dl-init.c:30)
==29139==    by 0x40107CA: _dl_init (dl-init.c:120)
==29139==    by 0x4000C69: ??? (in /lib/x86_64-linux-gnu/ld-2.23.so)
==29139==
==29139== LEAK SUMMARY:
==29139==    definitely lost: 0 bytes in 0 blocks
==29139==    indirectly lost: 0 bytes in 0 blocks
==29139==      possibly lost: 287,535 bytes in 1 blocks
==29139==    still reachable: 221,582 bytes in 1,848 blocks
==29139==         suppressed: 0 bytes in 0 blocks
==29139== Reachable blocks (those to which a pointer was found) are not shown.
==29139== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==29139==
==29139== For counts of detected and suppressed errors, rerun with: -v
==29139== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 1 from 1)

Then I also found that the error appears or is generated on this line.

Do you have any clue about the source and a way to solve this problem

Thanks in advanced

cjue commented 5 years ago

Hi,

did you try the additional steps outlined in the gvl_ompl_planner README?

https://github.com/fzi-forschungszentrum-informatik/gpu-voxels/blob/master/gvl_ompl_planning/README.txt#L50

The following command allows you to check if there is still a PCL1.7 library somewhere in the mix at runtime: ldd gvl_ompl_planner | grep -F "libpcl"

AlejoDiaz49 commented 5 years ago

Hello Christian,

Thanks for the reply and yes, I try it and everything looks fine.

libpcl_common.so.1.8 => /usr/local/lib/libpcl_common.so.1.8 (0x00007fbac31cb000)
libpcl_io.so.1.8 => /usr/local/lib/libpcl_io.so.1.8 (0x00007fbac2d4c000)
libpcl_io_ply.so.1.8 => /usr/local/lib/libpcl_io_ply.so.1.8 (0x00007fbabf3a7000)
Squelsh commented 5 years ago

Hi Alejo,

if I remember it right, the problem originates from mixing C++11 and older code. See these links: https://github.com/PointCloudLibrary/pcl/issues/619#issuecomment-186854511 https://github.com/PointCloudLibrary/pcl/issues/1870 https://github.com/felixendres/rgbdslam_v2/issues/8

From your debug output I assume, that you have a PCL that you compiled from source? Perhaps you did not enable C++11 support there?

Apart from that, I have no idea how to debug this problem ):

Cheers, Andreas

AlejoDiaz49 commented 5 years ago

Hello,

Thanks for the reply Andreas and yes, I compiled the PCL library in the C++11 standard, so to try to solve the problem I followed one of the links you sent

The comment says that the problem was apparently fixed in Boost 1.6. I tested a few versions of Boost and PCL until everything kind of works with GPU-Voxels. I am going to write here the errors I got with the different versions and at the end the one that works.

First, I tried with with Boost 1.67 but PCL 1.8.1 wasn't compiling. A solution to this error was found here so I changed to the master branch, and then it compiled without errors. During the build process of GPU-Voxels more errors appeared, some of them were from the test. In this version of Boost BOOST_GLOBAL_FIXTURE is deprecated and it seems that more things are deprecated to. So I disable the tests in the compilation, but some other errors appear

Due to this I decide to change Boost to the version 1.62. PCL in the master branch works perfect, but the same problem of Global Fixture and other errors appeared when I tried to build GPU-Voxels. I disable again the tests and these errors disappear, but one about PCL comes out.

CMake Error in /usr/local/share/pcl-1.8/PCLConfig.cmake: cmake_policy PUSH without matching POP
Call Stack (most recent call first): packages/gpu_voxels/CMakeLists.txt:37 (FIND_PACKAGE) 

I changed PCL from the master to the 1.8.1 release and then I was able to build GPU-Voxels perfectly but without the tests. Another problem came when I launched the OMPL example, but this is not from GPU-Voxels

__CUDACC_VER__ is no longer supported. Use __CUDACC_VER__MAJOR__, __CUDACC_VER_MINOR__,
 and __CUDACC_VER_BUILD__ instead

So I changed __CUDACC_VER__ for __CUDACC_VER_BUILD__ in the file nvcc.hpp in the directory /usr/local/include/boost/config/compiler/. When then I tried the OMPL example again it worked, but the problem came the second time I launched. The visualizer stopped working, basically it tries to open but crashed immediately leaving this error

<2018-10-18 19:31:32.400> Visualization(Info)::main: Starting the gpu_voxels Visualizer.
<2018-10-18 19:31:32.400> Visualization(Warning) XMLInterpreter::getUnitScale: No unit scale defined. Using 1 cm instead.
<2018-10-18 19:31:32.515> Visualization(Info) Visualizer::initGL: Using GPU Device 0: "GeForce GTX 1050" with compute capability 6.1
<2018-10-18 19:31:32.544> Visualization(Info) Visualizer::initGL: Using OpenGL Version: 4.6.0 NVIDIA 390.87
<2018-10-18 19:31:32.546> Visualization(Info)::main: Number of voxel maps that will be drawn: 9
<2018-10-18 19:31:32.546> Visualization(Error)::registerVoxelmapFromSharedMemory: Couldn't find mem_segment voxel_map_handler_dev_pointer_0
<2018-10-18 19:31:32.546> Visualization(Error)::registerVoxelmapFromSharedMemory: Couldn't find mem_segment voxel_map_handler_dev_pointer_1
<2018-10-18 19:31:32.546> Visualization(Error)::registerVoxelmapFromSharedMemory: Couldn't find mem_segment voxel_map_handler_dev_pointer_2
<2018-10-18 19:31:32.546> Visualization(Error)::registerVoxelmapFromSharedMemory: Couldn't find mem_segment voxel_map_handler_dev_pointer_3
<2018-10-18 19:31:32.546> Visualization(Error)::registerVoxelmapFromSharedMemory: Couldn't find mem_segment voxel_map_handler_dev_pointer_4
<2018-10-18 19:31:32.546> Visualization(Error)::registerVoxelmapFromSharedMemory: Couldn't find mem_segment voxel_map_handler_dev_pointer_5
<2018-10-18 19:31:32.546> Visualization(Info)::registerVoxelmapFromSharedMemory: Providing a PROBAB_VOXELMAP called "myEnvironmentMap" with side_length 0.020000 and dimension [150, 150, 100]
<2018-10-18 19:31:32.547> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /myEnvironmentMap is zero.
<2018-10-18 19:31:32.547> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /voxelmap_6 is zero.
<2018-10-18 19:31:32.548> Visualization(Warning) Visualizer::registerVoxelMap: No context found for voxel map myEnvironmentMap. Using the default context.
<2018-10-18 19:31:32.549> Visualization(Info)::registerVoxelmapFromSharedMemory: Providing a PROBAB_VOXELMAP called "myRobotMap" with side_length 0.020000 and dimension [150, 150, 100]
<2018-10-18 19:31:32.549> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /myRobotMap is zero.
<2018-10-18 19:31:32.550> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /voxelmap_7 is zero.
<2018-10-18 19:31:32.550> Visualization(Warning) Visualizer::registerVoxelMap: No context found for voxel map myRobotMap. Using the default context.
<2018-10-18 19:31:32.551> Visualization(Info)::registerVoxelmapFromSharedMemory: Providing a PROBAB_VOXELMAP called "myQueryMap" with side_length 0.020000 and dimension [150, 150, 100]
<2018-10-18 19:31:32.551> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /myQueryMap is zero.
<2018-10-18 19:31:32.552> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /voxelmap_8 is zero.
<2018-10-18 19:31:32.552> Visualization(Warning) Visualizer::registerVoxelMap: No context found for voxel map myQueryMap. Using the default context.
<2018-10-18 19:31:32.553> Visualization(Info)::main: Number of voxel lists that will be drawn: 3
<2018-10-18 19:31:32.553> Visualization(Info)::registerVoxellistFromSharedMemory: Providing a voxellist called "mySolutionMap"
<2018-10-18 19:31:32.553> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /mySolutionMap is zero.
<2018-10-18 19:31:32.554> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /voxellist_0 is zero.
<2018-10-18 19:31:32.554> Visualization(Warning) Visualizer::registerVoxelList: No context found for voxel list mySolutionMap. Using the default context.
<2018-10-18 19:31:32.555> Visualization(Info)::registerVoxellistFromSharedMemory: Providing a voxellist called "mySolutionMap"
<2018-10-18 19:31:32.555> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /mySolutionMap is zero.
<2018-10-18 19:31:32.555> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /voxellist_1 is zero.
<2018-10-18 19:31:32.556> Visualization(Warning) Visualizer::registerVoxelList: No context found for voxel list mySolutionMap. Using the default context.
<2018-10-18 19:31:32.556> Visualization(Info)::registerVoxellistFromSharedMemory: Providing a voxellist called "mySolutionMap"
<2018-10-18 19:31:32.557> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /mySolutionMap is zero.
<2018-10-18 19:31:32.557> Visualization(Warning) XMLInterpreter::getDataContext: Occupancy_threshold of /voxellist_2 is zero.
<2018-10-18 19:31:32.558> Visualization(Warning) Visualizer::registerVoxelList: No context found for voxel list mySolutionMap. Using the default context.
<2018-10-18 19:31:32.558> Visualization(Info)::main: Number of octrees that will be drawn: 0
<2018-10-18 19:31:32.558> Visualization(Info)::main: Number of Primitive Arrays that will be drawn: 0
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  parallel_for failed: unknown error
Abortado (`core' generado)

I thought it was a process that was still open or something like it, so I restarted the computer and tried again. It worked the first time and the second time the visualizer crashed, but I was not able to found any process that could cause this. I decided to try to solve the problems of the tests and check if these were the reasons of the problem. The first one was

/gpu-voxels/packages/icl_core/src/ts/icl_core_config/ts_main.cpp:45:1: error: 
expected initializer at end of input  BOOST_GLOBAL_FIXTURE(GlobalFixture)

I checked the boost tutorial for the version 1.62 and noticed that at the end of BOOST_GLOBAL_FIXTURE(GlobalFixture) the ; is now required. Adding it to this two lines (icl_core_config and icl_core_thread) was enough.

The next two errors were:

gpu-voxels/packages/icl_core/src/ts/icl_core_thread/ts_PeriodicThread.cpp:186:71: error:
invalid operands of types ‘const char [15]’ and ‘int64_t {aka long int}’ to binary ‘operator<<’
BOOST_MESSAGE("max deviation=" << test_thread.maxDeviation().toNSec() << "ns" <<`
gpu-voxels/packages/icl_core/src/ts/icl_core_thread/ts_PeriodicThread.cpp:188:84: error:
‘BOOST_MESSAGE’ was not declared in this scope ", mean deviation=" <<
test_thread.meanDeviation().toNSec() << "ns");`

The problem is that BOOST_MESSAGE is deprecated and know we have to use BOOST_TEST_MESSAGE in this line

These changes let me build everything but two warnings appeared. For the moment I just ignoring them

/gpu-voxels/packages/gpu_voxels/src/gpu_voxels/test/testing_fixtures.hpp(43): warning:
overloaded virtual function "boost::unit_test::test_tree_visitor::visit" is only partially 
overridden in class "Visitor"
gpu-voxels/packages/gpu_voxels/src/gpu_voxels/test/testing_cudaMath.cu(140): warning: 
controlling expression is constant

I launched _test_gpu_voxelscore and there were no errors, but the problem I still have is that the visualizer keeps crashing the second time i run it with the OMPL example. Do you have any idea about it?

Thanks in advanced and sorry for the length of this comment Alejandro

Squelsh commented 5 years ago

Hi Alejandro,

I am sorry that you have a hard time to get this running. Most of the changes you mentioned are also mentioned here: https://github.com/Squelsh/gpu-voxels/commit/45f44baa2fd8674961ab300e9e490c7e0550d3fc I encountered the problems when compiling under 18.04.

The problem with the Visualizer is most probably the buggy cleanup of the Shared-Memory file descriptors. Please check, if you can fix it, by deleting /dev/shm/*SharedMemory files. This has to be done, when no main program and no visualizer is running! Afterwards start the main program again and then the visualizer.

Please tell me, if this helped. Cheers, Andreas

cjue commented 5 years ago

Hi,

thanks for your reports. I will look into improving the visualizer fault tolerance - one simple fix would be to have the visualizer delete the corrupted file in case of a crash.

I am not sure what sequence of events causes that corruption though.

Regarding 18.04: I will look into better compatibility there too.

AlejoDiaz49 commented 5 years ago

Hello Andreas and Christian,

I didn't see that commit, but it makes sense that you had the same problems since with Ubuntu 18.04 the boost version that is installed with sudo apt-get install libboost-all-dev is the 1.65.

You are right, by deleting those files the visualizer keeps working after the first launch. If I find a good solution i'll put it here. In the meantime, more or less as Christian said I made a simple Shell Script that runs the main program, the visualizer and deletes those files

Thanks Alejandro

AlejoDiaz49 commented 5 years ago

Hello,

I'm having other issues. The main one is that the collision detector is not working properly. Even if in the visualizer the collision is obvious the detector is not always getting them. The program is the same as the one in the gut, and as I said in the first comment the only important thing that I think I change is that the robot is introduced with the Denavit-Hartenberg parameters. Is there anything that is not well configured? Because in other programs I have works perfectly.

And also I found two scenarios in which the program crash. The first one is that after launching the program several times, the main program crash showing the message:

VoxellistLog(Error)::make_unique: GetLastError: 0 VoxellistLog(Error) TemplateVoxelList::make_unique: Caught Thrust exception while sorting: radix_sort: failed to get memory buffer: out of memory

After restarting the PC this message doesn't appear.

The other error is with the visualizer that when I select a voxel from the QueueMap, gives a segmentation error.

Since is quite difficult to show everything just writing. Here you have a video with two scenarios in which you can see both error and also how the collision detector is not detecting all the collisions. At the end the solution collides with the environment.

Thanks in advanced and sorry for all the questions.

Squelsh commented 5 years ago

Your video link does not work. If stuff is not confidential, feel free to upload your code and I can have a look or build it on my machine. If it is more or less the example code, it should not have additional dependencies?

About the other problems: No clue, without seeing the code...

AlejoDiaz49 commented 5 years ago

Hello,

Sorry, I already fix the link of the video. The code is very similar to yours so there is no problem. In the CMakeFile I only added Eigen3 and a required version of OMPL because it had problems founding the packages.

Here you have all the files PathPlanner.zip

Thanks

AlejoDiaz49 commented 5 years ago

Any idea about why is not detecting all the collisions?

cjue commented 5 years ago

Hi Alejandro,

right away I have to admit that we always use URDF robots for our current projects, so there might well be some unknown bugs relating to the DH robots. There might also be yet unknown 18.04-related issue in there.

I will look into your example to try to reproduce the mentioned errors.

cjue commented 5 years ago

Hi Alejandro,

I hope this reaches you in time:

The reason for the undetected collisions is in the implicit coll_threshold parameter of the collideWith function. The default value for this is 1.0f, which would register probabilistic voxels with maximum occupancy probability as occupied. This is a bad assumption in this situation, where the obstacles have a occupancy of about 80%.

With a threshold of 0.7f or lower the code works as expected:

diff --git a/gvl_ompl_planning/gvl_ompl_planner_helper.cpp b/gvl_ompl_planning/gvl_ompl_planner_helper.cpp
index 9204017..7ddee34 100644
--- a/gvl_ompl_planning/gvl_ompl_planner_helper.cpp
+++ b/gvl_ompl_planning/gvl_ompl_planner_helper.cpp
@@ -193,7 +193,7 @@ bool GvlOmplPlannerHelper::isValid(const ompl::base::State *state) const
     PERF_MON_SILENT_MEASURE_AND_RESET_INFO_P("insert", "Pose Insertion", "pose_check");

     PERF_MON_START("coll_test");
-    size_t num_colls_pc = gvl->getMap("myRobotMap")->as<voxelmap::ProbVoxelMap>()->collideWith(gvl->getMap("myEnvironmentMap")->as<voxelmap::ProbVoxelMap>());
+    size_t num_colls_pc = gvl->getMap("myRobotMap")->as<voxelmap::ProbVoxelMap>()->collideWith(gvl->getMap("myEnvironmentMap")->as<voxelmap::ProbVoxelMap>(), 0.7f);
     PERF_MON_SILENT_MEASURE_AND_RESET_INFO_P("coll_test", "Pose Collsion", "pose_check");

     //std::cout << "Validity check on state ["  << values[0] << ", " << values[1] << ", " << values[2] << ", " << values[3] << ", " << values[4] << ", " << values[5] << "] resulting in " <<  num_colls_pc << " colls." << std::endl;
@@ -324,7 +324,7 @@ bool GvlOmplPlannerHelper::checkMotion(const ompl::base::State *s1, const ompl::

         //gvl->visualizeMap("myRobotMap");
         PERF_MON_START("coll_test");
-        size_t num_colls_pc = gvl->getMap("myRobotMap")->as<voxelmap::ProbVoxelMap>()->collideWith(gvl->getMap("myEnvironmentMap")->as<voxelmap::ProbVoxelMap>());
+        size_t num_colls_pc = gvl->getMap("myRobotMap")->as<voxelmap::ProbVoxelMap>()->collideWith(gvl->getMap("myEnvironmentMap")->as<voxelmap::ProbVoxelMap>(), 0.7f);
         //std::cout << "CheckMotion1 for " << nd << " segments. Resulting in " << num_colls_pc << " colls." << std::endl;
         PERF_MON_SILENT_MEASURE_AND_RESET_INFO_P("coll_test", "Pose Collsion", "motion_check");
cjue commented 5 years ago

The fix for gvl_ompl_planner is in v1.3.0.