NVIDIA-AI-IOT / ros2_tao_pointpillars

ROS2 node for 3D object detection using TAO-PointPillars.
Apache License 2.0

Documentation missing on how to set CUDA_VERSION for building or install CUDA #2

Open · RFRIEDM-Trimble opened this issue 1 year ago

RFRIEDM-Trimble commented 1 year ago

Since CMake 3.8, the recommended approach appears to be enabling CUDA as a first-class language rather than using the FindCUDA module [ref]. When I try to build the project as is, I get the following error:

CMake Error at /root/cmake-3.23.1/install/share/cmake-3.23/Modules/FindCUDA.cmake:859 (message):
  Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
  CMakeLists.txt:46 (find_package)

Yes, this project's minimum CMake version is 3.5; however, it could be raised to 3.8, or even 3.12.2, and still support all Tier 1 platforms in Foxy [ref].

As a ROS user who is not familiar with CUDA, I would expect all dependencies to be installable with rosdep. From a brief look, rosdep keys for CUDA do appear to exist.

Thus, I propose the following:

  1. Add the rosdep key nvidia-cuda to this repository's package.xml.
  2. Add README instructions for installing dependencies with rosdep.
  3. Modify the project() command in CMakeLists.txt to enable CUDA as a language.
  4. Remove the hard-coded CUDA version, set(CUDA_VERSION 11.3), and let CMake determine the version instead.
  5. Make any other CMakeLists changes needed so that CUDA is found as a dependency and linked correctly.
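Items 3 to 5 could look roughly like the CMakeLists.txt fragment below. This is a sketch, not the repository's actual build file: the target names pp_kernels and pp_infer_node and the source paths are hypothetical, and the ament_cmake, rclcpp and TensorRT parts of a real ROS 2 package are left out. It assumes CMake 3.17 or newer, because the FindCUDAToolkit module (with its CUDA::cudart imported target) and CMAKE_CUDA_RUNTIME_LIBRARY were added in 3.17. Ubuntu 20.04 packages CMake 3.16, so this would raise the minimum above the versions mentioned earlier.

```cmake
cmake_minimum_required(VERSION 3.17)  # FindCUDAToolkit and CMAKE_CUDA_RUNTIME_LIBRARY need 3.17

# Item 3: enable CUDA as a language instead of calling find_package(CUDA).
project(pp_infer LANGUAGES CXX CUDA)

# Item 4: no set(CUDA_VERSION 11.3); report the toolkit CMake actually found.
find_package(CUDAToolkit REQUIRED)
message(STATUS "Using CUDA Toolkit ${CUDAToolkit_VERSION}")

# Use the shared CUDA runtime for .cu targets so they match CUDA::cudart below.
set(CMAKE_CUDA_RUNTIME_LIBRARY Shared)

# Item 5: hypothetical targets; the real source lists come from the repository.
add_library(pp_kernels STATIC src/kernels.cu)
add_executable(pp_infer_node src/pp_infer_node.cpp)

# CUDA::cudart gives the C++ sources the cuda_runtime.h include path and libcudart.
target_link_libraries(pp_infer_node PRIVATE pp_kernels CUDA::cudart)
```

Linking through the imported target, rather than hand-written include and library paths, lets CMake resolve wherever the toolkit is installed, which is the point of item 5.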

RFRIEDM-Trimble commented 1 year ago

I tried some of this and ran into a few issues caused by toolchain incompatibilities on Ubuntu 20.04.

  1. Installing nvidia-cuda-toolkit from the default apt sources provides nvcc 10.1, while Ubuntu's default gcc is 9.4.0. Both the nvcc compatibility charts and the compilation errors indicate that nvcc 10.1 supports gcc 8 at most.
  2. The apt nvidia-cuda-toolkit package does appear to declare a dependency on gcc <= 8, but that compiler is not used because it is not the default.
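Because this mismatch surfaces inside project() as a generic "not able to compile a simple test program" error, a project could probe nvcc first and stop with a message that names the likely cause. A minimal sketch using CMake's CheckLanguage module; the message text is illustrative, and the rest of the build is assumed unchanged.

```cmake
cmake_minimum_required(VERSION 3.10)
project(pp_infer LANGUAGES CXX)

# check_language() tries to enable CUDA in a separate test project and sets
# CMAKE_CUDA_COMPILER to NOTFOUND if that fails, instead of aborting here.
include(CheckLanguage)
check_language(CUDA)

if(CMAKE_CUDA_COMPILER)
  enable_language(CUDA)
else()
  message(FATAL_ERROR
    "No working CUDA compiler found. With Ubuntu 20.04's apt nvcc 10.1, "
    "the host compiler must be gcc/g++ 8 or older (the default is 9.4); "
    "installing g++-8 may be required.")
endif()
```

Since check_language() skips the probe when CMAKE_CUDA_COMPILER is already defined, a cached NOTFOUND result will likely persist, so a fresh build directory would be needed after fixing the toolchain.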

One workaround is to select the compiler explicitly at build time, but that alone still does not fix the error:

# colcon build --packages-up-to pp_infer --cmake-args -DCMAKE_CXX_COMPILER=gcc-8
Starting >>> pp_infer
--- stderr: pp_infer                         
CMake Error at /root/cmake-3.23.1/install/share/cmake-3.23/Modules/CMakeDetermineCUDACompiler.cmake:633 (message):
  Failed to detect a default CUDA architecture.

After upgrading to CMake 3.24 to get a better error message, I get the following output:

Without specifying gcc/g++:

root@rfriedm-us-dl01:~/tas-ros2-system# colcon build --packages-up-to pp_infer
Starting >>> pp_infer
--- stderr: pp_infer                         
CMake Error at /usr/share/cmake-3.16/Modules/CMakeTestCUDACompiler.cmake:46 (message):
  The CUDA compiler

    "/usr/bin/nvcc"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /root/tas-ros2-system/build/pp_infer/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/make cmTC_b9953/fast && /usr/bin/make -f CMakeFiles/cmTC_b9953.dir/build.make CMakeFiles/cmTC_b9953.dir/build
    make[1]: Entering directory '/root/tas-ros2-system/build/pp_infer/CMakeFiles/CMakeTmp'
    Building CUDA object CMakeFiles/cmTC_b9953.dir/main.cu.o
    /usr/bin/nvcc     -x cu -c /root/tas-ros2-system/build/pp_infer/CMakeFiles/CMakeTmp/main.cu -o CMakeFiles/cmTC_b9953.dir/main.cu.o
    ERROR: No supported gcc/g++ host compiler found, but clang-6.0 is available.
           Use 'nvcc -ccbin clang-6.0' to use that instead.
    make[1]: *** [CMakeFiles/cmTC_b9953.dir/build.make:66: CMakeFiles/cmTC_b9953.dir/main.cu.o] Error 1
    make[1]: Leaving directory '/root/tas-ros2-system/build/pp_infer/CMakeFiles/CMakeTmp'
    make: *** [Makefile:121: cmTC_b9953/fast] Error 2

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:17 (project)

---
Failed   <<< pp_infer [0.85s, exited with code 1]

Setting the gcc version with a CMake argument:

CMake Error at /usr/share/cmake-3.16/Modules/CMakeTestCXXCompiler.cmake:53 (message):
  The C++ compiler

    "/usr/bin/gcc-8"

  is not able to compile a simple test program.

  It fails with the following output:

    Change Dir: /root/tas-ros2-system/build/pp_infer/CMakeFiles/CMakeTmp

    Run Build Command(s):/usr/bin/make cmTC_5ea2e/fast && /usr/bin/make -f CMakeFiles/cmTC_5ea2e.dir/build.make CMakeFiles/cmTC_5ea2e.dir/build
    make[1]: Entering directory '/root/tas-ros2-system/build/pp_infer/CMakeFiles/CMakeTmp'
    Building CXX object CMakeFiles/cmTC_5ea2e.dir/testCXXCompiler.cxx.o
    /usr/bin/gcc-8     -o CMakeFiles/cmTC_5ea2e.dir/testCXXCompiler.cxx.o -c /root/tas-ros2-system/build/pp_infer/CMakeFiles/CMakeTmp/testCXXCompiler.cxx
    gcc-8: error trying to exec 'cc1plus': execvp: No such file or directory
    make[1]: *** [CMakeFiles/cmTC_5ea2e.dir/build.make:66: CMakeFiles/cmTC_5ea2e.dir/testCXXCompiler.cxx.o] Error 1
    make[1]: Leaving directory '/root/tas-ros2-system/build/pp_infer/CMakeFiles/CMakeTmp'
    make: *** [Makefile:121: cmTC_5ea2e/fast] Error 2

  CMake will not be able to correctly generate this project.
Call Stack (most recent call first):
  CMakeLists.txt:17 (project)

---
Failed   <<< pp_infer [0.52s, exited with code 1]

So, even though gcc-8 was installed, g++-8 is also needed: the gcc-8 driver cannot compile C++ without cc1plus, which is provided by the g++-8 package.

apt-get install g++-8

After that, compiling a test program with nvcc succeeds:

$ cat simple.cu 
#include <cuda_runtime.h>
int main() { return 0; } 
$ nvcc -ccbin=/usr/bin/g++-8 simple.cu
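The same check can be reproduced through CMake, which exercises the CUDA compiler detection that project() runs inside pp_infer without going through colcon. A minimal sketch, assuming a CMakeLists.txt placed next to simple.cu; the project name cuda_smoke_test is arbitrary.

```cmake
cmake_minimum_required(VERSION 3.10)

# Enabling CUDA here runs the same compiler identification and test program
# that failed inside pp_infer.
project(cuda_smoke_test LANGUAGES CUDA)

message(STATUS "nvcc: ${CMAKE_CUDA_COMPILER} (${CMAKE_CUDA_COMPILER_VERSION})")
# Empty unless passed on the command line or via the CUDAHOSTCXX variable.
message(STATUS "nvcc host compiler: ${CMAKE_CUDA_HOST_COMPILER}")

add_executable(simple simple.cu)
```

It can be configured with, for example, cmake -S . -B build -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-8 followed by cmake --build build. If this succeeds but pp_infer still fails, the difference lies in the package's CMakeLists rather than in the toolchain.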

With g++-8 installed, the build finally got past the CUDA detection step:

colcon build --packages-up-to pp_infer --cmake-args -DCMAKE_CXX_COMPILER=g++-8
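An alternative to switching the whole C++ build to g++-8 is to hand that compiler to nvcc only, e.g. colcon build --packages-up-to pp_infer --cmake-args -DCMAKE_CUDA_HOST_COMPILER=/usr/bin/g++-8, which leaves the rest of the package on gcc 9. If the package wanted to default to this on Ubuntu 20.04, a guarded fallback before project() might look like the sketch below. The hard-coded /usr/bin/g++-8 path is an assumption about the apt layout, and a user-supplied value always takes precedence.

```cmake
cmake_minimum_required(VERSION 3.10)

# Hypothetical Ubuntu 20.04 fallback: apt's nvcc 10.1 needs gcc/g++ <= 8.
# This must come before project() so compiler identification already uses it.
if(NOT DEFINED CMAKE_CUDA_HOST_COMPILER AND EXISTS /usr/bin/g++-8)
  set(CMAKE_CUDA_HOST_COMPILER /usr/bin/g++-8 CACHE FILEPATH
      "Host compiler that nvcc uses via -ccbin")
endif()

project(pp_infer LANGUAGES CXX CUDA)
```

The guard also triggers on machines with a newer toolkit that happen to have g++-8 installed, which is one reason to prefer the command-line option over baking this into the package.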

Any recommendations on how to set this up would be greatly appreciated, as implementing my proposal was not as smooth as I had hoped.