ACDSLab / MPPI-Generic

Templated C++/CUDA implementation of Model Predictive Path Integral Control (MPPI)
https://acdslab.github.io/mppi-generic-website/
BSD 2-Clause "Simplified" License

Error during build #2

Closed Jaeyoung-Lim closed 1 month ago

Jaeyoung-Lim commented 1 month ago

Description: I tried following the installation instructions (https://acdslab.github.io/mppi-generic-website/setup), but building the examples failed with the following error:

~/dev/MPPI-Generic/build$ make
[ 10%] Built target cnpy
[ 20%] Built target example1
[ 30%] Built target cnpy-static
Scanning dependencies of target cartpole_mppi
[ 35%] Building CUDA object src/controllers/cartpole/CMakeFiles/cartpole_mppi.dir/cartpole_mppi.cu.o
nvcc fatal   : Unknown option '-display-error-number'
make[2]: *** [src/controllers/cartpole/CMakeFiles/cartpole_mppi.dir/build.make:63: src/controllers/cartpole/CMakeFiles/cartpole_mppi.dir/cartpole_mppi.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:352: src/controllers/cartpole/CMakeFiles/cartpole_mppi.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

System information

Is there anything I am overlooking when trying to run an example?

bogidude commented 1 month ago

How did you get the nvcc/CUDA versions you reported? The nvcc version should match your CUDA version.

You can check the nvcc version with nvcc --version.

The CUDA version is detected the first time you run cmake in a build folder and is printed in its configure output.

My guess is that you are using CUDA 10.1, since that is what nvcc is reporting, and the -display-error-number flag might not have existed back then. I will investigate and try to fix our compatibility with older CUDA versions. This compiler flag does exist in CUDA 11+, so if you installed CUDA 12.2, CMake somehow found an nvcc executable from CUDA 10 first.
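One way to confirm from inside a source file which nvcc actually compiled it is to look at the version macros that nvcc predefines. The sketch below is only a diagnostic aid, not part of MPPI-Generic: the constants kNvccMajor and kNvccMinor and the helper nvccAtLeast are hypothetical names, and the fallback value 0 for a non-nvcc compiler is an assumption made for illustration.

```cpp
// nvcc predefines __CUDACC_VER_MAJOR__ and __CUDACC_VER_MINOR__;
// a plain host compiler leaves them undefined.
#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
constexpr int kNvccMajor = __CUDACC_VER_MAJOR__;
constexpr int kNvccMinor = __CUDACC_VER_MINOR__;
#else
constexpr int kNvccMajor = 0;  // assumption: 0 means "not compiled by nvcc"
constexpr int kNvccMinor = 0;
#endif

// Hypothetical helper: true when the compiling nvcc is at least
// want_major.want_minor.
constexpr bool nvccAtLeast(int want_major, int want_minor) {
  return kNvccMajor > want_major ||
         (kNvccMajor == want_major && kNvccMinor >= want_minor);
}
```

With the toolchain from this issue, nvccAtLeast(11, 0) would evaluate to false, which matches nvcc reporting release 10.1.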

Jaeyoung-Lim commented 1 month ago

@bogidude Thanks for the response, I think you are right. nvcc version:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243

log output from cmake:

$ cmake -DCMAKE_INSTALL_PREFIX=~/.local ..
-- The C compiler identification is GNU 9.4.0
-- The CXX compiler identification is GNU 9.4.0
-- The CUDA compiler identification is NVIDIA 10.1.243
-- Check for working C compiler: /usr/bin/cc
-- Check for working C compiler: /usr/bin/cc -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working CXX compiler: /usr/bin/c++
-- Check for working CXX compiler: /usr/bin/c++ -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working CUDA compiler: /usr/bin/nvcc
-- Check for working CUDA compiler: /usr/bin/nvcc -- works
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Setting Build Type to Release by default
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE  
-- Found CUDA: /usr (found version "10.1") 
-- Autodetected CUDA architecture(s):  6.1
-- Additional Architectures: -gencode=arch=compute_61,code=sm_61
-- Found ZLIB: /usr/lib/x86_64-linux-gnu/libz.so (found version "1.2.11") 
-- Configuring done
-- Generating done
-- Build files have been written to: /home/jaeyoung/dev/MPPI-Generic/build
$ nvidia-smi
Tue Sep 17 17:39:14 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Quadro P2200                   Off | 00000000:01:00.0 Off |                  N/A |
| 44%   30C    P8               4W /  75W |     11MiB /  5120MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      1452      G   /usr/lib/xorg/Xorg                            4MiB |
|    0   N/A  N/A      2178      G   /usr/lib/xorg/Xorg                            4MiB |
+---------------------------------------------------------------------------------------+

I think the documentation at https://acdslab.github.io/mppi-generic-website/setup still says that the library should be compatible with CUDA 10. I can build the main library, but not the example.

I guess in my specific case, I also have CUDA 12.2, but CMake just defaults to CUDA 10?

bogidude commented 1 month ago

We intend to support CUDA 10, so this is a bug on our end, and I am currently writing the fix for it. Thanks for the catch. I should have the fix up in the next 30 minutes.

As for the output from nvidia-smi, the CUDA version reported there is the highest CUDA version that the installed driver supports, not the version of the CUDA toolkit installed on your local system. It is something that has tripped me up in the past and is a very annoying aspect of the nvidia-smi output.
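The same distinction is visible through the CUDA runtime API: cudaDriverGetVersion reports the latest CUDA version the driver supports, while cudaRuntimeGetVersion reports the version of the CUDA runtime in use. Both return the version packed into one integer as 1000 * major + 10 * minor. The sketch below only decodes that integer so that it compiles without a CUDA toolkit; CudaVersion and decodeCudaVersion are hypothetical names, not part of CUDA or MPPI-Generic.

```cpp
// Hypothetical holder for a decoded CUDA version.
struct CudaVersion {
  int major_version;
  int minor_version;
};

// CUDA packs versions as 1000 * major + 10 * minor,
// so 10010 is 10.1 and 12020 is 12.2.
constexpr CudaVersion decodeCudaVersion(int packed) {
  return CudaVersion{packed / 1000, (packed % 1000) / 10};
}

// In a CUDA build, the packed values would come from the runtime API:
//   int runtime_version = 0, driver_version = 0;
//   cudaRuntimeGetVersion(&runtime_version);  // CUDA runtime in use
//   cudaDriverGetVersion(&driver_version);    // latest version the driver supports
```

On the system in this thread, the two decoded versions would be expected to differ: roughly 10.1 for the runtime and 12.2 for the driver.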

bogidude commented 1 month ago

@Jaeyoung-Lim I uploaded a branch bug/compilation_flags_on_older_cuda. Can you please check that it works for you? I currently don't have CUDA 10 on any of my systems to verify myself.

Jaeyoung-Lim commented 1 month ago

@bogidude Thanks for the quick fix! I think that solved the problem with the non-existent flag, but the build now fails with a compilation error:

$ make
[ 10%] Built target cnpy
[ 20%] Built target example1
[ 30%] Built target cnpy-static
Scanning dependencies of target cartpole_mppi
[ 35%] Building CUDA object src/controllers/cartpole/CMakeFiles/cartpole_mppi.dir/cartpole_mppi.cu.o
/home/jaeyoung/dev/MPPI-Generic/include/mppi/dynamics/cartpole/cartpole_dynamics.cu: In member function ‘void CartpoleDynamics::computeDynamics(const Eigen::Ref<const Eigen::Matrix<float, 4, 1, 0, 4, 1>, 0, Eigen::InnerStride<1> >&, const Eigen::Ref<const Eigen::Matrix<float, 1, 1, 0, 1, 1>, 0, Eigen::InnerStride<1> >&, Eigen::Ref<Eigen::Matrix<float, 4, 1, 0, 4, 1>, 0, Eigen::InnerStride<1> >)’:
/home/jaeyoung/dev/MPPI-Generic/include/mppi/dynamics/cartpole/cartpole_dynamics.cu:62:94: error: ‘VEL_X’ is not a member of ‘CartpoleDynamicsParams’
   state_der(0) = state(S_INDEX(VEL_X));
                                                                                              ^    
/home/jaeyoung/dev/MPPI-Generic/include/mppi/dynamics/cartpole/cartpole_dynamics.cu: In member function ‘virtual MPPI_internal::Dynamics<CartpoleDynamics, CartpoleDynamicsParams>::state_array CartpoleDynamics::stateFromMap(const std::map<std::__cxx11::basic_string<char>, float, std::less<std::__cxx11::basic_string<char> >, std::allocator<std::pair<const std::__cxx11::basic_string<char>, float> > >&)’:
/home/jaeyoung/dev/MPPI-Generic/include/mppi/dynamics/cartpole/cartpole_dynamics.cu:113:75: error: ‘POS_X’ is not a member of ‘CartpoleDynamicsParams’
   s(S_INDEX(POS_X)) = map.at("POS_X");
                                                                           ^    
/home/jaeyoung/dev/MPPI-Generic/include/mppi/dynamics/cartpole/cartpole_dynamics.cu:114:75: error: ‘VEL_X’ is not a member of ‘CartpoleDynamicsParams’
   s(S_INDEX(VEL_X)) = map.at("VEL_X");
                                                                           ^    
/home/jaeyoung/dev/MPPI-Generic/include/mppi/dynamics/cartpole/cartpole_dynamics.cu:115:75: error: ‘THETA’ is not a member of ‘CartpoleDynamicsParams’
   s(S_INDEX(THETA)) = map.at("THETA");
                                                                           ^    
/home/jaeyoung/dev/MPPI-Generic/include/mppi/dynamics/cartpole/cartpole_dynamics.cu:116:75: error: ‘THETA_DOT’ is not a member of ‘CartpoleDynamicsParams’
   s(S_INDEX(THETA_DOT)) = map.at("THETA_DOT");
                                                                           ^        
make[2]: *** [src/controllers/cartpole/CMakeFiles/cartpole_mppi.dir/build.make:63: src/controllers/cartpole/CMakeFiles/cartpole_mppi.dir/cartpole_mppi.cu.o] Error 1
make[1]: *** [CMakeFiles/Makefile2:352: src/controllers/cartpole/CMakeFiles/cartpole_mppi.dir/all] Error 2
make: *** [Makefile:130: all] Error 2

bogidude commented 1 month ago

That is an interesting compilation issue that we don't see on newer CUDA versions. I will probably need a bit more time to debug it. I will try to set up CUDA 10 on my system again so I can debug it more directly and fix any further issues with CUDA 10. I will let you know when I have a fix for you.

Jaeyoung-Lim commented 1 month ago

@bogidude Thanks!

bogidude commented 1 month ago

@Jaeyoung-Lim I pushed a new commit to that same branch that should fix all of the CUDA 10 compilation issues. Can you try it on your system and verify you can compile and run the examples?

Jaeyoung-Lim commented 1 month ago

@bogidude Thanks for the quick fix!

That fixed the problem. Could you maybe explain what you had to adjust to make it compatible with CUDA 10.1?

bogidude commented 1 month ago

We have a params structure specific to each dynamics class. For example, the cartpole one is called CartpoleDynamicsParams. The params structures define enums for the states and controls so that you can refer to the VEL_X state by name instead of having to remember the order of values. To facilitate using these enums, we created various macros like S_INDEX and C_INDEX. So the expected use of these enums would be state[S_INDEX(VEL_X)] versus state[0] inside the dynamics.

The way we had defined these macros, they expanded to decltype(params)::StateIndex::VEL_X, which would not compile in CUDA 10 for some reason. In our base Dynamics class, we have a type alias for the params, DYN_PARAMS_T, which we could use in the macro instead. Because of C++ template inheritance intricacies, you can't use DYN_PARAMS_T directly in a class that inherits from the Dynamics template. Instead, you need to specify the parent dynamics class and then use PARENT_CLASS::DYN_PARAMS_T.

I changed the macros to use this formulation. This means that any future dynamics classes will need to define PARENT_CLASS to be able to use the macros, but we have been doing this for nearly all of our dynamics already.
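A minimal sketch of the macro formulation described above, under stated assumptions: the names Dynamics, DYN_PARAMS_T, PARENT_CLASS, S_INDEX, StateIndex and CartpoleDynamicsParams come from this thread, but their definitions here are simplified stand-ins, the enum ordering is assumed, and CartpoleSketch, velIndex and velocity are hypothetical. The actual MPPI-Generic code may differ.

```cpp
// Hypothetical stand-in for the cartpole params structure; the enum order is assumed.
struct CartpoleDynamicsParams {
  enum class StateIndex : int { POS_X = 0, VEL_X, THETA, THETA_DOT, NUM_STATES };
};

// Simplified stand-in for the templated base class and its params alias.
template <class CLASS_T, class PARAMS_T>
class Dynamics {
public:
  using DYN_PARAMS_T = PARAMS_T;
};

// The macro goes through the parent's alias instead of decltype(params).
#define S_INDEX(ENUM_VAL) \
  static_cast<int>(PARENT_CLASS::DYN_PARAMS_T::StateIndex::ENUM_VAL)

// Hypothetical templated dynamics class. Its base depends on PARAMS_T, so an
// unqualified DYN_PARAMS_T is not found here; PARENT_CLASS:: makes it visible.
template <class PARAMS_T>
class CartpoleSketch : public Dynamics<CartpoleSketch<PARAMS_T>, PARAMS_T> {
public:
  using PARENT_CLASS = Dynamics<CartpoleSketch<PARAMS_T>, PARAMS_T>;

  // Index of the cart velocity in the state vector, resolved at compile time.
  static constexpr int velIndex() { return S_INDEX(VEL_X); }

  // Read the cart velocity from a raw state array by name.
  static float velocity(const float* state) { return state[velIndex()]; }
};
```

Because the macro expands to a name qualified by PARENT_CLASS, a class that uses S_INDEX without first defining PARENT_CLASS fails to compile, which is the requirement on future dynamics classes mentioned above.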

bogidude commented 1 month ago

Fixed in #3, which has been merged into main. Closing for now, though feel free to ask if you have any more questions.