DavidDiazGuerra / gpuRIR

Python library for Room Impulse Response (RIR) simulation with GPU acceleration
GNU Affero General Public License v3.0

Undefined symbol: cufftExecC2R #66

Open adku1173 opened 1 week ago

adku1173 commented 1 week ago

With the updated binding, package installation works but importing the gpuRIR package raises the following error across various Python versions (3.8-3.12) on my Ubuntu 20.04 system.

ImportError: /home/kujawski/.conda/envs/dev2/lib/python3.8/site-packages/gpuRIR_bind.cpython-38-x86_64-linux-gnu.so: undefined symbol: cufftExecC2R
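
An undefined symbol at import time often means the extension module was built without a dynamic dependency on the library that provides it, here cuFFT. One way to check this is to look at what `ldd` reports for the `.so` file. The sketch below is only an illustration: the helper names `linked_library_names`, `links_against` and `run_ldd` are hypothetical, and it assumes a Linux system with `ldd` available.

```python
import os
import subprocess


def linked_library_names(ldd_output):
    """Return the library names listed in the text printed by `ldd`."""
    names = []
    for line in ldd_output.splitlines():
        fields = line.split()
        if fields:
            # The first field is the library name (or a full path for the loader).
            names.append(os.path.basename(fields[0]))
    return names


def links_against(ldd_output, prefix="libcufft.so"):
    """Return True if any listed library name starts with `prefix`."""
    return any(name.startswith(prefix) for name in linked_library_names(ldd_output))


def run_ldd(shared_object_path):
    """Run `ldd` (Linux only) on a shared object and return its text output."""
    result = subprocess.run(
        ["ldd", shared_object_path], capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `links_against(run_ldd(path))` with the path from the ImportError would return False if no cuFFT library shows up among the dependencies, which would point to a build-time linking problem rather than a Python one.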

What I tried (without success):

To me, the issue must be related to the updated binding (#65), since I had successfully run the examples using Python 3.8, 3.9 and 3.10 before the update.

Have you considered using a different package for the bindings, such as CuPy?

The issue is related to #50, #54, #42

DavidDiazGuerra commented 1 week ago

The undefined cufftExecC2R problem has been around for quite a long time, and I have never been able to understand where it comes from because I have never been able to reproduce it. I even asked in the NVIDIA support forums, without any success: https://forums.developer.nvidia.com/t/undefined-symbol-cufftexecc2r-after-installing-cmake-python-library/284151/2

I installed the library with the new pybind11 version in a clean environment before merging #65 and I was able to run examples/example.py without errors. Considering this and that people have also reported this problem with the previous version, I'm not sure if the problem is really related to the update.

Unfortunately, I don't have time now to try to use a different package for the bindings.

adku1173 commented 1 week ago

Today I managed to find the cause for this problem. I have to admit that the problem is related to cmake (and my cuda installation) and not the pybind11 package.

I did some modifications of the CMakeLists.txt file:

project(gpuRIR LANGUAGES CXX CUDA)
find_package(CUDAToolkit REQUIRED)

# status messages for debugging
message(STATUS "Found CUDA: ${CUDAToolkit_VERSION}")
message(STATUS "CUDAToolkit_BIN_DIR: ${CUDAToolkit_BIN_DIR}")
message(STATUS "CUDAToolkit_NVCC_EXECUTABLE: ${CUDAToolkit_NVCC_EXECUTABLE}")
message(STATUS "CMAKE_CUDA_COMPILER: ${CMAKE_CUDA_COMPILER}")
...
...
target_link_libraries(gpuRIR_bind PRIVATE CUDA::curand CUDA::cufft)
target_link_libraries(gpuRIR_bind PRIVATE gpuRIRcu pybind11::module)

  1. I replaced the deprecated find_package(CUDA) with find_package(CUDAToolkit REQUIRED).

  2. Then I added some status messages to check which CUDA / nvcc CMake finds. It turns out that, on my system, the result is:

      -- Found CUDA: 10.1.243
      -- CUDAToolkit_BIN_DIR: /usr/bin
      -- CUDAToolkit_NVCC_EXECUTABLE: /usr/bin/nvcc
      -- CMAKE_CUDA_COMPILER: /usr/bin/nvcc

    This is very strange: it finds an old CUDA installation (v10.1.243), yet /usr/bin contains neither nvcc nor the CUDA toolkit directory (and no symlink to them). This is probably why the cufftExecC2R function of the cuFFT library is undefined. I don't really understand where CMake searches for the CUDA compiler, but it is obviously the wrong place.

  3. I set the CUDACXX environment variable to the correct nvcc location and installed again with export CUDACXX=/usr/local/cuda/bin/nvcc && pip install . (make sure to delete the old build artifacts first: rm -r gpuRIR.egg-info/ && rm -r build/). This time the output is:

      -- Found CUDA: 12.3.52
      -- CUDAToolkit_BIN_DIR: /usr/local/cuda/bin
      -- CUDAToolkit_NVCC_EXECUTABLE: /usr/local/cuda/bin/nvcc
      -- CMAKE_CUDA_COMPILER: /usr/local/cuda/bin/nvcc

    This time CMake picked up the correct toolkit, and no problem appeared when running the example files.
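
Before rebuilding, it can help to see which nvcc a fresh configure is likely to pick up. The sketch below is a rough approximation, not CMake's actual search: it assumes that CUDACXX takes precedence on the first configuration and that the first nvcc on PATH is used otherwise, and it ignores the CMake cache and any other search locations. The helper names `pick_nvcc` and `parse_nvcc_release` are hypothetical.

```python
import os
import re
import shutil


def pick_nvcc(environ=None):
    """Approximate which nvcc would be used: CUDACXX first, then PATH.

    This is a simplification of CMake's behaviour, not a reimplementation.
    """
    environ = os.environ if environ is None else environ
    cudacxx = environ.get("CUDACXX")
    if cudacxx:
        return cudacxx
    # shutil.which returns None when nothing named nvcc is found on the path.
    return shutil.which("nvcc", path=environ.get("PATH", ""))


def parse_nvcc_release(version_text):
    """Extract the full version (e.g. 12.3.52) from `nvcc --version` output."""
    match = re.search(r"\bV(\d+\.\d+\.\d+)", version_text)
    return match.group(1) if match else None
```

Comparing the version parsed from the output of `nvcc --version` (run on the path returned by `pick_nvcc()`) with the "Found CUDA" status line is a quick way to spot a mismatch like the 10.1.243 vs. 12.3.52 one above.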

After this, I decided to fully uninstall the CUDA toolkit from my system and to install it into a fresh conda environment with: conda install nvidia/label/cuda-12.6.1::cuda-toolkit. This is what finally solved it for me. This time it was not necessary to set CUDACXX. The console output was:

      -- Found CUDA: 12.6.68
      -- CUDAToolkit_BIN_DIR: /home/kujawski/.conda/envs/dev2/targets/x86_64-linux/bin
      -- CUDAToolkit_NVCC_EXECUTABLE: /home/kujawski/.conda/envs/dev2/targets/x86_64-linux/bin/nvcc
      -- CMAKE_CUDA_COMPILER: /home/kujawski/.conda/envs/dev2/bin/nvcc

@DavidDiazGuerra If you like, I can make a PR for the updated CMakeLists.txt file. In addition, I believe it would be a good idea to add a Known Issues section to the README file.

DavidDiazGuerra commented 1 week ago

It's really good that you found this out. I have no idea why CMake would find an old CUDA installation without the CUDA Toolkit, but the CUDA integration with CMake has been causing problems since the very beginning of the library (I guess that my having zero previous experience with CMake didn't help either). Feel free to send a PR updating the CMakeLists and adding some notes to the README; I will be happy to accept it. Thanks a lot.