nvcc wrapper script conflicts with CMake's FindCUDAToolkit

mbargull commented 3 years ago

ref: https://github.com/conda-forge/faiss-split-feedstock/pull/17/commits/19724a315b2524ab796ca961625869c37d765f60

+     # Acc. to https://cmake.org/cmake/help/v3.19/module/FindCUDAToolkit.html#search-behavior
+     # CUDA toolkit is searched relative to `nvcc` first before considering
+     # "-DCUDAToolkit_ROOT=${CUDA_HOME}". We have multiple workarounds:
+     #   - Add symlinks from ${CUDA_HOME} to ${BUILD_PREFIX}
+     #   - Add ${CUDA_HOME}/bin to ${PATH}
+     #   - Remove `nvcc` wrapper in ${BUILD_PREFIX} so that `nvcc` from ${CUDA_HOME} gets found.
+     # TODO: Fix this in nvcc-feedstock or cmake-feedstock.
+     # NOTE: It's okay for us to not use the wrapper since CMake adds -ccbin itself.
+     rm "${BUILD_PREFIX}/bin/nvcc"
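
The third workaround from the comment above can be sketched in a scratch directory as follows. This is a minimal illustration, not the feedstock's build script: the directory layout and the fake `nvcc` files are stand-ins created only for the demonstration, and the existence check around `rm` is an added assumption.

```shell
set -eu

# Scratch stand-ins; in a real build these come from the environment.
work="$(mktemp -d)"
CUDA_HOME="${work}/cuda"
BUILD_PREFIX="${work}/build_env"
mkdir -p "${CUDA_HOME}/bin" "${BUILD_PREFIX}/bin"

# Fake "real" nvcc and a fake wrapper that forwards to it.
printf '#!/bin/sh\necho real\n' > "${CUDA_HOME}/bin/nvcc"
printf '#!/bin/sh\nexec "%s/bin/nvcc" "$@"\n' "${CUDA_HOME}" > "${BUILD_PREFIX}/bin/nvcc"
chmod +x "${CUDA_HOME}/bin/nvcc" "${BUILD_PREFIX}/bin/nvcc"
PATH="${BUILD_PREFIX}/bin:${PATH}"

# Workaround: drop the wrapper only if the real nvcc exists, then put the
# real one on PATH so that lookups find it.
if [ -x "${CUDA_HOME}/bin/nvcc" ]; then
    rm -f "${BUILD_PREFIX}/bin/nvcc"
    PATH="${CUDA_HOME}/bin:${PATH}"
fi
hash -r 2>/dev/null || true   # forget any cached lookup of nvcc

found="$(command -v nvcc)"
echo "nvcc now resolves to: ${found}"
```

With the wrapper gone, a search relative to the `nvcc` found on `PATH` starts from `${CUDA_HOME}`, which is the directory that actually holds the toolkit.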

This may be more of an issue for https://github.com/conda-forge/cmake-feedstock, but I'm opening it here since it's very specific to nvcc. One approach would be to patch CMake (in cmake-feedstock or, ideally, upstream) to consider CUDAToolkit_ROOT before nvcc-relative paths. However, I'll leave those decisions to people with more experience with CMake and CUDA (I use neither personally).

cc @h-vetinari

kkraus14 commented 3 years ago

We've run into similar problems with using a ccache symlink for nvcc that moves it outside of the normal installation location. cc @trxcllnt

@robertmaynard any ideas on the best path forward here? This seems likely to keep coming up. For context, it's the same underlying issue: nvcc is effectively symlinked to a location outside the CUDA installation, and FindCUDAToolkit searches relative to the found nvcc before looking in the passed CUDAToolkit_ROOT.

robertmaynard commented 3 years ago

For these use cases, the CUDA language is being enabled, correct?

I think this is caused by a fast path that exists in FindCUDAToolkit when CUDA is enabled and nvcc is the compiler. It computes a guessed root directory and stores it in CMAKE_CUDA_COMPILER_TOOLKIT_ROOT, but never verifies that directory. I think we can safely extend FindCUDAToolkit to verify it by checking for the sentinel file (version.txt), which would correct this behavior.
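
As a rough illustration of the verification idea (a shell sketch only, not CMake's implementation), a guessed root can be accepted only if it contains the sentinel file, with the explicitly passed root as the next candidate. The helper name `pick_toolkit_root`, the scratch directories, and the placeholder file contents are all hypothetical.

```shell
set -eu

# Hypothetical helper: print the first candidate directory that contains
# the sentinel file version.txt; return non-zero if none does.
pick_toolkit_root() {
    for candidate in "$@"; do
        if [ -f "${candidate}/version.txt" ]; then
            printf '%s\n' "${candidate}"
            return 0
        fi
    done
    return 1
}

# Scratch stand-ins for the two locations discussed in this thread.
work="$(mktemp -d)"
guessed_root="${work}/build_env"   # derived from the nvcc found on PATH
toolkit_root="${work}/cuda"        # what was passed as CUDAToolkit_ROOT
mkdir -p "${guessed_root}/bin" "${toolkit_root}/bin"
echo "placeholder" > "${toolkit_root}/version.txt"

# The guessed root has no sentinel file, so the passed root is chosen.
chosen="$(pick_toolkit_root "${guessed_root}" "${toolkit_root}")"
echo "chosen root: ${chosen}"
```

The guessed root is still tried first, so the fast path is preserved whenever the guess is right.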

Am I remembering correctly that we can't peek at where the nvcc symlink points? At times /usr/local/cuda/bin/nvcc is a symlink as well, and that would break.

mbargull commented 3 years ago

> For these use cases, the CUDA language is being enabled, correct?

In the one I encountered, yes: https://github.com/facebookresearch/faiss/blob/v1.6.4/CMakeLists.txt#L26

> Am I remembering correctly that we can't peek at where the nvcc symlink points? At times /usr/local/cuda/bin/nvcc is a symlink as well, and that would break.

Symlink chains would be one problematic case. Another, the one we encountered here, is wrapper scripts, i.e. we have an nvcc on PATH that is actually just a script executing the actual nvcc from its real location.
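
The wrapper-script case can be reproduced in a scratch directory. The sketch below is only an illustration: the layout, the fake `nvcc` files, and the "two directories up from the binary" guess are assumptions that stand in for the real setup and for CMake's actual search logic.

```shell
set -eu

# Hypothetical layout in a scratch directory; all paths are stand-ins.
work="$(mktemp -d)"
CUDA_HOME="${work}/cuda"            # stand-in for the real toolkit
BUILD_PREFIX="${work}/build_env"    # stand-in for the conda build prefix
mkdir -p "${CUDA_HOME}/bin" "${BUILD_PREFIX}/bin"

# Fake "real" nvcc that only prints its arguments.
cat > "${CUDA_HOME}/bin/nvcc" <<'EOF'
#!/bin/sh
echo "real nvcc called with: $*"
EOF

# Wrapper script: a regular file (not a symlink) that forwards to the real nvcc.
cat > "${BUILD_PREFIX}/bin/nvcc" <<EOF
#!/bin/sh
exec "${CUDA_HOME}/bin/nvcc" "\$@"
EOF
chmod +x "${CUDA_HOME}/bin/nvcc" "${BUILD_PREFIX}/bin/nvcc"

PATH="${BUILD_PREFIX}/bin:${PATH}"

# Guess the toolkit root as the directory two levels above the nvcc on PATH.
nvcc_path="$(command -v nvcc)"
guessed_root="$(dirname "$(dirname "${nvcc_path}")")"
echo "nvcc on PATH: ${nvcc_path}"
echo "guessed root: ${guessed_root}"
echo "actual root:  ${CUDA_HOME}"
```

The guess lands on `${BUILD_PREFIX}` rather than `${CUDA_HOME}`. Because the wrapper is a regular file, following symlinks would not recover the real location either.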

robertmaynard commented 3 years ago

I don't know if I will have time to tackle this issue before the new year, but I am not ignoring it :)

kkraus14 commented 3 years ago

> I don't know if I will have time to tackle this issue before the new year, but I am not ignoring it :)

Just wanted to give you some visibility as I know we had previously chatted about it. No worries! Thanks for all of your work on CMake 😄

robertmaynard commented 3 years ago

nvcc wrapper script support was merged into CMake, and should be part of 3.20.

jakirkham commented 3 years ago

CMake version 3.20 has been packaged and is in use on conda-forge. Closing this out...

If anything else crops up, please feel free to raise a new issue.

conda-forge / nvcc-feedstock

nvcc wrapper script conflicts with CMake's FindCUDAToolkit #56