conda-forge / nvcc-feedstock

A conda-smithy repository for nvcc.
BSD 3-Clause "New" or "Revised" License

Search `/usr/include` for CUDA 10.1 headers #26

Closed: jakirkham closed this 4 years ago

jakirkham commented 5 years ago

As CUDA 10.1 moves some headers and libraries from /usr/local/cuda to /usr, we need to adjust our search strategy for finding them. Since cudatoolkit already contains the libraries and is available during the build, this is less of an issue. However, we still need to point the compiler to the headers in /usr/include, so we add /usr/include to our compiler flags. To avoid clobbering other search paths we care about, we make sure to add /usr/include last. As we are not searching /usr/lib64, this shouldn't result in any accidental linkage against system libraries.

ref: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
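
Concretely, the change amounts to appending /usr/include after the existing include paths when the compiler wrappers are activated. A minimal sketch of the idea (the flag placement and activation-script setting are illustrative, not the actual recipe contents):

# Illustrative activation-script snippet, not the recipe itself.
# Append /usr/include last so it cannot shadow the sysroot headers
# or the headers shipped with cudatoolkit.
export CFLAGS="${CFLAGS} -I/usr/include"
export CXXFLAGS="${CXXFLAGS} -I/usr/include"
# /usr/lib64 is deliberately not added, so system libraries cannot be
# picked up at link time by accident.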


conda-forge-linter commented 5 years ago

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

jakirkham commented 5 years ago

@conda-forge-admin, please re-render.

conda-forge-linter commented 5 years ago

Hi! This is the friendly automated conda-forge-webservice.

I tried to re-render for you, but it looks like there was nothing to do.

jakirkham commented 5 years ago

For context, here's one case (cuBLAS) where libraries and headers are moved out of ${CUDA_HOME}.

With this release, on Linux systems, the cuBLAS libraries listed below are now installed in the /usr/lib/<arch>-linux-gnu/ or /usr/lib64/ directories as shared and static libraries. Their interfaces are available in the /usr/include directory

ref: https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html#cublas-new-features
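
On a system with a CUDA 10.1 package install, the layout described above can be checked with something like the following (paths taken from the release-note description; the exact library directory depends on the distribution):

# Headers now live under /usr/include rather than ${CUDA_HOME}/include
ls /usr/include/cublas*.h
# Libraries land in the distribution's library directory
ls /usr/lib/x86_64-linux-gnu/libcublas* /usr/lib64/libcublas* 2>/dev/null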

pearu commented 5 years ago

The idea of this PR sounds (really) bad, as it (i) contradicts the principle of keeping the system and conda environments separate, and (ii) will certainly cause issues for those who wish to switch between different CUDA versions using this nvcc recipe.

Perhaps this issue should be raised upstream, because with the 10.1 policy one cannot have two CUDA 10.1+ installations on one system anymore (when the CUDA libraries are installed to /usr/{lib,include}). For instance, a target location such as /usr/lib/x86_64-linux-gnu/libcublas.so would coincide for the two installations.

On the other hand, the only way to install two or more CUDA 10.1+ toolkits on one system without conflicts is to use the CUDA toolkit installer's --installpath option:

$ bash cuda_10.1.243_418.87.00_linux.run --help
...
  --librarypath=<path>
    Install libraries to the <path> directory. If this flag is not provided,
    the default path of your distribution is used. This flag only applies to
    libraries installed outside of the CUDA Toolkit path.

  --installpath=<path>
    Install everything to the <path> directory. This flag sets the same values
    as the toolkitpath, samplespath, and librarypath options.
...

There are two kinds of nvcc recipe users:

  1. those who always use the latest CUDA toolkit: for them, this PR would resolve the cuBLAS issue (equivalently, (ii) would not be a problem), but issue (i) would remain.
  2. those who install different versions of the CUDA toolkit via --installpath: for them, this PR would not be needed, but issue (i) still remains.

In conclusion, I see three options:

  1. reject this PR as it is and recommend --installpath to all nvcc recipe users.
  2. adjust the PR so that using /usr is not enabled by default.
  3. adjust the PR so that using /usr can be disabled (see the sketch after this list for what such a switch might look like).
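
For options 2 and 3, the /usr fallback could be gated behind an environment variable in the activation script. A hypothetical sketch (CONDA_NVCC_SYSTEM_INCLUDE is an invented name, not part of the recipe):

# Hypothetical opt-in guard for the /usr/include fallback.
if [ "${CONDA_NVCC_SYSTEM_INCLUDE:-0}" = "1" ]; then
    export CFLAGS="${CFLAGS} -I/usr/include"
    export CXXFLAGS="${CXXFLAGS} -I/usr/include"
fi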

jakirkham commented 5 years ago

Yep, I hear what you are saying and have similar concerns. FWIW this issue is also being raised at NVIDIA.

That said, this does not change the fact that we have this issue today and need to address it somehow. So the question now becomes: what do we do to handle it?

On the other hand, the only way to install two or more CUDA 10.1+ toolkits on one system without conflicts is to use the CUDA toolkit installer's --installpath option

Have you tried this out? If so, are you seeing cuBLAS, etc. showing up in the intended install location, or are they still winding up in /usr?

pearu commented 5 years ago

Yes, on my box I have installed versions 9.2, 10.0, and 10.1 to /usr/local/cuda-X.Y.Z/ using --installpath, and the cuBLAS libraries and include files are under the installpath directory. Here are the CUDA toolkit installation commands I used for 10.1, for instance:

sudo bash cuda_10.1.243_418.87.00_linux.run \
  --toolkit --toolkitpath=/usr/local/cuda-10.1.243/ \
  --installpath=/usr/local/cuda-10.1.243/ --override \
  --no-opengl-libs --no-man-page --no-drm --silent

Here's the result:

$ ls /usr/local/cuda-10.1.243/lib64/libcublas*
/usr/local/cuda-10.1.243/lib64/libcublas.so
/usr/local/cuda-10.1.243/lib64/libcublas.so.10
/usr/local/cuda-10.1.243/lib64/libcublas.so.10.2.1.243
/usr/local/cuda-10.1.243/lib64/libcublasLt.so
/usr/local/cuda-10.1.243/lib64/libcublasLt.so.10
/usr/local/cuda-10.1.243/lib64/libcublasLt.so.10.2.1.243
/usr/local/cuda-10.1.243/lib64/libcublasLt_static.a
/usr/local/cuda-10.1.243/lib64/libcublas_static.a
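
Switching between such side-by-side installs is then a matter of repointing the toolchain at the desired root, for example (assuming the build honors CUDA_HOME; illustrative only):

# Select one of the side-by-side toolkits for a build (illustrative).
export CUDA_HOME=/usr/local/cuda-10.1.243
export PATH="${CUDA_HOME}/bin:${PATH}"
export LD_LIBRARY_PATH="${CUDA_HOME}/lib64:${LD_LIBRARY_PATH}"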

jakirkham commented 4 years ago

This is now handled in the Docker image ( https://github.com/conda-forge/docker-images/pull/134 ). Thanks @bdice! 😄