Clarification regarding compute capability compatibility

pauldmccarthy commented 1 year ago

Comment:

Howdy,

(Apologies if, as I suspect, this question is better suited over at conda/conda and/or mamba-org/mamba)

As it stands, it appears to be possible to install a version of cuDNN which is compatible with the installed CUDA version, but which is incompatible with the compute capability of the available hardware. For example, if I run the following on a system with a Tesla K80 (compute capability 3.7):

mamba create -c conda-forge -p ./test.env cudnn

I end up with cuDNN=8.8.0 (latest available on conda-forge at the time of writing), which requires hardware which supports, at minimum, compute capability 5.0:

```
mamba create -c conda-forge -p ./test.env cudnn

        mamba (1.4.1) supported by @QuantStack

        GitHub:  https://github.com/mamba-org/mamba
        Twitter: https://twitter.com/QuantStack

Looking for: ['cudatoolkit', 'cudnn']

conda-forge/linux-64                                        Using cache
conda-forge/noarch                                          Using cache
Transaction

  Prefix: /home/paulmc/test.env

  Updating specs:

   - cudatoolkit
   - cudnn

  Package            Version  Build        Channel                  Size
────────────────────────────────────────────────────────────────────────────
  Install:
────────────────────────────────────────────────────────────────────────────

  + _libgcc_mutex        0.1  conda_forge  conda-forge/linux-64   Cached
  + _openmp_mutex        4.5  2_gnu        conda-forge/linux-64   Cached
  + cuda-version        11.8  h70ddcb2_2   conda-forge/noarch       21kB
  + cudatoolkit       11.8.0  h37601d7_11  conda-forge/linux-64    667MB
  + cudnn          8.8.0.121  h0800d71_1   conda-forge/linux-64    479MB
  + libgcc-ng         12.2.0  h65d4601_19  conda-forge/linux-64   Cached
  + libgomp           12.2.0  h65d4601_19  conda-forge/linux-64   Cached
  + libstdcxx-ng      12.2.0  h46fd767_19  conda-forge/linux-64   Cached
  + libzlib           1.2.13  h166bdaf_4   conda-forge/linux-64   Cached

  Summary:

  Install: 9 packages

  Total download: 1GB

────────────────────────────────────────────────────────────────────────────

Confirm changes: [Y/n]
```

Subsequently, when I try to run some code utilising cuDNN with this environment, I encounter CUDNN_STATUS_ARCH_MISMATCH errors.

So my question is: is it the responsibility of the user to choose a suitable version of cuDNN which is compatible with their hardware?

Thanks!

*As an aside, my installed GPU driver supports CUDA 11.4, whereas conda/mamba both install cuda-version / cudatoolkit 11.8. I initially thought that this might be due to a previously reported bug, but then remembered that NVIDIA have started guaranteeing limited forward-compatibility within major CUDA releases from 11 onwards, so this behaviour appears to be valid.
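The mismatch described above comes down to comparing the GPU's compute capability with what a given cuDNN build requires. A minimal sketch of reading the compute capability straight from the NVIDIA driver library via `ctypes` is shown here. The function name `compute_capability` and the list of library names tried are illustrative assumptions; the attribute IDs are the values of `CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR` and `CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR` in the CUDA driver API.

```python
import ctypes

# Attribute IDs from the CUDA driver API's CUdevice_attribute enum.
CC_MAJOR = 75  # CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR
CC_MINOR = 76  # CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR


def compute_capability(ordinal=0):
    """Return (major, minor) for the GPU at `ordinal`, or None if unavailable."""
    # Assumed library names: Linux first, then Windows.
    for name in ("libcuda.so.1", "libcuda.so", "nvcuda.dll"):
        try:
            cuda = ctypes.CDLL(name)
            break
        except OSError:
            continue
    else:
        return None  # no NVIDIA driver library could be loaded

    # Every driver API call returns 0 (CUDA_SUCCESS) on success.
    if cuda.cuInit(0) != 0:
        return None
    device = ctypes.c_int()
    if cuda.cuDeviceGet(ctypes.byref(device), ordinal) != 0:
        return None
    major, minor = ctypes.c_int(), ctypes.c_int()
    if cuda.cuDeviceGetAttribute(ctypes.byref(major), CC_MAJOR, device) != 0:
        return None
    if cuda.cuDeviceGetAttribute(ctypes.byref(minor), CC_MINOR, device) != 0:
        return None
    return major.value, minor.value


if __name__ == "__main__":
    print(compute_capability())
```

On a machine without an NVIDIA driver the function returns `None` rather than raising, so it can be called safely from an environment setup script.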

scdub commented 1 year ago

I can't answer on behalf of conda-forge, but I think this being a user responsibility makes sense. Each CUDA release has a range of architectures it supports, and in turn cuDNN targets a range of CUDA versions. cuDNN 8.8.0 (support matrix) was the first release to add CUDA 12 support; it also dropped CUDA 10.2, CUDA 11.0-11.6, and Kepler hardware.

Independently of the CUDA version, you can still run into compatibility issues, as the package developers who compile CUDA kernels also need to specify which platforms they support through the cubins that are embedded in their packages. These cubins vary between packages, and don't always match 1:1 with what the underlying CUDA version supports. At a minimum, using a cuda-version / cudatoolkit that matches what your hardware supports gives you the best chance of identifying compatible builds. Kepler is getting old enough that its support lifecycle is coming to an end for many packages as they move to support newer CUDA releases.

vyasr commented 1 year ago

Apologies for the delayed response. You are correct, there is at present no way for conda to handle this and it is the user's responsibility to choose a version that is suitable for the architecture. The available CUDA driver may be checked using the virtual __cuda package, but not the hardware arch.
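As a rough illustration of the driver check mentioned above, the sketch below looks up the `__cuda` virtual package in the output of `conda info --json`. It assumes that output contains a `virtual_pkgs` list whose entries look like `[name, version, build]`; the helper names `cuda_virtual_package` and `read_conda_info` are hypothetical. Note that this reports only the CUDA version the driver supports, not the GPU architecture.

```python
import json
import subprocess


def cuda_virtual_package(info):
    """Return the __cuda version from parsed `conda info --json` data, or None."""
    # Assumed layout: "virtual_pkgs" is a list of [name, version, build] entries.
    for pkg in info.get("virtual_pkgs", []):
        if pkg[0] == "__cuda":
            return pkg[1]
    return None


def read_conda_info(exe="conda"):
    """Run `<exe> info --json` and return the parsed dictionary."""
    result = subprocess.run(
        [exe, "info", "--json"], capture_output=True, text=True, check=True
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(cuda_virtual_package(read_conda_info()))
```

If no `__cuda` entry is present (for example, no NVIDIA driver is detected), `cuda_virtual_package` returns `None`.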

pauldmccarthy commented 1 year ago

@vyasr @scdub thanks for your replies! At the moment, it's really just cuDNN 8.8.0 that is the issue, as it requires compute >= 5.0, whereas CUDA 11.* still supports compute >= 3.7. But this is something that I can easily handle when setting up my environments. Thanks!
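Handling this at environment setup time could look something like the sketch below, which picks a cudnn spec from the GPU's compute capability. The only rule it encodes is the one stated in this thread (cuDNN 8.8.0 needs compute capability >= 5.0); the function name `cudnn_spec` is hypothetical, and whether the solver finds an older cuDNN build that matches the rest of the environment is not guaranteed.

```python
def cudnn_spec(compute_capability):
    """Return a conda match spec for cudnn, given a (major, minor) tuple."""
    # Assumption taken from this thread: cuDNN 8.8.0 requires compute
    # capability >= 5.0, so older GPUs are pinned below 8.8.
    if tuple(compute_capability) < (5, 0):
        return "cudnn<8.8"
    return "cudnn"


if __name__ == "__main__":
    print(cudnn_spec((3, 7)))  # Tesla K80 -> cudnn<8.8
```

The resulting spec can be passed in place of the bare package name, for example `mamba create -c conda-forge -p ./test.env "cudnn<8.8"`.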

conda-forge / cudnn-feedstock

Clarification regarding compute capability compatibility #60
