NixOS / nixpkgs

Nix Packages collection & NixOS
MIT License

RuntimeError: PyTorch is not linked with support for X devices #192596

Open davidak opened 2 years ago

davidak commented 2 years ago

Describe the bug

I'm testing https://github.com/NixOS/nixpkgs/pull/192391 and would like to use my AMD GPU. Since I don't have HIP (AMD's CUDA translation layer) set up yet, I tried OpenCL, Vulkan and OpenGL. All fail with this error.

[davidak@gaming:~]$ whisper --device opencl --model medium --language de test.mp4
Traceback (most recent call last):
  File "/nix/store/c1fkpcpz17mccwk9q586a6y2smlwzsbd-python3.10-whisper-unstable-2022-09-21/bin/.whisper-wrapped", line 9, in <module>
    sys.exit(cli())
  File "/nix/store/c1fkpcpz17mccwk9q586a6y2smlwzsbd-python3.10-whisper-unstable-2022-09-21/lib/python3.10/site-packages/whisper/transcribe.py", line 276, in cli
    model = load_model(model_name).to(device)
  File "/nix/store/0rz5lf0gzlc7a5zsigbk354bdwmnaxl8-python3.10-torch-1.12.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 927, in to
    return self._apply(convert)
  File "/nix/store/0rz5lf0gzlc7a5zsigbk354bdwmnaxl8-python3.10-torch-1.12.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/nix/store/0rz5lf0gzlc7a5zsigbk354bdwmnaxl8-python3.10-torch-1.12.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 579, in _apply
    module._apply(fn)
  File "/nix/store/0rz5lf0gzlc7a5zsigbk354bdwmnaxl8-python3.10-torch-1.12.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 602, in _apply
    param_applied = fn(param)
  File "/nix/store/0rz5lf0gzlc7a5zsigbk354bdwmnaxl8-python3.10-torch-1.12.1/lib/python3.10/site-packages/torch/nn/modules/module.py", line 925, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
RuntimeError: PyTorch is not linked with support for opencl devices
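For context, the error comes from PyTorch itself: the device *name* is accepted at parse time, but moving the model fails because the backend was not compiled in. A minimal sketch of that distinction (the function and the device list are illustrative, not torch's actual internals):

```python
# Illustrative sketch, not torch internals: torch recognizes many device
# type names, but .to(device) raises RuntimeError when the build lacks
# that backend.
KNOWN_DEVICE_TYPES = {"cpu", "cuda", "opencl", "opengl", "vulkan", "hip", "mps"}

def move_to(device, compiled_backends):
    """Mimic the two failure modes: unknown name vs. backend not linked in."""
    if device not in KNOWN_DEVICE_TYPES:
        raise ValueError("unknown device type: " + device)
    if device not in compiled_backends:
        raise RuntimeError(
            "PyTorch is not linked with support for %s devices" % device)
    return "moved to " + device
```

A CPU-only build corresponds to `compiled_backends = {"cpu"}`, which reproduces the error message above for `opencl`.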

Steps To Reproduce

Steps to reproduce the behavior:

  1. nix run nixpkgs#nixpkgs-review pr 192391
  2. open another shell and run the result there (it fails in the nixpkgs-review shell)

[nix-shell:~/.cache/nixpkgs-review/pr-192391]$ ll results/
total 4
lrwxrwxrwx 1 davidak users 82 Sep 23 10:15 openai-whisper -> /nix/store/c1fkpcpz17mccwk9q586a6y2smlwzsbd-python3.10-whisper-unstable-2022-09-21

  3. /nix/store/c1fkpcpz17mccwk9q586a6y2smlwzsbd-python3.10-whisper-unstable-2022-09-21/bin/whisper --device opencl --model medium --language de test.wav

Expected behavior

Whisper should run on the selected device. Options for --device appear to include: cpu, cuda, ipu, xpu, mkldnn, opengl, opencl, ideep, hip, ve, ort, mps, xla, lazy, vulkan, meta, hpu, privateuseone

Source: https://github.com/openai/whisper/discussions/55

I don't know what most of these are, but if most PyTorch distributions support them, ours should too.

cpu works, but is slow. cuda might work on NVIDIA hardware when cudaSupport is enabled for PyTorch.
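A hedged sketch of that fallback logic (the helper name `pick_device` is hypothetical; in a real script the two arguments would come from `torch.cuda.is_available()` and `torch.version.hip`):

```python
def pick_device(cuda_available, hip_version):
    """Choose a torch device string.

    Hedged assumption: ROCm builds of PyTorch reuse the 'cuda' device
    name and report their HIP version via torch.version.hip; anything
    else falls back to the slow but universally supported 'cpu'.
    """
    if cuda_available or hip_version is not None:
        return "cuda"
    return "cpu"
```

So `pick_device(False, None)` yields "cpu", matching the working-but-slow case described above.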


Notify maintainers

@mweinelt @teh @thoughtpolice


mweinelt commented 2 years ago

Tried the following patch

diff --git a/pkgs/development/python-modules/torch/default.nix b/pkgs/development/python-modules/torch/default.nix
index 672fcf75d33..90506a4af0a 100644
--- a/pkgs/development/python-modules/torch/default.nix
+++ b/pkgs/development/python-modules/torch/default.nix
@@ -1,5 +1,6 @@
 { stdenv, lib, fetchFromGitHub, fetchpatch, buildPythonPackage, python,
   cudaSupport ? false, cudaPackages, magma,
+  openclSupport ? true, opencl-clhpp,
   mklDnnSupport ? true, useSystemNccl ? true,
   MPISupport ? false, mpi,
   buildDocs ? false,
@@ -172,6 +173,8 @@ in buildPythonPackage rec {
   USE_MKLDNN = setBool mklDnnSupport;
   USE_MKLDNN_CBLAS = setBool mklDnnSupport;

+  USE_OPENCL = setBool openclSupport;
+
   # Avoid using pybind11 from git submodule
   # Also avoids pytorch exporting the headers of pybind11
   USE_SYSTEM_BIND11 = true;
@@ -225,6 +228,7 @@ in buildPythonPackage rec {

   buildInputs = [ blas blas.provider pybind11 ]
     ++ lib.optionals cudaSupport [ cudnn magma nccl ]
+    ++ lib.optionals openclSupport [ opencl-clhpp ]
     ++ lib.optionals stdenv.isLinux [ numactl ]
     ++ lib.optionals stdenv.isDarwin [ CoreServices libobjc ];

python3.10-torch> INFO: USING OPENCL
python3.10-torch> -- Looking for CL_VERSION_3_0
python3.10-torch> -- Looking for CL_VERSION_3_0 - found
python3.10-torch> CMake Error at /nix/store/zvsq0y9ps31d79pfrhxa1bcsl9w07ahd-cmake-3.24.1/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
python3.10-torch>   Could NOT find OpenCL (missing: OpenCL_LIBRARY) (found version "3.0")
python3.10-torch> Call Stack (most recent call first):
python3.10-torch>   /nix/store/zvsq0y9ps31d79pfrhxa1bcsl9w07ahd-cmake-3.24.1/share/cmake-3.24/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
python3.10-torch>   /nix/store/zvsq0y9ps31d79pfrhxa1bcsl9w07ahd-cmake-3.24.1/share/cmake-3.24/Modules/FindOpenCL.cmake:163 (find_package_handle_standard_args)
python3.10-torch>   cmake/Dependencies.cmake:853 (find_package)
python3.10-torch>   CMakeLists.txt:696 (include)
python3.10-torch> 
python3.10-torch> 
python3.10-torch> -- Configuring incomplete, errors occurred!
python3.10-torch> See also "/build/source/build/CMakeFiles/CMakeOutput.log".
python3.10-torch> See also "/build/source/build/CMakeFiles/CMakeError.log".
error: builder for '/nix/store/9z0xjzy5id3bs8camv2a4djb2d61gsxn-python3.10-torch-1.12.1.drv' failed with exit code 1;
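Reading the CMake output above: the headers were found (CL_VERSION_3_0) but OpenCL_LIBRARY was not. In nixpkgs, opencl-clhpp ships only the C++ headers; libOpenCL.so comes from the ocl-icd ICD loader, so the patch would presumably also need something along these lines (untested sketch):

```nix
# Untested: CMake's FindOpenCL needs libOpenCL.so for OpenCL_LIBRARY,
# which ocl-icd provides; opencl-clhpp alone only satisfies the headers.
  buildInputs = [ blas blas.provider pybind11 ]
    ++ lib.optionals cudaSupport [ cudnn magma nccl ]
    ++ lib.optionals openclSupport [ opencl-clhpp ocl-icd ]
    ++ lib.optionals stdenv.isLinux [ numactl ]
    ++ lib.optionals stdenv.isDarwin [ CoreServices libobjc ];
```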
Madouura commented 1 year ago

Planning to implement ROCm support for pytorch. Related:

Madouura commented 1 year ago

We added ROCm support to torch in e539c148f39d7450c6ac701f0ec3404568764859. I'm not sure how OpenCL support is going to go, but if you use an AMD GPU you should be set for now.
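A hedged way to verify that from Python (the helper is hypothetical; it relies only on the documented behavior that ROCm builds expose `torch.version.hip` and reuse the `cuda` device API):

```python
def rocm_info(torch_module):
    """Return (is_rocm_build, gpu_available) for a given torch module.

    Hedged: ROCm builds set torch.version.hip to a version string, and
    an AMD GPU then shows up through torch.cuda.is_available().
    """
    hip = getattr(torch_module.version, "hip", None)
    return (hip is not None, bool(torch_module.cuda.is_available()))
```

Called as `rocm_info(torch)` after `import torch` on a ROCm-enabled package, the first element should be True.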