bytedeco / javacpp-presets

The missing Java distribution of native C++ libraries

CUDA support in PyTorch broken after update to 1.5.9 stable #1376

Closed sbrunk closed 1 year ago

sbrunk commented 1 year ago

I'm using cuda-platform-redist to avoid depending on a system-installed cuda, which worked great with the 1.5.9-SNAPSHOT versions and cuda 11.8-8.6. Now, after upgrading to 1.5.9 stable in conjunction with the cuda update to 12.1-8.9, something seems to be missing:

[W interface.cpp:47] Warning: Loading nvfuser library failed with: Error in dlopen: libcusolver.so.11: cannot open shared object file: No such file or directory (function LoadingNvfuserLibrary)
...
pytorch-2.0.1-1.5.9-linux-x86_64-gpu.jar/org/bytedeco/pytorch/linux-x86_64-gpu/libjnitorch.so: libcusolver.so.11: cannot open shared object file: No such file or directory

I've tried going back to cuda 11.8-8.6-1.5.8 (only the native libs via classifier to avoid javacpp-1.5.8 being pulled in), but I got other linking errors due to missing cuda 12 libs, which suggests libtorch is now built against cuda 12:

pytorch-2.0.1-1.5.9-linux-x86_64-gpu.jar/org/bytedeco/pytorch/linux-x86_64-gpu/libjnitorch.so: libcudart.so.12: cannot open shared object file: No such file or directory

Any ideas?

sbrunk commented 1 year ago

Ok, this is interesting: libcusolver.so.11 is part of the cuda jar, but it isn't copied into the javacpp cache dir. It seems to work when I manually copy the lib into the cache, next to the pytorch native libs.

I'm still waiting for the cuda kernels to compile, but before this change it errored out immediately...

Any idea why it is not extracted like the other files?
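
One way to confirm which libraries were actually extracted is to scan the cache directory for files with a given name prefix. The following is only a rough sketch: the class `CacheProbe` and its `find` method are hypothetical helpers, and the default location `~/.javacpp/cache` is an assumption about the local setup, so pass the real cache path as an argument if it differs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic helper, not part of JavaCPP.
final class CacheProbe {
    // Returns every path under root whose file name starts with the given prefix.
    static List<Path> find(Path root, String prefix) {
        if (!Files.isDirectory(root)) {
            return Collections.emptyList(); // nothing extracted yet, or wrong path
        }
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(p -> p.getFileName() != null
                            && p.getFileName().toString().startsWith(prefix))
                    .sorted()
                    .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumed default cache location; override with the first argument.
        Path root = Paths.get(args.length > 0
                ? args[0]
                : System.getProperty("user.home") + "/.javacpp/cache");
        List<Path> hits = find(root, "libcusolver");
        if (hits.isEmpty()) {
            System.out.println("no libcusolver* found under " + root);
        } else {
            hits.forEach(System.out::println);
        }
    }
}
```

An empty result for `libcusolver` while other cuda libraries are listed would match the behaviour described above.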

saudet commented 1 year ago

Looks like I forgot to add a line for cusolver@.11 in the presets. @HGuillemet Please apply something like this in your pull request.

--- a/pytorch/src/main/java/org/bytedeco/pytorch/presets/torch.java
+++ b/pytorch/src/main/java/org/bytedeco/pytorch/presets/torch.java
@@ -1816,6 +1816,7 @@ public class torch implements LoadEnabled, InfoMapper {
                      : lib.equals("nvinfer") ? "@.8"
                      : lib.equals("cufft") ? "@.11"
                      : lib.equals("curand") ? "@.10"
+                     : lib.equals("cusolver") ? "@.11"
                      : lib.equals("cudart") ? "@.12"
                      : lib.equals("nvrtc") ? "@.12"
                      : lib.equals("nvJitLink") ? "@.12"
@@ -1827,6 +1828,7 @@ public class torch implements LoadEnabled, InfoMapper {
                      : lib.equals("nvinfer") ? "64_8"
                      : lib.equals("cufft") ? "64_11"
                      : lib.equals("curand") ? "64_10"
+                     : lib.equals("cusolver") ? "64_11"
                      : lib.equals("cudart") ? "64_12"
                      : lib.equals("nvrtc") ? "64_120_0"
                      : lib.equals("nvJitLink") ? "64_120_0"
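
To illustrate what the two hunks encode, here is a minimal sketch that restates the per-library suffixes from the diff as lookup tables. The class `CudaLibSuffixes` and its methods are hypothetical and are not how torch.java is written; the sketch also assumes that the suffix is appended to the library name, that the first table applies to Linux-style versioned names (JavaCPP's `name@.version` notation, e.g. libcusolver.so.11), and that the second applies to Windows-style DLL names. Unknown names are returned unchanged, which is likewise an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical restructuring of the suffix chains shown in the diff.
final class CudaLibSuffixes {
    static final Map<String, String> LINUX = new LinkedHashMap<>();
    static final Map<String, String> WINDOWS = new LinkedHashMap<>();

    static {
        // Values copied from the first hunk of the diff.
        LINUX.put("nvinfer", "@.8");
        LINUX.put("cufft", "@.11");
        LINUX.put("curand", "@.10");
        LINUX.put("cusolver", "@.11"); // the entry that was missing
        LINUX.put("cudart", "@.12");
        LINUX.put("nvrtc", "@.12");
        LINUX.put("nvJitLink", "@.12");

        // Values copied from the second hunk of the diff.
        WINDOWS.put("nvinfer", "64_8");
        WINDOWS.put("cufft", "64_11");
        WINDOWS.put("curand", "64_10");
        WINDOWS.put("cusolver", "64_11"); // the entry that was missing
        WINDOWS.put("cudart", "64_12");
        WINDOWS.put("nvrtc", "64_120_0");
        WINDOWS.put("nvJitLink", "64_120_0");
    }

    // Appends the suffix for the given library, or returns the name unchanged
    // when the library is not listed (assumed fallback).
    static String withSuffix(String lib, boolean windows) {
        String suffix = (windows ? WINDOWS : LINUX).get(lib);
        return suffix == null ? lib : lib + suffix;
    }

    public static void main(String[] args) {
        System.out.println(withSuffix("cusolver", false)); // prints cusolver@.11
        System.out.println(withSuffix("cusolver", true));  // prints cusolver64_11
    }
}
```

Without the cusolver entry, the lookup falls through to whatever the fallback is, so the versioned file name libcusolver.so.11 is never requested for extraction.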

@sbrunk In the meantime, we can work around that by calling Loader.load(cusolver.class).

sbrunk commented 1 year ago

Thanks @saudet for looking into it and for the workaround.

For reference, the workaround needs cuda-platform/cuda in addition to cuda-redist to have cusolver on the classpath. It also wants to load native bindings like libjnicudart.

Also, since I have to support CPU-only builds as well, cusolver might not be available in my case, so I'm checking whether it's on the classpath:

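  import org.bytedeco.javacpp.Loader

  try {
    // Look the class up by name so CPU-only builds work without the cuda artifacts
    val cusolver = Class.forName("org.bytedeco.cuda.global.cusolver")
    Loader.load(cusolver)
  } catch {
    case _: ClassNotFoundException => // ignore to avoid breaking CPU-only builds
  }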

The only downside is that I need to ensure this runs before any tensor operations. I guess the only way to avoid this is to actually have a patched presets/torch.java that reliably triggers JavaCPP's loading magic, right?

saudet commented 1 year ago

You don't have a "common/utils/whatever" class in which you can put stuff like that in a static { }... ?
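
A rough sketch of that idea, combined with the classpath check from the previous comment: a holder class whose static initializer attempts the load once. The names `NativeBootstrap`, `tryLoad`, and `ensureLoaded` are hypothetical. The sketch calls `Loader.load(Class)` through reflection only so that it compiles without JavaCPP on the classpath; with the dependency present, a direct `Loader.load(...)` call would be simpler.

```java
import java.lang.reflect.Method;

// Hypothetical bootstrap holder; touch it before any native call.
final class NativeBootstrap {
    private static final boolean CUSOLVER_LOADED;

    static {
        // Runs exactly once, when the class is first initialized.
        CUSOLVER_LOADED = tryLoad("org.bytedeco.cuda.global.cusolver");
    }

    // Returns true if the class was found and Loader.load(Class) was invoked,
    // false if the class or JavaCPP itself is not on the classpath.
    static boolean tryLoad(String className) {
        try {
            Class<?> target = Class.forName(className);
            Class<?> loader = Class.forName("org.bytedeco.javacpp.Loader");
            Method load = loader.getMethod("load", Class.class);
            load.invoke(null, target);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // CPU-only build: nothing to load
        } catch (ReflectiveOperationException | LinkageError e) {
            throw new IllegalStateException("Failed to load " + className, e);
        }
    }

    // Calling this forces the static initializer above to run.
    static void ensureLoaded() {
    }

    static boolean cusolverLoaded() {
        return CUSOLVER_LOADED;
    }

    public static void main(String[] args) {
        ensureLoaded();
        System.out.println("cusolver loaded: " + cusolverLoaded());
    }
}
```

Any code path that may reach native tensor operations would call `NativeBootstrap.ensureLoaded()` first. The static initializer only runs when the class is first used, so something still has to reference it early.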

sbrunk commented 1 year ago

I do, but I wasn't able to find a way to always trigger loading of that utils class containing the static block before doing native calls, due to a combination of me using Scala top-level methods and Scala being really bad with static methods.

I've refactored my code now in a way that should trigger the cusolver loading reliably before calling any native code.

sbrunk commented 1 year ago

This is now fixed via https://github.com/bytedeco/javacpp-presets/pull/1360