Potential CUDA issue with 4070 cards

gavinolukoju-inconnect commented 1 year ago

Issue Description

I have two graphics cards in my system, a GTX 1070 and an RTX 4070. When I attempt to train my DL4J model on the RTX 4070, I encounter the runtime error below:

Exception in thread "main" java.lang.RuntimeException: cuSolver handle creation failed !; Error code: [7]
    at org.nd4j.nativeblas.Nd4jCuda.lcBlasHandle(Native Method)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaCublasHandle(CudaZeroHandler.java:1020)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaContext(CudaZeroHandler.java:1045)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getDeviceContext(CudaZeroHandler.java:1007)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.getDeviceContext(AtomicAllocator.java:222)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.prepareAction(SynchronousFlowController.java:228)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.memcpyAsync(CudaZeroHandler.java:356)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.memcpyAsync(AtomicAllocator.java:897)
    at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.set(BaseCudaDataBuffer.java:1045)
    at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.setData(BaseCudaDataBuffer.java:1085)
    at org.nd4j.linalg.factory.Nd4j.createTypedBuffer(Nd4j.java:1603)
    at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1458)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3818)

This error does not occur if I train my model on the GTX 1070.

Please describe your issue, along with:

Expected behavior: normal execution without the above stack trace when using an RTX 4070, either standalone or in parallel with another CUDA-capable graphics card.
Encountered behavior: a runtime stack trace when training with the RTX 4070:

Exception in thread "main" java.lang.RuntimeException: cuSolver handle creation failed !; Error code: [7]
    at org.nd4j.nativeblas.Nd4jCuda.lcBlasHandle(Native Method)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaCublasHandle(CudaZeroHandler.java:1020)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaContext(CudaZeroHandler.java:1045)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getDeviceContext(CudaZeroHandler.java:1007)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.getDeviceContext(AtomicAllocator.java:222)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.prepareAction(SynchronousFlowController.java:228)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.memcpyAsync(CudaZeroHandler.java:356)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.memcpyAsync(AtomicAllocator.java:897)
    at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.set(BaseCudaDataBuffer.java:1045)
    at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.setData(BaseCudaDataBuffer.java:1085)
    at org.nd4j.linalg.factory.Nd4j.createTypedBuffer(Nd4j.java:1603)
    at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1458)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3818)

Version Information

    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-native-platform</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-11.6-platform</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-11.6-platform</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-nn</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-core</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-ui</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>
    <dependency>
        <groupId>org.deeplearning4j</groupId>
        <artifactId>deeplearning4j-cuda-11.0</artifactId>
        <version>1.0.0-M1.1</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-backends</artifactId>
        <version>1.0.0-M2.1</version>
        <type>pom</type>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-11.6</artifactId>
        <version>1.0.0-M2.1</version>
    </dependency>
    <dependency>
        <groupId>org.nd4j</groupId>
        <artifactId>nd4j-cuda-11.6</artifactId>
        <version>1.0.0-M2.1</version>
        <classifier>windows-x86_64-cudnn</classifier>
    </dependency>
    <dependency>
        <groupId>org.bytedeco</groupId>
        <artifactId>javacpp</artifactId>
        <version>1.5.8</version>
    </dependency>
    <dependency>
        <groupId>org.bytedeco</groupId>
        <artifactId>cuda</artifactId>
        <version>11.8-8.6-1.5.8</version>
    </dependency>
    <dependency>
        <groupId>org.bytedeco</groupId>
        <artifactId>cuda-platform</artifactId>
        <version>11.8-8.6-1.5.8</version>
    </dependency>
    <dependency>
        <groupId>org.bytedeco</groupId>
        <artifactId>cuda-platform-redist</artifactId>
        <version>11.8-8.6-1.5.8</version>
    </dependency>

Please indicate relevant versions, including, if relevant:

Deeplearning4j version: 1.0.0-M2.1
Platform information (OS, etc.): Windows 11
CUDA version, if used: v11.0
NVIDIA driver version, if in use: 531.79 (release date 05/02/2023)
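
The dependency list above references more than one CUDA version (11.0, 11.6, and 11.8 all appear in artifact names or version strings). Whether that mix has anything to do with the error is not established in this thread; the following is only a minimal sketch, using the Java standard library, of how such a mix can be spotted mechanically. The class name `CudaVersionCheck`, the `REPORTED` array, and the two regular expressions are illustrative assumptions, not part of DL4J or JavaCPP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: groups Maven coordinates by the CUDA version they name.
class CudaVersionCheck {
    // Matches artifact ids such as "nd4j-cuda-11.6-platform".
    private static final Pattern ARTIFACT_CUDA = Pattern.compile("cuda-(\\d+\\.\\d+)");
    // Matches the leading CUDA version in strings such as "11.8-8.6-1.5.8".
    private static final Pattern VERSION_CUDA = Pattern.compile("^(\\d+\\.\\d+)-");

    // CUDA-related {artifactId, version} pairs copied from the report (duplicates listed once).
    static final String[][] REPORTED = {
        {"nd4j-cuda-11.6-platform", "1.0.0-M2.1"},
        {"deeplearning4j-cuda-11.0", "1.0.0-M1.1"},
        {"nd4j-cuda-11.6", "1.0.0-M2.1"},
        {"cuda", "11.8-8.6-1.5.8"},
        {"cuda-platform", "11.8-8.6-1.5.8"},
        {"cuda-platform-redist", "11.8-8.6-1.5.8"},
    };

    // Returns a sorted map from CUDA version to the coordinates that reference it.
    static Map<String, List<String>> cudaVersions(String[][] deps) {
        Map<String, List<String>> byVersion = new TreeMap<>();
        for (String[] dep : deps) {
            String artifact = dep[0];
            String version = dep[1];
            String cuda = null;
            Matcher inArtifact = ARTIFACT_CUDA.matcher(artifact);
            if (inArtifact.find()) {
                cuda = inArtifact.group(1);
            } else if (artifact.startsWith("cuda")) {
                // Assumption: artifacts named "cuda*" encode the CUDA version first in <version>.
                Matcher inVersion = VERSION_CUDA.matcher(version);
                if (inVersion.find()) {
                    cuda = inVersion.group(1);
                }
            }
            if (cuda != null) {
                byVersion.computeIfAbsent(cuda, k -> new ArrayList<>()).add(artifact + ":" + version);
            }
        }
        return byVersion;
    }

    public static void main(String[] args) {
        Map<String, List<String>> found = cudaVersions(REPORTED);
        found.forEach((cuda, coords) -> System.out.println("CUDA " + cuda + " <- " + coords));
        if (found.size() > 1) {
            System.out.println("More than one CUDA version is referenced: " + found.keySet());
        }
    }
}
```

Running `main` lists each CUDA version next to the coordinates that mention it, which makes it easy to see at a glance whether a build pulls in artifacts targeting different CUDA releases.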

Contributing

If you'd like to help us fix the issue by contributing some code, but would like guidance or help in doing so, please mention it!

agibsonccc commented 1 year ago

Closing this. Please file an issue on the main repo: https://github.com/deeplearning4j/deeplearning4j - but only after confirming there's an actual issue with the latest version, M2.1. The upcoming M3 will target CUDA 11.6/11.8. CUDA 12.x support will come out as soon as upstream JavaCPP support for CUDA 12 is available.

If you need a custom build or guarantees on where the framework runs outside the 2 standard CUDA versions, please consider the support offering: https://www.konduit.ai/dl4j-consulting

gavinolukoju-inconnect commented 1 year ago

Thanks @agibsonccc, I will post this issue on the main repo. Do you have an estimate on when M3 will be available?

agibsonccc commented 1 year ago

@gavinolukoju-inconnect I'm hoping next week, pending 1 customer issue and a bit more QA. No committed timelines though.

gavinolukoju-inconnect commented 1 year ago

@agibsonccc Thanks, that's excellent news. Happy to try M3 as soon as it is available.
