Closed gavinolukoju-inconnect closed 1 year ago
Closing this. Please file an issue on the main repo: https://github.com/deeplearning4j/deeplearning4j - but only after confirming that the problem actually occurs with the latest version, M2.1. The upcoming M3 will target CUDA 11.6/11.8. CUDA 12.x support will come out as soon as upstream JavaCPP support for CUDA 12 is available.
If you need a custom build, or guarantees that the framework runs on something other than the 2 standard CUDA versions, please consider the support offering: https://www.konduit.ai/dl4j-consulting
Thanks @agibsonccc, I will post this issue on the main repo. Do you have an estimate on when M3 will be available?
@gavinolukoju-inconnect I'm hoping next week pending 1 customer issue and a bit more QA. No committed timelines though.
@agibsonccc Thanks, that's excellent news. Happy to try M3 as soon as it is available.
Issue Description
I have 2 graphics cards in my system, a GTX 1070 and an RTX 4070. When I attempt to train my DL4J model on the RTX 4070, I encounter the runtime error below:
Exception in thread "main" java.lang.RuntimeException: cuSolver handle creation failed !; Error code: [7]
    at org.nd4j.nativeblas.Nd4jCuda.lcBlasHandle(Native Method)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaCublasHandle(CudaZeroHandler.java:1020)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getCudaContext(CudaZeroHandler.java:1045)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.getDeviceContext(CudaZeroHandler.java:1007)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.getDeviceContext(AtomicAllocator.java:222)
    at org.nd4j.jita.flow.impl.SynchronousFlowController.prepareAction(SynchronousFlowController.java:228)
    at org.nd4j.jita.handler.impl.CudaZeroHandler.memcpyAsync(CudaZeroHandler.java:356)
    at org.nd4j.jita.allocator.impl.AtomicAllocator.memcpyAsync(AtomicAllocator.java:897)
    at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.set(BaseCudaDataBuffer.java:1045)
    at org.nd4j.linalg.jcublas.buffer.BaseCudaDataBuffer.setData(BaseCudaDataBuffer.java:1085)
    at org.nd4j.linalg.factory.Nd4j.createTypedBuffer(Nd4j.java:1603)
    at org.nd4j.linalg.jcublas.JCublasNDArrayFactory.create(JCublasNDArrayFactory.java:1458)
    at org.nd4j.linalg.factory.Nd4j.create(Nd4j.java:3818)
This error does not occur if I train my model on the GTX 1070.
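For anyone trying to reproduce this per card, one way to confirm that the failure really follows the RTX 4070 is to launch the training JVM with only one GPU visible at a time. The sketch below is not from the original report and is not DL4J API: it uses only the JDK plus the standard CUDA environment variables CUDA_VISIBLE_DEVICES and CUDA_DEVICE_ORDER. The class name GpuIsolatedLauncher, the device index, and the training command are hypothetical placeholders; check nvidia-smi for the index of each card on your machine.

```java
import java.util.Arrays;

public class GpuIsolatedLauncher {

    /**
     * Builds (but does not start) a child process that can see only one CUDA device.
     * deviceIndex is a hypothetical value; look it up with nvidia-smi.
     */
    static ProcessBuilder forDevice(int deviceIndex, String... command) {
        if (deviceIndex < 0) {
            throw new IllegalArgumentException("deviceIndex must be >= 0");
        }
        ProcessBuilder pb = new ProcessBuilder(command);
        // Enumerate devices in PCI bus order so indices line up with nvidia-smi.
        pb.environment().put("CUDA_DEVICE_ORDER", "PCI_BUS_ID");
        // Hide every GPU except the requested one from the child process.
        pb.environment().put("CUDA_VISIBLE_DEVICES", Integer.toString(deviceIndex));
        // Forward the child's stdout/stderr so any stack trace stays visible.
        pb.inheritIO();
        return pb;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: GpuIsolatedLauncher <deviceIndex> <command...>");
            return;
        }
        int deviceIndex = Integer.parseInt(args[0]);
        String[] command = Arrays.copyOfRange(args, 1, args.length);
        int exitCode = forDevice(deviceIndex, command).start().waitFor();
        System.out.println("child exited with code " + exitCode);
    }
}
```

Usage would look like `java GpuIsolatedLauncher 1 java -jar training.jar`, where `training.jar` stands in for your own training entry point. Running it once per device index shows whether the error appears only when the RTX 4070 is the sole visible card.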
expected behavior: Normal execution without the above stack trace when using an RTX 4070, either standalone or in parallel with another CUDA-capable graphics card.
encountered behavior: A runtime stack trace when training with the RTX 4070. It is the same "cuSolver handle creation failed !; Error code: [7]" trace shown above.