Closed gordoneliel closed 11 months ago
Hey @gordoneliel, do other models work for you?
[error] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
This usually means cuDNN is not installed.
A couple version checks that may be helpful:
# Verify CUDA version
nvcc --version
# Verify cuDNN version; make sure it's installed and that the package matches the CUDA version
apt-cache policy libcudnn8 | head -n 3
# Check drivers and CUDA support
nvidia-smi
NVCC:
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Sep__8_19:17:24_PDT_2023
Cuda compilation tools, release 12.3, V12.3.52
Build cuda_12.3.r12.3/compiler.33281558_0
libcudnn8
libcudnn8:
Installed: (none)
Candidate: 8.9.5.29-1+cuda12.2
nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.113.01 Driver Version: 535.113.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:05:00.0 On | Off |
| 0% 42C P8 22W / 450W | 492MiB / 24564MiB | 5% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 2544 G /usr/lib/xorg/Xorg 199MiB |
| 0 N/A N/A 2653 G /usr/bin/gnome-shell 80MiB |
| 0 N/A N/A 3130 C+G ...5020629,13399252839037686317,262144 180MiB |
+---------------------------------------------------------------------------------------+
Build cuda_12.3.r12.3
Can you try downgrading to CUDA 12.2?
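To compare the two versions without reading them off by eye, here is a minimal sketch. It assumes the `nvcc --version` and package-version formats pasted above. `nvcc_release` and `cudnn_cuda_suffix` are hypothetical helper names, not part of CUDA or apt. A mismatch is only a hint worth checking, not proof of the cause.

```shell
# Hypothetical helpers (not part of CUDA or apt) for comparing the
# toolkit release with the CUDA version a cuDNN package was built for.

# Reads `nvcc --version` output on stdin and prints the release, e.g. "12.3".
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Takes a package version such as "8.9.5.29-1+cuda12.2" and prints "12.2".
# If the string has no "+cuda" suffix, it is printed unchanged.
cudnn_cuda_suffix() {
  printf '%s\n' "${1##*+cuda}"
}

# Usage on a machine with the toolkit installed:
#   nvcc --version | nvcc_release
#   cudnn_cuda_suffix "8.9.5.29-1+cuda12.2"
```

With the output posted in this thread, the first helper yields 12.3 and the second yields 12.2.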
@jonatanklosko You won't believe this, but I installed libcudnn9 via apt (I thought I already had it?) and it started working!
Perfect, for completeness you mean libcudnn8, right?
Oh, I somehow missed it:
libcudnn8:
Installed: (none)
Candidate: 8.9.5.29-1+cuda12.2
Note that it says Installed: (none), so you had the repository added, but the package itself was indeed not installed :)
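For anyone who wants to script that check, here is a minimal sketch. It assumes the `apt-cache policy` layout shown above (an "Installed:" line followed by a "Candidate:" line). `installed_version` is a hypothetical helper name, not something apt provides.

```shell
# Hypothetical helper (not part of apt): reads `apt-cache policy <pkg>`
# output on stdin and prints the installed version. Prints nothing when
# the package is reported as "Installed: (none)".
installed_version() {
  awk '$1 == "Installed:" { if ($2 != "(none)") print $2; exit }'
}

# Usage on a Debian/Ubuntu machine:
#   apt-cache policy libcudnn8 | installed_version
```

An empty result means the repository knows about the package but it has not been installed, which is the situation in the output quoted above.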
Ah, missed that too! Thanks for the command, really helpful!
Elixir version: 1.15.7 - OTP 26.1.2
Livebook version: 0.11.2
CUDA version: 12.3
GPU: RTX 4090
Trying out the Stable Diffusion examples and running into a cuDNN error in the "Text to image" section:
DNN library initialization failed. Look at the errors above for more details.
Not sure if it's an OOM: GPU memory usage is 22.440Gi/23.988Gi (from nvtop) at the end of the error, but 24 GB should be enough?