csguoh / MambaIR

[ECCV2024] The official PyTorch implementation of the paper "MambaIR: A simple baseline for image restoration with state-space model".
Apache License 2.0

CUDA Unavailable in a container pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel #37

Open bryanbocao opened 3 months ago

bryanbocao commented 3 months ago

I tried to use CUDA in a Docker container (pytorch/pytorch:2.0.1-cuda11.7-cudnn8-devel), but the driver setup seems to be incorrect:

(mambair) root@3d7e0ba526d4:/share/eData05/brcao/Repos/MambaIR# python -c "import torch; print(torch.__version__)"
2.0.1
(mambair) root@3d7e0ba526d4:/share/eData05/brcao/Repos/MambaIR# cd /share/eData05/brcao/Repos/MambaIR^C
(mambair) root@3d7e0ba526d4:/share/eData05/brcao/Repos/MambaIR# nvidia-smi
Thu Jul  4 21:19:01 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:04:00.0  On |                  N/A |
| 30%   44C    P8    32W / 350W |    913MiB / 24234MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:0A:00.0 Off |                  N/A |
| 30%   45C    P8    21W / 350W |     15MiB / 24268MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
+-----------------------------------------------------------------------------+

(mambair) root@3d7e0ba526d4:/share/eData05/brcao/Repos/MambaIR# nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Jun__8_16:49:14_PDT_2022
Cuda compilation tools, release 11.7, V11.7.99
Build cuda_11.7.r11.7/compiler.31442593_0

(mambair) root@3d7e0ba526d4:/share/eData05/brcao/Repos/MambaIR# python -c "import torch; print(torch.cuda.is_available())"
/opt/conda/envs/mambair/lib/python3.8/site-packages/torch/cuda/__init__.py:107: UserWarning: CUDA initialization: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 804: forward compatibility was attempted on non supported HW (Triggered internally at /opt/conda/conda-bld/pytorch_1682343962757/work/c10/cuda/CUDAFunctions.cpp:109.)
  return torch._C._cuda_getDeviceCount() > 0
False

It would be great if anyone could help. Thanks!

csguoh commented 3 months ago

Based on the error above, you could run import torch; print(torch.cuda.is_available()). If it prints False, there may be a mismatch between your CUDA and PyTorch versions.
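
To make such a mismatch easier to spot, the relevant version numbers can be collected in one place. The following is only a minimal sketch: the helper name collect_cuda_diagnostics is hypothetical and not part of MambaIR, and it assumes nvidia-smi is on the PATH inside the container. It reports the PyTorch version, the CUDA version PyTorch was built against, and the driver version reported by nvidia-smi.

```python
import importlib
import shutil
import subprocess


def collect_cuda_diagnostics():
    """Gather the version facts needed to judge a CUDA/PyTorch/driver mismatch.

    Hypothetical helper for illustration; every value stays None when the
    corresponding tool is missing or fails.
    """
    info = {
        "torch": None,             # installed PyTorch version
        "torch_cuda_build": None,  # CUDA version PyTorch was compiled against
        "cuda_available": None,    # result of torch.cuda.is_available()
        "driver": None,            # driver version reported by nvidia-smi
    }

    try:
        torch = importlib.import_module("torch")
    except ImportError:
        torch = None
    if torch is not None:
        info["torch"] = torch.__version__
        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()

    # Assumption: nvidia-smi is installed and reachable via PATH.
    if shutil.which("nvidia-smi"):
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=False,
        )
        lines = result.stdout.strip().splitlines()
        if result.returncode == 0 and lines:
            info["driver"] = lines[0].strip()  # first GPU is enough here

    return info


if __name__ == "__main__":
    for key, value in collect_cuda_diagnostics().items():
        print(f"{key}: {value}")
```

If torch_cuda_build and the toolkit version printed by nvcc -V agree, as they appear to in the log above (both 11.7), the driver side is the more likely place to look.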

bryanbocao commented 3 months ago

@csguoh

import torch; print(torch.cuda.is_available())

I tried that (it is already in my first comment). It prints False.
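
Since the check already prints False, the "Error 804: forward compatibility was attempted on non supported HW" text in the warning may be the more useful clue. One possible cause, not confirmed in this thread, is that a CUDA forward-compatibility copy of libcuda inside the container is being loaded instead of the host driver's library. The sketch below lists the libcuda.so* files in a few directories so that this can be checked. The function name find_libcuda_candidates and the directory list are assumptions for illustration, and the paths may differ in a given image.

```python
import glob
import os


def find_libcuda_candidates(search_dirs):
    """Return every libcuda.so* file found directly under the given directories.

    Hypothetical helper: it only lists files and does not decide which one
    the dynamic loader will actually pick.
    """
    found = []
    for directory in search_dirs:
        pattern = os.path.join(directory, "libcuda.so*")
        found.extend(sorted(glob.glob(pattern)))
    return found


if __name__ == "__main__":
    # Assumed locations; adjust them for the image in use.
    dirs = ["/usr/local/cuda/compat", "/usr/lib/x86_64-linux-gnu"]
    dirs += [d for d in os.environ.get("LD_LIBRARY_PATH", "").split(":") if d]
    for path in find_libcuda_candidates(dirs):
        # Resolve symlinks to reveal the versioned file behind each name.
        print(path, "->", os.path.realpath(path))
```

If a compat copy shows up with a newer version than the host driver (470.161.03 in the nvidia-smi output above), checking whether that directory is on LD_LIBRARY_PATH would be a reasonable next step.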