Closed: DerLorenz closed this issue 1 year ago
Hi Lorenz,
This is strange. The check you did, however, is incorrect. Do you mind running the following and letting me know the output?
```python
import torch
print(torch.cuda.is_available())
```
Note that `torch.cuda.is_available()` has to be called as a function.
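If that prints `False`, a slightly broader check can help narrow things down. This is a sketch of my own (the helper name `cuda_report` is made up, not part of model-angelo); it only uses the standard `torch.cuda` calls and degrades gracefully if torch is missing:

```python
def cuda_report():
    """Summarize whether PyTorch is installed and can see a GPU."""
    info = {"torch_installed": False, "cuda_available": False, "device_count": 0}
    try:
        import torch
    except ImportError:
        return info  # torch itself is not installed in this environment
    info["torch_installed"] = True
    info["cuda_available"] = torch.cuda.is_available()
    if info["cuda_available"]:
        info["device_count"] = torch.cuda.device_count()
    return info

print(cuda_report())
```

Running this in the activated environment distinguishes "torch is broken" from "torch is fine but sees no GPU".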
Best, Kiarash.
Dear Kiarash,
Thanks for your quick reply! Oh, sorry for my mistake. Here is the correct check:
```python
>>> import torch
>>> print(torch.cuda.is_available())
False
```
I see there is an issue with torch, though I am not sure why. I was able to successfully install model-angelo on my workstation before. Here is a list of all packages in my conda environment:
# packages in environment at /path/to/Anaconda3/envs/model_angelo:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
biopython 1.81 pypi_0 pypi
blas 1.0 mkl
brotlipy 0.7.0 py39h27cfd23_1003
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.05.30 h06a4308_0
certifi 2023.7.22 py39h06a4308_0
cffi 1.15.1 py39h5eee18b_3
charset-normalizer 2.0.4 pyhd3eb1b0_0
contourpy 1.1.0 pypi_0 pypi
cryptography 41.0.2 py39h22a60cf_0
cuda-cudart 11.7.99 0 nvidia
cuda-cupti 11.7.101 0 nvidia
cuda-libraries 11.7.1 0 nvidia
cuda-nvrtc 11.7.99 0 nvidia
cuda-nvtx 11.7.91 0 nvidia
cuda-runtime 11.7.1 0 nvidia
cudatoolkit 11.7.0 hd8887f6_10 nvidia
cycler 0.11.0 pypi_0 pypi
einops 0.6.1 pypi_0 pypi
fair-esm 2.0.0 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.9.0 py39h06a4308_0
fonttools 4.41.1 pypi_0 pypi
freetype 2.12.1 h4a9f257_0
giflib 5.2.1 h5eee18b_3
gmp 6.2.1 h295c915_3
gmpy2 2.1.2 py39heeb90bb_0
gnutls 3.6.15 he1e5248_0
idna 3.4 py39h06a4308_0
importlib-resources 6.0.0 pypi_0 pypi
intel-openmp 2023.1.0 hdb19cb5_46305
jinja2 3.1.2 py39h06a4308_0
jpeg 9e h5eee18b_1
kiwisolver 1.4.4 pypi_0 pypi
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcublas 11.10.3.66 0 nvidia
libcufft 10.7.2.124 h4fbf590_0 nvidia
libcufile 1.7.1.12 0 nvidia
libcurand 10.3.3.129 0 nvidia
libcusolver 11.4.0.1 0 nvidia
libcusparse 11.7.4.91 0 nvidia
libdeflate 1.17 h5eee18b_0
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.4 h5eee18b_0
libnpp 11.7.4.75 0 nvidia
libnvjpeg 11.8.0.2 0 nvidia
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.0 h6a678d5_2
libunistring 0.9.10 h27cfd23_0
libwebp 1.2.4 h11a3e52_1
libwebp-base 1.2.4 h5eee18b_1
loguru 0.7.0 pypi_0 pypi
lz4-c 1.9.4 h6a678d5_0
markupsafe 2.1.1 py39h7f8727e_0
matplotlib 3.7.2 pypi_0 pypi
mkl 2023.1.0 h6d00ec8_46342
mkl-service 2.4.0 py39h5eee18b_1
mkl_fft 1.3.6 py39h417a72b_1
mkl_random 1.2.2 py39h417a72b_1
model-angelo 1.0.1 pypi_0 pypi
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpmath 1.3.0 py39h06a4308_0
mrcfile 1.4.3 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nettle 3.7.3 hbbd107a_1
networkx 3.1 py39h06a4308_0
numpy 1.25.0 py39h5f9d8c6_0
numpy-base 1.25.0 py39hb5e798b_0
openh264 2.1.1 h4ff587b_0
openssl 3.0.9 h7f8727e_0
packaging 23.1 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pillow 9.4.0 py39h6a678d5_0
pip 23.2.1 py39h06a4308_0
psutil 5.9.5 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pyhmmer 0.8.2 pypi_0 pypi
pyopenssl 23.2.0 py39h06a4308_0
pyparsing 3.0.9 pypi_0 pypi
pysocks 1.7.1 py39h06a4308_0
python 3.9.17 h955ad1f_0
python-dateutil 2.8.2 pypi_0 pypi
pytorch 2.0.1 py3.9_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h778d358_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2023.3 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.31.0 py39h06a4308_0
scipy 1.11.1 pypi_0 pypi
setuptools 68.0.0 py39h06a4308_0
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
sympy 1.11.1 py39h06a4308_0
tbb 2021.8.0 hdb19cb5_0
tk 8.6.12 h1ccaba5_0
torchaudio 2.0.2 py39_cu117 pytorch
torchtriton 2.0.0 py39 pytorch
torchvision 0.15.2 py39_cu117 pytorch
tqdm 4.65.0 pypi_0 pypi
typing_extensions 4.7.1 py39h06a4308_0
tzdata 2023.3 pypi_0 pypi
urllib3 1.26.16 py39h06a4308_0
wheel 0.38.4 py39h06a4308_0
xz 5.4.2 h5eee18b_0
zipp 3.16.2 pypi_0 pypi
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0
This one I installed as suggested in issue https://github.com/3dem/model-angelo/issues/24, with some adaptations:
```shell
$ conda create -n model_angelo python=3.9 -y
$ conda activate model_angelo
(model_angelo) $ conda install -y pytorch pytorch-cuda=11.7 torchvision torchaudio cudatoolkit=11.7 -c nvidia -c pytorch
(model_angelo) $ python3 -m pip install -r requirements.txt
(model_angelo) $ python3 setup.py install
(model_angelo) $ export TORCH_HOME=/path/to/weights
(model_angelo) $ conda env config vars set TORCH_HOME="$TORCH_HOME"
(model_angelo) $ conda deactivate && conda activate model_angelo
```
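To double-check that `TORCH_HOME` really persists across the deactivate/activate cycle, here is a small stdlib-only sketch (the helper name `check_torch_home` is my own, not from the install instructions):

```python
import os

def check_torch_home():
    """Report whether TORCH_HOME is set and points at an existing directory."""
    path = os.environ.get("TORCH_HOME")
    if path is None:
        return "TORCH_HOME is not set"
    if not os.path.isdir(path):
        return "TORCH_HOME set to %s, but the directory does not exist" % path
    return "TORCH_HOME ok: %s" % path

print(check_torch_home())
```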
I had already tried installing it the way(s) suggested in the README, with the same outcome: the C-alpha prediction takes forever (minutes per iteration) and CUDA is not available.
Best, Lorenz
I reinstalled model_angelo again following the README instructions, and just changed the install script to check for and create the environment model_angelo_1. Here is the check again:
```python
>>> import torch
>>> print(torch.cuda.is_available())
False
```
Also here is the list of installed packages:
# packages in environment at /software/extra/the_real_lorenz/Anaconda3/envs/model_angelo_1:
#
# Name Version Build Channel
_libgcc_mutex 0.1 main
_openmp_mutex 5.1 1_gnu
biopython 1.81 pypi_0 pypi
blas 1.0 mkl
brotlipy 0.7.0 py310h7f8727e_1002
bzip2 1.0.8 h7b6447c_0
ca-certificates 2023.05.30 h06a4308_0
certifi 2023.7.22 py310h06a4308_0
cffi 1.15.1 py310h5eee18b_3
charset-normalizer 2.0.4 pyhd3eb1b0_0
contourpy 1.1.0 pypi_0 pypi
cryptography 41.0.2 py310h22a60cf_0
cuda-cudart 11.7.99 0 nvidia
cuda-cupti 11.7.101 0 nvidia
cuda-libraries 11.7.1 0 nvidia
cuda-nvrtc 11.7.99 0 nvidia
cuda-nvtx 11.7.91 0 nvidia
cuda-runtime 11.7.1 0 nvidia
cycler 0.11.0 pypi_0 pypi
einops 0.6.1 pypi_0 pypi
fair-esm 2.0.0 pypi_0 pypi
ffmpeg 4.3 hf484d3e_0 pytorch
filelock 3.9.0 py310h06a4308_0
fonttools 4.41.1 pypi_0 pypi
freetype 2.12.1 h4a9f257_0
giflib 5.2.1 h5eee18b_3
gmp 6.2.1 h295c915_3
gmpy2 2.1.2 py310heeb90bb_0
gnutls 3.6.15 he1e5248_0
idna 3.4 py310h06a4308_0
intel-openmp 2023.1.0 hdb19cb5_46305
jinja2 3.1.2 py310h06a4308_0
jpeg 9e h5eee18b_1
kiwisolver 1.4.4 pypi_0 pypi
lame 3.100 h7b6447c_0
lcms2 2.12 h3be6417_0
ld_impl_linux-64 2.38 h1181459_1
lerc 3.0 h295c915_0
libcublas 11.10.3.66 0 nvidia
libcufft 10.7.2.124 h4fbf590_0 nvidia
libcufile 1.7.1.12 0 nvidia
libcurand 10.3.3.129 0 nvidia
libcusolver 11.4.0.1 0 nvidia
libcusparse 11.7.4.91 0 nvidia
libdeflate 1.17 h5eee18b_0
libffi 3.4.4 h6a678d5_0
libgcc-ng 11.2.0 h1234567_1
libgomp 11.2.0 h1234567_1
libiconv 1.16 h7f8727e_2
libidn2 2.3.4 h5eee18b_0
libnpp 11.7.4.75 0 nvidia
libnvjpeg 11.8.0.2 0 nvidia
libpng 1.6.39 h5eee18b_0
libstdcxx-ng 11.2.0 h1234567_1
libtasn1 4.19.0 h5eee18b_0
libtiff 4.5.0 h6a678d5_2
libunistring 0.9.10 h27cfd23_0
libuuid 1.41.5 h5eee18b_0
libwebp 1.2.4 h11a3e52_1
libwebp-base 1.2.4 h5eee18b_1
loguru 0.7.0 pypi_0 pypi
lz4-c 1.9.4 h6a678d5_0
markupsafe 2.1.1 py310h7f8727e_0
matplotlib 3.7.2 pypi_0 pypi
mkl 2023.1.0 h6d00ec8_46342
mkl-service 2.4.0 py310h5eee18b_1
mkl_fft 1.3.6 py310h1128e8f_1
mkl_random 1.2.2 py310h1128e8f_1
model-angelo 1.0.1 pypi_0 pypi
mpc 1.1.0 h10f8cd9_1
mpfr 4.0.2 hb69a4c5_1
mpmath 1.3.0 py310h06a4308_0
mrcfile 1.4.3 pypi_0 pypi
ncurses 6.4 h6a678d5_0
nettle 3.7.3 hbbd107a_1
networkx 3.1 py310h06a4308_0
numpy 1.25.0 py310h5f9d8c6_0
numpy-base 1.25.0 py310hb5e798b_0
openh264 2.1.1 h4ff587b_0
openssl 3.0.9 h7f8727e_0
packaging 23.1 pypi_0 pypi
pandas 2.0.3 pypi_0 pypi
pillow 9.4.0 py310h6a678d5_0
pip 23.2.1 py310h06a4308_0
psutil 5.9.5 pypi_0 pypi
pycparser 2.21 pyhd3eb1b0_0
pyhmmer 0.8.2 pypi_0 pypi
pyopenssl 23.2.0 py310h06a4308_0
pyparsing 3.0.9 pypi_0 pypi
pysocks 1.7.1 py310h06a4308_0
python 3.10.12 h955ad1f_0
python-dateutil 2.8.2 pypi_0 pypi
pytorch 2.0.1 py3.10_cuda11.7_cudnn8.5.0_0 pytorch
pytorch-cuda 11.7 h778d358_5 pytorch
pytorch-mutex 1.0 cuda pytorch
pytz 2023.3 pypi_0 pypi
readline 8.2 h5eee18b_0
requests 2.31.0 py310h06a4308_0
scipy 1.11.1 pypi_0 pypi
setuptools 68.0.0 py310h06a4308_0
six 1.16.0 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0
sympy 1.11.1 py310h06a4308_0
tbb 2021.8.0 hdb19cb5_0
tk 8.6.12 h1ccaba5_0
torchaudio 2.0.2 py310_cu117 pytorch
torchtriton 2.0.0 py310 pytorch
torchvision 0.15.2 py310_cu117 pytorch
tqdm 4.65.0 pypi_0 pypi
typing_extensions 4.7.1 py310h06a4308_0
tzdata 2023.3 pypi_0 pypi
urllib3 1.26.16 py310h06a4308_0
wheel 0.38.4 py310h06a4308_0
xz 5.4.2 h5eee18b_0
zlib 1.2.13 h5eee18b_0
zstd 1.5.5 hc292b87_0
Maybe that helps. CUDA normally works fine on our cluster with other software.
Right, so the new environment definitely has PyTorch with CUDA installed. When you run the check, in the same environment, are you able to run `nvidia-smi`? Does it show your GPUs?
With just the environment activated, it doesn't show me the GPUs:
```
nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
```
When activating the environment with GPUs allocated (using tmux), I get a different error:
```
nvidia-smi
No devices were found
```
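As a side note: when `nvidia-smi` fails like this inside a batch job, it is often worth dumping the scheduler-related environment variables to see whether a GPU was actually granted. A stdlib-only sketch (the variable names are the common Slurm/CUDA ones, an assumption on my part, and `gpu_allocation_hints` is a made-up helper name):

```python
import os

def gpu_allocation_hints():
    """Collect environment clues about whether the scheduler granted a GPU."""
    vars_of_interest = (
        "CUDA_VISIBLE_DEVICES",  # which GPU indices are visible to this process
        "SLURM_JOB_GPUS",        # GPU indices granted to the Slurm job, if any
        "SLURM_GPUS_ON_NODE",    # per-node GPU count for the allocation
    )
    return {var: os.environ.get(var) for var in vars_of_interest}

print(gpu_allocation_hints())
```

If all of these come back `None` inside the job, the scheduler never allocated a GPU, regardless of what is installed in the conda environment.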
Wait, I am an idiot. I found the issue in my Slurm submission script: while asking for resources on the GPU nodes, I didn't actually get a GPU allocated. I have fixed this now and I get the correct output for all the checks:
```python
>>> import torch
>>> print(torch.cuda.is_available())
True
```
and
```
nvidia-smi
Thu Aug  3 10:06:03 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.12    Driver Version: 525.85.12    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:00:08.0 Off |                    0 |
| N/A   25C    P0    23W / 250W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
```
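For anyone who hits the same symptom: the usual cause is a submission script that lands on a GPU node without actually requesting a GPU. A minimal sketch of a Slurm script that does request one; the partition name, resource numbers, paths, and the final command are placeholders of mine, not taken from this thread:

```shell
#!/bin/bash
#SBATCH --job-name=model_angelo
#SBATCH --partition=gpu          # placeholder partition name
#SBATCH --gres=gpu:1             # without a GPU request like this, no GPU is allocated
#SBATCH --cpus-per-task=8
#SBATCH --time=12:00:00

# Activate the conda environment (path is a placeholder)
source /path/to/Anaconda3/etc/profile.d/conda.sh
conda activate model_angelo

nvidia-smi                       # sanity check: should list the allocated GPU

# your usual model-angelo invocation goes here, e.g.:
# model_angelo build ...
```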
I restarted the predictions and it should now run as expected. I will give a short update once the job has started running.
Excellent! Was just typing a reply and saw your message changed :)
Let me know when to mark this issue as resolved!
Yes, it works now. Sorry for my stupidity and for bothering you in the first place. Thank you very much (and congratulations) for this awesome software and the support! The initial version already served me well, and I am now looking forward to seeing the rRNA prediction results.
No problem! I'm glad things are running now :)
Hi,
I installed model-angelo on our HPC following the README instructions, using Anaconda3. When trying to run a prediction (submitted via Slurm), the C-alpha prediction takes very long. Checking the slurm.out file, I see that CUDA is not available, so I have no GPU usage during the C-alpha prediction.
Checking torch availability seems OK.
I reinstalled it several times following the personal-use and the shared-computational-environment instructions. I also tried different workarounds suggested in previous threads (https://github.com/3dem/model-angelo/issues/24), but nothing works.
I am a bit confused and not sure what I could do to fix it. Thus, any help would be very much appreciated!
Best, Lorenz