sbassi closed this issue 1 year ago.
I get the same error when running one of the sample programs. This is the code:
```python
import torch
import esm

model = esm.pretrained.esmfold_v1()
model = model.eval().cuda()

# Optionally, uncomment to set a chunk size for axial attention. This can help reduce memory.
# Lower sizes will have lower memory requirements at the cost of slower inference.
# model.set_chunk_size(128)

sequence = "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"
# Multimer prediction can be done with chains separated by ':'

with torch.no_grad():
    output = model.infer_pdb(sequence)

with open("result.pdb", "w") as f:
    f.write(output)

import biotite.structure.io as bsio
struct = bsio.load_structure("result.pdb", extra_fields=["b_factor"])
print(struct.b_factor.mean())  # this will be the pLDDT
# 88.3
```
And here is the run:
```
(esm) ubuntu@ip-10-0-0-77:~/esm$ python p2.py
Traceback (most recent call last):
  File "/home/ubuntu/esm/p2.py", line 15, in <module>
    output = model.infer_pdb(sequence)
  File "/home/ubuntu/esm/esm/esmfold/v1/esmfold.py", line 305, in infer_pdb
    return self.infer_pdbs([sequence], *args, **kwargs)[0]
  File "/home/ubuntu/esm/esm/esmfold/v1/esmfold.py", line 300, in infer_pdbs
    output = self.infer(seqs, *args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/esm/lib/python3.9/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/esm/esm/esmfold/v1/esmfold.py", line 277, in infer
    output = self.forward(
  File "/home/ubuntu/esm/esm/esmfold/v1/esmfold.py", line 156, in forward
    esm_s = self._compute_language_model_representations(esmaa)
  File "/home/ubuntu/esm/esm/esmfold/v1/esmfold.py", line 103, in _compute_language_model_representations
    res = self.esm(
  File "/home/ubuntu/miniconda3/envs/esm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/esm/esm/model/esm2.py", line 112, in forward
    x, attn = layer(
  File "/home/ubuntu/miniconda3/envs/esm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/esm/esm/modules.py", line 125, in forward
    x, attn = self.self_attn(
  File "/home/ubuntu/miniconda3/envs/esm/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/ubuntu/esm/esm/multihead_attention.py", line 357, in forward
    attn_weights = torch.bmm(q, k.transpose(1, 2))
RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when calling `cublasGemmStridedBatchedExFix( handle, opa, opb, m, n, k, (void*)(&falpha), a, CUDA_R_16F, lda, stridea, b, CUDA_R_16F, ldb, strideb, (void*)(&fbeta), c, CUDA_R_16F, ldc, stridec, num_batches, CUDA_R_32F, CUBLAS_GEMM_DEFAULT_TENSOR_OP)`
```
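The traceback ends inside the ESM-2 attention layer at `torch.bmm(q, k.transpose(1, 2))`, and the cuBLAS call named in the error takes `CUDA_R_16F` operands, i.e. a half-precision batched matmul. One way to check whether the environment rather than ESM is at fault is to run the same kind of call on its own. The sketch below assumes nothing about ESMFold's internals: the function name `fp16_bmm_smoke_test` is hypothetical and the tensor shapes are arbitrary illustrative values, not the ones ESMFold actually uses. If this standalone call also fails, that likely points at the PyTorch/CUDA/driver combination.

```python
import importlib.util


def fp16_bmm_smoke_test(batch: int = 8, seq_len: int = 64, head_dim: int = 64) -> str:
    """Run a standalone fp16 torch.bmm on the GPU (hypothetical helper, arbitrary shapes).

    Returns "ok", "skipped: <reason>" or "failed: <error message>".
    """
    if importlib.util.find_spec("torch") is None:
        return "skipped: torch is not installed"
    import torch

    if not torch.cuda.is_available():
        return "skipped: no CUDA device visible to PyTorch"
    try:
        # q and k have shape (batch, seq_len, head_dim), like per-head attention inputs.
        q = torch.randn(batch, seq_len, head_dim, device="cuda", dtype=torch.float16)
        k = torch.randn(batch, seq_len, head_dim, device="cuda", dtype=torch.float16)
        # Same call as in the traceback: (B, L, D) @ (B, D, L) -> (B, L, L).
        attn = torch.bmm(q, k.transpose(1, 2))
        # CUDA errors are asynchronous; synchronizing surfaces them inside this try block.
        torch.cuda.synchronize()
    except RuntimeError as err:
        return f"failed: {err}"
    if attn.shape != (batch, seq_len, seq_len):
        return f"failed: unexpected output shape {tuple(attn.shape)}"
    return "ok"


if __name__ == "__main__":
    print(fp16_bmm_smoke_test())
```

Running it in the same conda env as `p2.py` keeps the comparison meaningful, since a different env can load a different PyTorch build.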
Hi @sbassi, could you please provide your CUDA and PyTorch version numbers?
Here is what the AMI reports:
- PyTorch 1.11.0 (Ubuntu 20.04)
- CUDA version: 11.5
- NVIDIA driver version: 510.47.03

And this is what I am actually using:
CUDA:
```
(esm) ubuntu@ip-10-xx:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
```
PyTorch:
```
(esm) ubuntu@ip-10-0-0-77:~$ python
Python 3.9.16 | packaged by conda-forge | (main, Feb 1 2023, 21:39:03)
[GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> print(torch.__version__)
1.13.1+cu117
```
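The `+cu117` suffix in the PyTorch version means that build targets CUDA 11.7, while `nvcc` reports the locally installed 11.5 toolkit. Prebuilt PyTorch binaries ship their own CUDA runtime libraries, so the toolkit `nvcc` belongs to is not what PyTorch calls into; the component that has to support the build's CUDA version is the NVIDIA driver. Inside Python, `torch.version.cuda` reports the same build version directly. A minimal sketch that pulls both numbers out of the outputs pasted above so they can be compared side by side; the helper names `torch_build_cuda` and `nvcc_release` are hypothetical, and a mismatch between the two is not by itself proof of a conflict.

```python
import re
from typing import Optional


def torch_build_cuda(torch_version: str) -> Optional[str]:
    """Extract the CUDA version from a PyTorch version tag, e.g. '1.13.1+cu117' -> '11.7'.

    Returns None for tags without a '+cuXYZ' suffix (e.g. CPU-only builds).
    """
    match = re.search(r"\+cu(\d+)(\d)$", torch_version)
    if match is None:
        return None
    major, minor = match.groups()
    return f"{major}.{minor}"


def nvcc_release(nvcc_output: str) -> Optional[str]:
    """Extract the toolkit release from `nvcc --version` output, e.g. '... release 11.5, ...' -> '11.5'."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Values copied from the outputs in this thread.
    built_for = torch_build_cuda("1.13.1+cu117")
    toolkit = nvcc_release("Cuda compilation tools, release 11.5, V11.5.119")
    print(f"PyTorch built for CUDA {built_for}, local nvcc toolkit {toolkit}")
```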
From this listing, which one do you recommend? https://docs.aws.amazon.com/dlami/latest/devguide/appendix-ami-release-notes.html
MOTD:
Looks like it was my error: I made my own conda env instead of using the pre-built environment that is activated with `source activate pytorch`. Now I am using a new AMI (Deep Learning AMI GPU PyTorch 1.12.1 (Ubuntu 20.04) 20220926) with that pre-built conda env, and it worked, so I will close this issue. Thank you very much for your help.
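Since the fix came down to which conda env the script ran in, it can help to print that from inside the script itself rather than relying on the shell prompt. A minimal sketch using only the standard library; the function name `active_python_env` is hypothetical, and it assumes `conda activate` / `source activate` set the `CONDA_DEFAULT_ENV` environment variable, which conda typically does.

```python
import os
import sys


def active_python_env() -> dict:
    """Report which interpreter and conda env the current script is running in."""
    return {
        # Full path of the running interpreter, e.g. somewhere under .../envs/<name>/bin/.
        "executable": sys.executable,
        # Root of the environment the interpreter belongs to.
        "prefix": sys.prefix,
        # Usually set by conda activation; None when no conda env is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in active_python_env().items():
        print(f"{key}: {value}")
```

Placed at the top of `p2.py`, this would show whether it runs under the hand-made `esm` env or the AMI's `pytorch` env.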
Bug description
I tried the esmfold_inference.py script with the included data and got a CUDA error.
Reproduction steps
After installing the program following the instructions in the repo, I ran this:
Expected behavior
No error.
Logs
Additional context
Ran on a g4dn.2xlarge EC2 instance with 32 GB RAM; everything installed without any errors. Also: