A sequence of length 4700 on a 16 GB GPU simply won't fit, I'm afraid. We're looking into more memory-efficient versions of ESMFold, but the timeline for release is unclear.
Where are the main memory bottlenecks coming from? Is it just the O(N^2) self-attention in the transformers? I am trying to run some of the smaller ESM-2 models on larger proteins, and I am running out of memory on a 32 GB GPU.
For the LM, yes, it's the O(N^2) self-attention. For ESMFold there's also the O(N^3) cost in the axial attention, but that computation can be chopped into independent chunks to circumvent it; see model.set_chunk_size(128) and the instructions in the front-page README.
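For anyone landing here, a minimal sketch of that chunked setup, following the usage shown in the front-page README of facebookresearch/esm; the sequence is a placeholder and the chunk size should be tuned to your GPU:

```python
import torch
import esm

# Load ESMFold and move it to the GPU for inference.
model = esm.pretrained.esmfold_v1()
model = model.eval().cuda()

# Chunk the axial attention so its O(N^3) memory cost is paid in
# independent pieces; smaller chunks save memory at the cost of speed.
model.set_chunk_size(128)

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAP"  # placeholder

with torch.no_grad():
    pdb_str = model.infer_pdb(sequence)

with open("result.pdb", "w") as f:
    f.write(pdb_str)
```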
How should I deal with this? Can I use 2 GPUs?
Hi, I'm facing the same issue for a sequence of length 2180, using 2x NVIDIA L4 GPUs (48 GB). I have already tried multiple chunk sizes but am still getting out-of-memory errors. Is there any other way, or will this GPU size simply not suffice for this sequence length?
Reference code:

```python
import torch
from transformers import AutoTokenizer, EsmForProteinFolding

tokenizer = AutoTokenizer.from_pretrained("facebook/esmfold_v1")
model = EsmForProteinFolding.from_pretrained("facebook/esmfold_v1")
model = model.cuda()

# Run the ESM-2 language-model stem in fp16 and allow TF32 matmuls.
model.esm = model.esm.half()
torch.backends.cuda.matmul.allow_tf32 = True

# Chunk the axial attention in the folding trunk to reduce peak memory.
model.trunk.set_chunk_size(8)

tokenized_input = tokenized_input.cuda()

model.eval()
with torch.no_grad():
    output = model(tokenized_input)
```

Can you please help with this?
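One note on the snippet above: tokenized_input is used before it is defined. Per the Hugging Face ESMFold docs, it would be built roughly like this (the sequence here is a placeholder):

```python
# Placeholder sequence; substitute the real 2180-residue protein.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAP"

# ESMFold takes raw token ids without BOS/EOS special tokens.
tokenized_input = tokenizer(
    [sequence], return_tensors="pt", add_special_tokens=False
)["input_ids"]
```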
When I try to run the ESM model with large sequences (> 4700), I get this error:
```
RuntimeError: CUDA out of memory. Tried to allocate 746.00 MiB (GPU 0; 15.78 GiB total capacity; 12.68 GiB already allocated; 718.75 MiB free; 13.63 GiB reserved in total by PyTorch)
```
I have tried setting the chunk size all the way down to 1 with no improvement. I'm wondering if there are any other ways to reduce memory usage for large sequences.
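For completeness, a hedged sketch of the chunk-size sweep described above, assuming the fairseq esm API; the sequence is a placeholder, and catching RuntimeError (rather than torch.cuda.OutOfMemoryError) keeps it compatible with older PyTorch versions:

```python
import torch
import esm

# Assumed setup: the fairseq esm package with ESMFold weights available.
model = esm.pretrained.esmfold_v1().eval().cuda()
sequence = "M" * 4700  # placeholder; substitute the real sequence

# Try progressively smaller axial-attention chunks until one fits in memory.
for chunk_size in (128, 64, 32, 16, 8, 4, 2, 1):
    model.set_chunk_size(chunk_size)
    try:
        with torch.no_grad():
            pdb_str = model.infer_pdb(sequence)
        print(f"chunk_size={chunk_size} fit")
        break
    except RuntimeError as err:
        if "out of memory" not in str(err):
            raise
        # Release the partially allocated tensors before retrying.
        torch.cuda.empty_cache()
        print(f"chunk_size={chunk_size} OOM")
else:
    print("even chunk_size=1 does not fit on this GPU")
```

If even chunk_size=1 still runs out of memory, the peak is likely dominated by allocations that chunking cannot shrink, such as the O(N^2) pairwise representation, so a larger GPU or the more memory-efficient ESMFold variants mentioned above would be the remaining options.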