facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Out of memory - upper limit on sequence length? #457

Open jennastanislaw opened 1 year ago

jennastanislaw commented 1 year ago

Hello,

I have been trying to predict the structure of a 12-chain complex (total length of all chains together is 2256), but I am running out of memory: `RuntimeError: CUDA out of memory. Tried to allocate 12.22 GiB (GPU 0; 47.46 GiB total capacity; 37.08 GiB already allocated; 9.40 GiB free; 37.32 GiB reserved in total by PyTorch)`. I am using a 48 GB GPU, on which I was able to successfully run a smaller complex, also with 12 chains (total length ~1800). I tried decreasing the chunk size and adjusting max_split_size_mb as suggested, but I continue to run out of memory.
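For reference, max_split_size_mb is adjusted through PyTorch's `PYTORCH_CUDA_ALLOC_CONF` environment variable before launching the job. The value of 128 and the esm-fold invocation below are just placeholders, not the exact settings I used:

```bash
# Example only: cap the allocator's split size (value in MiB) before running the prediction.
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
esm-fold -i complex.fasta -o pdbs --chunk-size 16
```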

Are there other options which I can vary or things that I can try, or is the sequence simply too long? If the latter is the case, is there an estimation of the upper limit of sequence length that I would be able to run on a 48GB GPU?

I recognize this issue is quite similar to Issue #407, but my sequence is much smaller and my GPU memory is much larger, so I wanted to open a separate issue as there are some differences. Any tips are appreciated. Thank you very much!

nikita-smetanin commented 1 year ago

Hi @jennastanislaw, please try specifying `use_lma=True` to enable the low-memory attention implementation here – https://github.com/facebookresearch/esm/blob/main/esm/esmfold/v1/tri_self_attn_block.py#L151 (for both the start and end nodes). This should reduce the peak memory footprint. However, it's hard to give a maximum sequence length for your case, since many different ops are in play, in addition to other factors like memory fragmentation.
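For context, a minimal ESMFold inference script using the Python API from the repo README looks roughly like the sketch below. `use_lma` is not exposed as a flag there, so it has to be switched on by editing the file linked above, while the chunk size can be set directly:

```python
import torch
import esm

# Load ESMFold v1 (downloads weights on first use) and move it to the GPU.
model = esm.pretrained.esmfold_v1()
model = model.eval().cuda()

# Smaller chunk sizes lower peak memory in the trunk's axial attention,
# at the cost of speed. use_lma itself has to be enabled by editing
# esm/esmfold/v1/tri_self_attn_block.py as noted above.
model.set_chunk_size(16)

# Placeholder sequence -- replace with your own chains.
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQTLGQHDFSAGEGLYTHMKALRPDEDRLSPLHSVYVDQWDWE"

with torch.no_grad():
    pdb_str = model.infer_pdb(sequence)

with open("prediction.pdb", "w") as f:
    f.write(pdb_str)
```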

honestAnt commented 1 year ago

You can use the parameter `--chunk-size 16` to work around the GPU memory problem. I did a quick test of the `--chunk-size` parameter on our server (sample sequence length 870): with values of 16 and 128, GPU memory usage was about 17 GB; with a value of 512, it was about 27 GB.
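For example (the input and output paths are placeholders):

```bash
# Smaller chunk sizes reduce peak GPU memory but make inference slower.
esm-fold -i input.fasta -o pdbs --chunk-size 16
```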

prubach commented 1 year ago

Hi, I'm running into CUDA out of memory errors on much smaller sequences (length 651). I have 2 GPUs with 12 GB each; is there any way to force ESM to use both of them?

I tried lowering `--chunk-size` all the way to 1 and `--max-tokens-per-batch` to 1: `esm-fold --max-tokens-per-batch 1 --chunk-size 1 -i esm_af_00000.fasta -o pdbs`

honestAnt commented 1 year ago

At present it seems that only a single card can be used for prediction, and multiple cards don't help much. Besides `--chunk-size`, there is also `--cpu-offload` (offloads part of the model data to the CPU) and `--num-recycles` (default 4; setting it to 0 reduces memory and time, but the score will be lower, so use with caution). Also note that `--chunk-size` saves little memory for sequences shorter than ~800; the effect is clearer above ~1000 residues, and using too many chunks makes the run take much longer.
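Putting those together, a low-memory invocation might look something like this (paths are placeholders, and `--num-recycles 0` trades accuracy for memory and speed as noted above):

```bash
# Aggressive memory savings: small attention chunks, CPU offload, and no recycling.
esm-fold -i input.fasta -o pdbs --chunk-size 16 --cpu-offload --num-recycles 0
```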