Open RichenLee opened 8 months ago
I ran into a similar problem and am also waiting for an answer.
I am also getting CUDA memory issues, sometimes for quite short sequences (a few hundred amino acids).
@RichenLee For "what approximate memory capacity would be necessary to successfully predict the structure of a protein with 1200 amino acids", I think this benchmark (NVIDIA V100 vs A800 vs H800) will help you a lot.
Thank you for your suggestion and the information! Based on the benchmarks you provided, it seems that my RTX 4090 may not be sufficient on its own for predicting the structure of a protein with 1200 amino acids. I look forward to future technologies that allow memory to be shared across multiple GPUs for such large-scale computational tasks. Thanks again for your help!
I'm running into the same problem when modeling a homodimer, with each subunit having a small substrate and an NAD+, on a 40 GB A100. I still haven't resolved it, but this tutorial, https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html, seems to cover what you, @RichenLee, asked about running a task on multiple GPUs.
Issue Description:
I encountered a CUDA out of memory error when trying to predict the structure of a protein containing 1200 amino acids. I did not encounter this issue when running the provided example, which leads me to believe it is not related to my environment configuration. The error message is as follows:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1.23 GiB (GPU 0; 23.65 GiB total capacity; 22.00 GiB already allocated; 241.06 MiB free; 22.95 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
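The numbers in this message can be read as a quick sanity check on whether fragmentation is the real problem. The sketch below is a rough reading, not an exact model of the allocator: it assumes that "reserved" minus "allocated" is memory PyTorch has cached but is not using for live tensors, and that this cached memory plus the "free" device memory is the most a single new allocation could ever obtain. The helper name `parse_oom` is made up for this illustration.

```python
import re

# Error text copied from the report above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 1.23 GiB (GPU 0; 23.65 GiB total "
    "capacity; 22.00 GiB already allocated; 241.06 MiB free; 22.95 GiB "
    "reserved in total by PyTorch)"
)

UNIT_TO_GIB = {"MiB": 1 / 1024, "GiB": 1.0}


def parse_oom(message):
    """Extract the memory figures from an OOM message, all converted to GiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    stats = {}
    for key, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{key}' in the message")
        stats[key] = float(match.group(1)) * UNIT_TO_GIB[match.group(2)]
    return stats


stats = parse_oom(MESSAGE)

# Memory held by PyTorch's cache but not occupied by live tensors (assumption:
# this is the only part that fragmentation settings could make reusable).
cached_unused = stats["reserved"] - stats["allocated"]

# Optimistic upper bound on what one allocation could get.
best_case = cached_unused + stats["free"]
shortfall = stats["requested"] - best_case

print(f"cached but unused: {cached_unused:.2f} GiB")
print(f"best case available: {best_case:.2f} GiB")
print(f"requested: {stats['requested']:.2f} GiB")
print(f"shortfall: {shortfall * 1024:.0f} MiB")
```

Under these assumptions, the request is larger than everything that is still obtainable, even in the best case. That would suggest the model simply needs more memory than the card has for this input, rather than the memory being badly fragmented.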
Following the suggestion in the error message, I've tried setting max_split_size_mb through the PYTORCH_CUDA_ALLOC_CONF environment variable, with values ranging from 32 to 1024, but this hasn't resolved the issue. I also tried setting the loader_params.MAXCYCLE parameter to 1, but this did not solve the problem either.
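For anyone repeating this experiment, one detail worth checking is that the variable is set before PyTorch initializes CUDA; the safest way is to set it before `import torch` (or export it in the shell before launching). A minimal sketch follows. The helper `alloc_conf` and the value 128 are only illustrative and are not a recommended setting.

```python
import os


def alloc_conf(max_split_size_mb):
    """Build the value string for PYTORCH_CUDA_ALLOC_CONF (hypothetical helper)."""
    if max_split_size_mb <= 0:
        raise ValueError("max_split_size_mb must be positive")
    return f"max_split_size_mb:{max_split_size_mb}"


# Set the variable first, so the allocator sees it when CUDA is initialized.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = alloc_conf(128)

# Only import torch after the variable is in place, for example:
# import torch

print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This setting only affects how cached blocks are split and reused. It cannot make a model fit if the total memory it needs exceeds what the GPU provides.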
System Information:
Operating System: Ubuntu 20.04
GPU Model: NVIDIA GeForce RTX 4090 24GB
PyTorch Version: 2.0.1
CUDA Version: 11.8
Considering that the provided example runs smoothly without encountering memory issues, I'm curious to know what approximate memory capacity would be necessary to successfully predict the structure of a protein with 1200 amino acids.