facebookresearch / esm

Evolutionary Scale Modeling (esm): Pretrained language models for proteins
MIT License

Potential source of GPU memory leak in `ESMFold` #543

Closed amorehead closed 1 year ago

amorehead commented 1 year ago

Hello.

I have recently been testing ESMFold's `Attention` module for separate use cases, and I believe I have discovered a potential source of GPU memory leaks. While monitoring the ratio of currently allocated GPU memory to the maximum historically allocated, via `print(f"GPU memory ratio: {torch.cuda.memory_allocated() / torch.cuda.max_memory_allocated()}")`, I noticed that unless I change `q = self.rescale_factor * q` to the in-place `q *= self.rescale_factor`, I hit an out-of-memory error in PyTorch during the backward pass after approximately 500 training steps (in my particular use case). Does anyone have insight into why this might occur in specific use cases, or could this phenomenon affect ESMFold more generally?

https://github.com/facebookresearch/esm/blob/c9c7d4f0fec964ce10c3e11dccec6c16edaa5144/esm/esmfold/v1/misc.py#L188
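For reference, the allocation difference between the two forms is easy to demonstrate. The sketch below uses NumPy (rather than CUDA tensors) purely to illustrate the semantics: the out-of-place form `rescale_factor * q` allocates a fresh buffer on every call, while the in-place `q *= rescale_factor` reuses the existing one. Note that in PyTorch the trade-off is subtler, since in-place ops can overwrite values that autograd needs for the backward pass; the function names here are hypothetical, not part of ESMFold.

```python
import numpy as np

def scale_out_of_place(q, rescale_factor):
    # Allocates a brand-new buffer each call; the old one is only
    # reclaimed once nothing references it (e.g. autograd's saved tensors).
    return rescale_factor * q

def scale_in_place(q, rescale_factor):
    # Mutates q's existing buffer; no new allocation is made.
    q *= rescale_factor
    return q

q1 = np.ones((4, 8), dtype=np.float32)
out1 = scale_out_of_place(q1, 2.0)
print(out1 is q1)  # False: a new array was allocated

q2 = np.ones((4, 8), dtype=np.float32)
out2 = scale_in_place(q2, 2.0)
print(out2 is q2)  # True: the same buffer was reused
```

In a training loop, repeated out-of-place scaling is normally harmless because the old buffers are freed promptly; memory only grows if something (such as the autograd graph) keeps references to them alive across steps.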

amorehead commented 1 year ago

False alarm. I discovered that my out-of-memory issue was most likely caused by an external factor, so I believe this issue is no longer relevant to the ESM repository.