agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.

CUDA out of memory #77

Closed. CNwangbin closed this issue 2 years ago.

CNwangbin commented 2 years ago

[screenshot attached]

The process always runs out of memory, even if I call gc.collect() inside the for loop.

zhangzhenhuajack commented 2 years ago

Try reducing the batch size.

CNwangbin commented 2 years ago

> Try reducing the batch size.

The batch size is already 1, and CUDA memory still slowly increases.
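For context: a common reason GPU memory keeps growing even at batch size 1 is that the per-iteration outputs stay on the GPU and remain attached to the autograd graph, which gc.collect() cannot release. Below is a toy, ProtT5-agnostic sketch of the usual remedy (torch.no_grad() plus moving results to the CPU); the tiny nn.Linear model and the loop are purely illustrative, not taken from this thread.

```python
import torch
import torch.nn as nn

# Toy sketch (not ProtT5-specific): keeping outputs on the GPU, attached to the
# autograd graph, makes memory grow across iterations even at batch size 1, and
# gc.collect() cannot free it. The remedy shown here: disable autograd and move
# each result to the CPU before keeping it.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)

results = []
with torch.no_grad():                 # no autograd graph is built during inference
    for _ in range(1000):
        x = torch.randn(1, 1024, device=device)
        y = model(x)
        results.append(y.cpu())       # store on the CPU so vRAM usage stays flat
```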

mheinzinger commented 2 years ago

If you are already using batch_size=1, then you can try the following:

  • Depending on what you actually want to do, you may not need to run the whole ProtT5 model. ProtT5 consists of an Encoder and a Decoder. However, we've realized (and this aligns with results from NLP, if I'm not mistaken) that if you only want to generate embeddings for downstream prediction tasks, running only the Encoder is sufficient; the Decoder is usually only needed for generation tasks. We use only the Encoder part of ProtT5 throughout all our experiments. This is conveniently handled by using the corresponding Hugging Face interface (T5EncoderModel), which strips off the Decoder for you. See an example here: https://github.com/agemagician/ProtTrans/blob/master/Embedding/PyTorch/Advanced/ProtT5-XL-UniRef50.ipynb
  • Use the model in half-precision: simply cast the model to fp16 via model=model.half() (a minimal loading sketch follows after this comment).

These two changes allowed us to embed proteins up to a few thousand residues using the encoder of ProtT5 in half-precision on a GPU with 8 GB of vRAM.

If you have a GPU with less than 8 GB of vRAM, you could also check our new Colab notebook, which allows you to generate embeddings on Google Colab: https://colab.research.google.com/drive/1TUj-ayG3WO52n5N50S7KH9vtt6zRkdmj?usp=sharing

By the way: we usually embed even proteins longer than 1024 residues (so you could remove the slicing off of residues after position 1024). ProtT5 has a learnt positional encoding, so you should get meaningful embeddings even for proteins longer than 1024 (though this is something you might want to confirm for your problem).
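To make the two bullet points concrete, here is a minimal sketch of loading only the ProtT5 encoder in half precision and embedding a single sequence. It follows the pattern of the linked ProtT5-XL-UniRef50 notebook; the checkpoint name Rostlab/prot_t5_xl_uniref50 is the one used in the ProtTrans examples, and the toy sequence is illustrative.

```python
import torch
from transformers import T5EncoderModel, T5Tokenizer

# Encoder-only ProtT5 in half precision (sketch following the linked notebook).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50", do_lower_case=False)
model = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_uniref50")  # decoder is stripped off
model = model.to(device).eval()
if device.type == "cuda":
    model = model.half()              # fp16 roughly halves the memory footprint

sequence = "M K T A Y I A K Q R"      # amino acids separated by spaces (illustrative)
inputs = tokenizer(sequence, return_tensors="pt").to(device)

with torch.no_grad():
    out = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

# Per-residue embeddings; the final position is the </s> token added by the tokenizer.
residue_embeddings = out.last_hidden_state[0, :-1]
print(residue_embeddings.shape)       # torch.Size([10, 1024])
```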

CNwangbin commented 2 years ago


Thanks, it actually helped me a lot.