agemagician / ProtTrans

ProtTrans provides state-of-the-art pretrained language models for proteins. ProtTrans was trained on thousands of GPUs from Summit and hundreds of Google TPUs using Transformer models.

CUDA out of memory #77

Closed. CNwangbin closed this issue 2 years ago.

CNwangbin commented 2 years ago

[screenshot attached]

The process always runs out of memory, even if I call gc.collect() inside the for loop.

zhangzhenhuajack commented 2 years ago

Try reducing the batch size.

CNwangbin commented 2 years ago

> Try reducing the batch size.

The batch size is already 1, and CUDA memory still slowly increases.
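For context: a common reason GPU memory keeps growing even at batch size 1 is that the per-iteration outputs stay on the GPU and remain attached to the autograd graph, which gc.collect() cannot release. Below is a toy, ProtT5-agnostic sketch of the usual remedy (torch.no_grad() plus moving results to the CPU); the tiny nn.Linear model and the loop are purely illustrative, not taken from this thread.

```python
import torch
import torch.nn as nn

# Toy sketch (not ProtT5-specific): keeping outputs on the GPU, attached to the
# autograd graph, makes memory grow across iterations even at batch size 1, and
# gc.collect() cannot free it. The remedy shown here: disable autograd and move
# each result to the CPU before keeping it.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(1024, 1024).to(device)

results = []
with torch.no_grad():                 # no autograd graph is built during inference
    for _ in range(1000):
        x = torch.randn(1, 1024, device=device)
        y = model(x)
        results.append(y.cpu())       # store on the CPU so vRAM usage stays flat
```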

mheinzinger commented 2 years ago

If you are already using batch_size=1, then you can try the following:

  • Depending on what you actually want to do, you may not need to run the whole ProtT5 model. ProtT5 consists of an Encoder and a Decoder. However, we've realized (and this aligns with results from NLP, if I'm not mistaken) that if you only want to generate embeddings for downstream prediction tasks, running only the Encoder is sufficient; the Decoder is usually only needed for generation tasks. We use only the Encoder part of ProtT5 throughout all our experiments. This is conveniently handled by using the corresponding Hugging Face interface (T5EncoderModel), which strips off the Decoder for you. See an example here: https://github.com/agemagician/ProtTrans/blob/master/Embedding/PyTorch/Advanced/ProtT5-XL-UniRef50.ipynb
  • Use the model in half-precision: simply cast the model to fp16 via model=model.half() (a minimal loading sketch follows after this comment).

These two changes allowed us to embed proteins up to a few thousand residues using the encoder of ProtT5 in half-precision on a GPU with 8 GB of vRAM.

If you have a GPU with less than 8 GB of vRAM, you could also check our new Colab notebook, which allows you to generate embeddings on Google Colab: https://colab.research.google.com/drive/1TUj-ayG3WO52n5N50S7KH9vtt6zRkdmj?usp=sharing

By the way: we usually embed even proteins longer than 1024 residues (so you could remove the slicing off of residues after position 1024). ProtT5 has a learnt positional encoding, so you should get meaningful embeddings even for proteins longer than 1024 (though this is something you might want to confirm for your problem).
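To make the two bullet points concrete, here is a minimal sketch of loading only the ProtT5 encoder in half precision and embedding a single sequence. It follows the pattern of the linked ProtT5-XL-UniRef50 notebook; the checkpoint name Rostlab/prot_t5_xl_uniref50 is the one used in the ProtTrans examples, and the toy sequence is illustrative.

```python
import torch
from transformers import T5EncoderModel, T5Tokenizer

# Encoder-only ProtT5 in half precision (sketch following the linked notebook).
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = T5Tokenizer.from_pretrained("Rostlab/prot_t5_xl_uniref50", do_lower_case=False)
model = T5EncoderModel.from_pretrained("Rostlab/prot_t5_xl_uniref50")  # decoder is stripped off
model = model.to(device).eval()
if device.type == "cuda":
    model = model.half()              # fp16 roughly halves the memory footprint

sequence = "M K T A Y I A K Q R"      # amino acids separated by spaces (illustrative)
inputs = tokenizer(sequence, return_tensors="pt").to(device)

with torch.no_grad():
    out = model(input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"])

# Per-residue embeddings; the final position is the </s> token added by the tokenizer.
residue_embeddings = out.last_hidden_state[0, :-1]
print(residue_embeddings.shape)       # torch.Size([10, 1024])
```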

CNwangbin commented 2 years ago


Thanks, it actually helped me a lot.