Update:
eval_accumulation_steps does not work, since it accumulates all tensors in CPU RAM instead.
What works so far is not returning hidden_states and attentions.
However, I do not understand why this is not an issue for the training loop.
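In case it helps, a minimal sketch of how these extra outputs can be disabled via the model config (the checkpoint name is only a placeholder, not our actual setup):

```python
# Minimal sketch, assuming a standard transformers setup;
# "bert-base-uncased" is only a placeholder checkpoint.
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained(
    "bert-base-uncased",
    output_hidden_states=False,  # do not return all intermediate layer states
    output_attentions=False,     # do not return attention matrices
)
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", config=config
)
```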
I additionally added a callback after each epoch that calls torch.cuda.empty_cache(), which seems to free the memory after the training loop.
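For reference, a minimal sketch of such a callback (class name and usage are illustrative, not the exact code from my run):

```python
import torch
from transformers import TrainerCallback


class EmptyCacheCallback(TrainerCallback):
    """Illustrative callback: release cached CUDA memory at the end of each epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        torch.cuda.empty_cache()  # frees blocks held by PyTorch's caching allocator
        return control


# Passed to the Trainer via: Trainer(..., callbacks=[EmptyCacheCallback()])
```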
Hi all,
I am currently experimenting with your provided code. Your plot indicating memory usage for the different batch sizes & max_length seems to fit our training setup perfectly. However, when monitoring the memory usage, two things are noticeable:
I could not find a solution for 1.
For 2., setting eval_accumulation_steps seems to work, as it transfers the model outputs to the CPU.
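For reference, this is roughly how I set it (values are placeholders; as noted in the update above, this only moves the accumulation into CPU RAM):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",              # placeholder
    per_device_eval_batch_size=8,  # placeholder
    eval_accumulation_steps=10,    # move accumulated prediction tensors to the CPU every 10 steps
)
```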
Do you have an idea?
Keep up the great work.
Best wishes, Frederik