RSchmirler / data-repo_plm-finetune-eval

Data repository for "Fine-tuning protein language models boosts predictions across diverse tasks"
https://www.nature.com/articles/s41467-024-51844-2
Creative Commons Attribution 4.0 International

Low GPU utilization when running finetune_per_protein #2

Open · LAJ-THU opened this issue 1 month ago

LAJ-THU commented 1 month ago

Hi, I appreciate the effort you put into providing these useful notebooks, but I have run into a problem.

When I run the finetune_per_protein.ipynb notebook, GPU utilization is very low (around 10-20%), whereas with per-residue fine-tuning it stays above 90%. (My GPU is an RTX 3090.)

I tried increasing the batch_size, reducing gradient accumulation, and disabling DeepSpeed, but saw little increase in GPU utilization. Is there a solution to this problem? The sketch below shows roughly what I changed.
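A minimal sketch of the settings I varied, assuming the notebook drives training through the Hugging Face `Trainer` / `TrainingArguments` API (the concrete values and paths here are only illustrative, not the notebook's defaults):

```python
# Illustrative sketch of the knobs I varied; values and the output
# path are hypothetical, the notebook's defaults may differ.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./per_protein_run",      # hypothetical output path
    per_device_train_batch_size=16,      # increased batch size
    gradient_accumulation_steps=1,       # reduced gradient accumulation
    fp16=True,                           # mixed precision on the RTX 3090
    # deepspeed="ds_config.json",        # left commented out = DeepSpeed disabled
)
```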

thx!

RSchmirler commented 1 month ago

Hi @LAJ-THU, sorry for my late response.

I did not encounter this myself, but I have to admit I was not monitoring utilization closely.

> I tried increasing the batch_size, reducing gradient accumulation, and disabling DeepSpeed,

This is exactly what should increase it.

Which model are you using, and does it also occur for large models? How long are the sequences? How are you measuring the utilization? I just checked a per-protein training run (ESM2 650M) and nvidia-smi looks fine:

[screenshot: nvidia-smi output showing high GPU utilization]
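In case it helps to compare numbers: a minimal sketch of sampling utilization programmatically instead of watching nvidia-smi by eye, assuming the `pynvml` bindings (the nvidia-ml-py package, not part of the notebook) are installed:

```python
# Hypothetical helper: poll GPU utilization and memory once per second
# while the training run executes in another process.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # first GPU (here: the RTX 3090)

try:
    for _ in range(60):  # sample for one minute
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {util.gpu:3d}%  mem {mem.used / 2**30:.1f} GiB")
        time.sleep(1)
finally:
    pynvml.nvmlShutdown()
```

As a general rule of thumb, if utilization stays low while GPU memory is well used, the run is more likely input-bound (data loading/tokenization) than compute-bound.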