Unbabel / COMET

A Neural Framework for MT Evaluation
https://unbabel.github.io/COMET/html/index.html
Apache License 2.0

[QUESTION] Why does training speed go down? #158

Open zouharvi opened 11 months ago

zouharvi commented 11 months ago

I noticed that comet-train (after encoder fine-tuning kicks in) slows down over the course of an epoch: from ~12 it/s at e.g. 30% of the epoch to ~7 it/s at 60% and ~6 it/s at 90%.

  1. Is this particular to my setup, or has anyone else observed it as well?
  2. If so, is this expected behaviour?

I'm using NVIDIA A10G GPUs and the following software versions:

maxiek0071 commented 11 months ago

Hi zouharvi,

I noticed this behavior as well. I think it has something to do with "Encoder model fine-tuning": after that point, the speed gradually decreases for me from 13.98 it/s down to 5.85 it/s by the end of the epoch.

Could someone comment on whether this is expected behavior?

zouharvi commented 11 months ago

Indeed, without encoder fine-tuning (nr_frozen_epochs=1), this does not happen. Shot in the dark: I wonder if there is a memory leak associated with the unfreezing that leaves some gradient-tracking objects on the GPU?
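One generic way to probe this hypothesis is to snapshot live object counts between epochs and look for unbounded growth. A minimal, framework-agnostic sketch (the `FakeLossTensor` class and the `retained` list are illustrative stand-ins, not COMET internals):

```python
import gc
from collections import Counter

def count_live_objects():
    """Count live Python objects by type name, to compare across epochs."""
    gc.collect()
    return Counter(type(o).__name__ for o in gc.get_objects())

class FakeLossTensor:
    """Stand-in for a loss tensor that still carries its autograd graph."""
    pass

# Simulate a training loop that accidentally keeps references alive,
# e.g. by appending per-step losses (with their graphs) to a list.
retained = []

before = count_live_objects()
for step in range(1000):
    loss = FakeLossTensor()
    retained.append(loss)  # bug: holds every step's object in memory
after = count_live_objects()

growth = after["FakeLossTensor"] - before["FakeLossTensor"]
print(growth)  # grows with the number of retained steps => leak
```

With PyTorch installed, the same pattern can instead filter `gc.get_objects()` for objects where `torch.is_tensor(o)` and `o.requires_grad` is true, which would point more directly at leaked grad-able tensors.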

ricardorei commented 11 months ago

hmmm and what happens on the second epoch? I actually never noticed this...

zouharvi commented 11 months ago

In the second and the next epochs it converges to ~5it/s for me (A10G with batch size 6).

maxiek0071 commented 11 months ago

Hi, I trained two reference-free QE models on in-domain data with 300k segments: one with nr_frozen_epochs=0.3 (as proposed in the configs in this repo) and one with nr_frozen_epochs=1, with all other parameters identical. The True Positive Rate of the predictions is about 10% lower with nr_frozen_epochs=1. So the model where the encoder fine-tuning takes place later leads to worse performance. Training was indeed faster up to the end of the first epoch, after which the "Encoder model fine-tuning" took place (as intended).
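For anyone comparing the two runs, the difference is a single hyperparameter in the training config. A hedged sketch of the relevant fragment (the surrounding key names may vary between COMET versions; check the configs shipped in this repo for the exact layout):

```yaml
# nr_frozen_epochs controls when the encoder is unfrozen:
#   0.3 -> unfreeze 30% of the way into the first epoch (repo default)
#   1   -> keep the encoder frozen for the entire first epoch
init_args:
  nr_frozen_epochs: 0.3
```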