zouharvi opened 11 months ago
Hi zouharvi,
I noticed this behavior as well. I think it has something to do with "Encoder model fine-tuning": after that point, the speed gradually decreases for me from 13.98it/s to 5.85it/s by the end of the epoch.
Could someone comment on whether this is expected behavior?
Indeed, without encoder fine-tuning (`nr_frozen_epochs=1`), this does not happen. Shot in the dark: I wonder if there is some memory leak associated with it that leaves some grad-tracking objects on the GPU?
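If that hypothesis is right, the classic pattern would be accumulating a loss tensor that still requires grad, which keeps every step's autograd graph alive on the GPU. A minimal PyTorch sketch of that suspected pattern (an illustration only, not COMET's actual code):

```python
import torch

# Tiny stand-in model; the leak pattern is independent of model size.
model = torch.nn.Linear(8, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

running_loss = 0.0
for step in range(5):
    x = torch.randn(4, 8)
    loss = model(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Leak pattern: `running_loss += loss` would chain every step's graph
    # together, so memory and step time grow over the epoch.
    # Detaching (or calling .item()) frees each step's graph instead.
    running_loss += loss.detach().item()
```

Checking whether the training loop (or a logging callback) keeps such undetached tensors around after unfreezing would be one way to confirm or rule this out.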
hmmm and what happens on the second epoch? I actually never noticed this...
In the second and subsequent epochs it converges to ~5it/s for me (A10G with batch size 6).
Hi, I trained two reference-free QE models on in-domain data with 300k segments: one with `nr_frozen_epochs=0.3` (as proposed in the config in this repo), and the other with `nr_frozen_epochs=1`. The rest of the parameters stayed the same.
The True Positive Rate of the predictions is about 10% lower with `nr_frozen_epochs=1`. So the model in which encoder fine-tuning starts later performs worse.
Training was indeed faster up to the end of the first epoch; after that, the "Encoder model fine-tuning" took place (as intended).
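For reference, my understanding of how a fractional `nr_frozen_epochs` behaves, sketched as a small helper (the function name is hypothetical, not COMET's API; it just illustrates the semantics assumed above):

```python
def encoder_unfrozen(epoch: int, batch_idx: int, batches_per_epoch: int,
                     nr_frozen_epochs: float) -> bool:
    """Return True once training progress (measured in epochs) passes
    the freeze window. Hypothetical illustration, not COMET's code."""
    progress = epoch + batch_idx / batches_per_epoch
    return progress >= nr_frozen_epochs

# With nr_frozen_epochs=0.3 the encoder unfreezes 30% into the first epoch:
assert not encoder_unfrozen(0, 29, 100, 0.3)
assert encoder_unfrozen(0, 30, 100, 0.3)
# With nr_frozen_epochs=1 it stays frozen for the whole first epoch:
assert not encoder_unfrozen(0, 99, 100, 1.0)
assert encoder_unfrozen(1, 0, 100, 1.0)
```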
I noticed that `comet-train` (after encoder fine-tuning) runs at ~12it/s at e.g. 30% of the epoch, which drops to ~7it/s at 60% and to ~6it/s at 90%.
I'm using NVIDIA A10G GPUs and the following software versions: