How does the quality of the model change after running weight averaging on a single checkpoint?

DigitalPhonetics / IMS-Toucan

Controllable and fast Text-to-Speech for over 7000 languages!

Apache License 2.0


How does the quality of the model change after running weight averaging on a single checkpoint? #142

Closed: Ca-ressemble-a-du-fake closed this issue 1 year ago

Ca-ressemble-a-du-fake commented 1 year ago

Hi,

To save disk space, I keep only the last checkpoint of a model ("Meta_A/checkpoint_XYZ.pt") after running the weight averaging. Later, when I train a new model and run weight averaging again, the output shows that it also runs on the first model "A", which now has only a single checkpoint.

Will this change the best.pt of the first model "A"? Will it matter if I use that best.pt as a pretrained checkpoint to finetune a new model?

Thanks in advance for your explanation!

Flux9665 commented 1 year ago

The weight averaging takes the average of the weights of the last n checkpoints. If you delete all checkpoints but the last one and re-run the averaging, the resulting model will be the average of just one model, i.e. just the model itself. The impact on the quality is probably negligible for models where one epoch is just a couple of steps. The impact will be greater for models where the checkpoints are many thousands of steps apart. So I believe for your use cases this does not matter.
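To illustrate the point about averaging a single checkpoint, here is a minimal sketch, not the toolkit's actual code. It assumes each checkpoint can be treated as a mapping from parameter names to flat lists of floats; the function name `average_checkpoints` and the example parameter names are hypothetical, and the real checkpoints hold PyTorch tensors rather than Python lists.

```python
def average_checkpoints(state_dicts):
    """Element-wise mean of parameter dicts that share the same keys and shapes.

    Hypothetical helper for illustration only; plain lists of floats stand in
    for the tensors stored in real checkpoints.
    """
    if not state_dicts:
        raise ValueError("need at least one checkpoint")
    n = len(state_dicts)
    averaged = {}
    for name in state_dicts[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(sd[name] for sd in state_dicts))
        averaged[name] = [sum(col) / n for col in columns]
    return averaged


# Two toy "checkpoints" with made-up parameter names and values.
ckpt_a = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
ckpt_b = {"layer.weight": [3.0, 6.0], "layer.bias": [1.5]}

# Averaging one checkpoint gives back the same weights.
single = average_checkpoints([ckpt_a])

# Averaging two checkpoints gives the element-wise mean.
both = average_checkpoints([ckpt_a, ckpt_b])
```

With only one checkpoint in the list, every value is divided by 1, so `single` holds the same weights as `ckpt_a`. Re-running the averaging on a directory with a single remaining checkpoint would, under these assumptions, reproduce that checkpoint rather than the earlier multi-checkpoint average.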

Ca-ressemble-a-du-fake commented 1 year ago

That's clear, thank you!