Closed Ca-ressemble-a-du-fake closed 1 year ago
The weight averaging takes the average of the weights of the last n checkpoints. If you delete all checkpoints but the last one and re-run the averaging, the resulting model will be the average of just one model, i.e. just that model itself. The impact on quality is probably negligible for models where one epoch is just a couple of steps. It will be greater for models where the checkpoints are many thousands of steps apart. So I believe this does not matter for your use cases.
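To illustrate the point about averaging a single checkpoint, here is a minimal sketch in plain Python. It is not the toolkit's actual code: `average_checkpoints` and the toy checkpoints (dicts mapping a parameter name to a list of floats) are hypothetical stand-ins for real state dicts.

```python
def average_checkpoints(checkpoints):
    """Average parameter values key by key across a list of checkpoints.

    Each checkpoint is a hypothetical dict: parameter name -> list of floats.
    All checkpoints are assumed to share the same keys and shapes.
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter from every checkpoint.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


# Toy stand-ins for two saved checkpoints of the same model.
ckpt_a = {"w": [1.0, 2.0], "b": [0.5]}
ckpt_b = {"w": [3.0, 4.0], "b": [1.5]}

two = average_checkpoints([ckpt_a, ckpt_b])  # both checkpoints still on disk
one = average_checkpoints([ckpt_b])          # only the last checkpoint kept

print(two)
print(one == ckpt_b)
```

With only one checkpoint left, the "average" is that checkpoint unchanged, so re-running the averaging amounts to copying the last checkpoint.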
That's clear, thank you!
Hi,
To save disk space, I keep only the last checkpoint of a model ("Meta_A/checkpoint_XYZ.pt") after running the weight averaging. Later, I train a new model and run the weight averaging again, and the output shows that it also runs on the first model "A", which now has a single checkpoint.
Will this change the best.pt of the first model "A"? Will it matter if I use it as a pretrained checkpoint to fine-tune a new model?
Thanks in advance for your explanation!