facebookresearch / audiocraft

Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.

Save and overwrite best checkpoint every epoch #267

Closed: cvillela closed this issue 11 months ago

cvillela commented 11 months ago

Hello,

Congratulations on the model! I have been experimenting with model training and finetuning ever since the code was released. One feature that could be immensely useful is saving/overwriting the best state of the current training run at every validation epoch, selected by a given metric (cross-entropy by default). From what I understand, the best state is identified and saved only at the end of training, which can be tricky for models that eventually overfit or whose training has to be stopped abruptly.

If there is already a config parameter for this, could anyone please point me to it?

Best, Caio

adefossez commented 11 months ago

This is already happening. All the evals run with the best checkpoint. By default, cross-entropy on the valid set is used to select the best checkpoint: https://github.com/facebookresearch/audiocraft/blob/main/audiocraft/solvers/musicgen.py#L46
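For readers unfamiliar with the pattern, here is a minimal generic sketch of best-checkpoint selection by validation cross-entropy, kept alongside the latest state in a single file. This is not the actual solver code; `model`, `train_one_epoch`, `compute_ce`, `valid_loader`, and `num_epochs` are hypothetical stand-ins.

```python
import copy
import torch

best_ce = float("inf")
best_state = None

for epoch in range(num_epochs):
    train_one_epoch(model)                      # hypothetical training step
    valid_ce = compute_ce(model, valid_loader)  # hypothetical valid-set cross-entropy

    # Snapshot the weights whenever the validation cross-entropy improves.
    if valid_ce < best_ce:
        best_ce = valid_ce
        best_state = copy.deepcopy(model.state_dict())

    # The latest state and the best-so-far state live in the same checkpoint file,
    # which mirrors the layout of checkpoint.th described in the next comment.
    torch.save({"model": model.state_dict(), "best_state": best_state}, "checkpoint.th")
```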

adefossez commented 11 months ago

The best state is stored in the latest checkpoint of rank 0 (the one without any integer suffix after the `.th`), under the key `best_state` or `fsdp_best_state` (if trained with FSDP). This is the model that is exported when following these instructions: https://github.com/facebookresearch/audiocraft/blob/main/docs/MUSICGEN.md#importing--exporting-models
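As a quick way to confirm this, here is a small sketch for inspecting the checkpoint with plain PyTorch (the path is hypothetical; adjust it to your experiment folder):

```python
import torch

# Hypothetical path to the rank-0 checkpoint (the file without an integer suffix).
ckpt_path = "outputs/my_experiment/checkpoint.th"

# Load on CPU so no GPU is needed just to inspect the keys.
ckpt = torch.load(ckpt_path, map_location="cpu")
print(list(ckpt.keys()))

# The best weights are stored under 'best_state', or 'fsdp_best_state' when
# the model was trained with FSDP.
best_state = ckpt.get("best_state") or ckpt.get("fsdp_best_state")
```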

cvillela commented 11 months ago

@adefossez Thank you for the response. I was under the impression that the rank-0 checkpoint only saved the latest state, and was not aware of the `best_state` key.

I am marking this issue as closed!