NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

only best or last model saving #418

Open · remotejob opened this issue 5 years ago

remotejob commented 5 years ago

Is it possible to save only the best or the last model during training?

borisgin commented 5 years ago

Do you mean to compute the validation loss for the last K epochs and save only the best of them? Or to save the checkpoints for all of the last K epochs?

remotejob commented 5 years ago

As I understand it,

"save_checkpoint_steps": 20001,

will save a checkpoint only every 20001 steps? (A config sketch follows at the end of this comment.)

But it would be nice to have the option to save only the best model and discard (delete) the previous ones.

I am trying to play with your project on https://www.kaggle.com/, but Kaggle has a disk space limit of 5.2GB.

Another toolkit, OpenNMT-py for example, has the option -keep_checkpoint 1, which keeps only the last (1) checkpoint.
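For context, a minimal sketch of where that parameter sits in an OpenSeq2Seq-style config (configs are Python modules defining a base_params dict); only "save_checkpoint_steps" is taken from this thread, the other keys and values are illustrative assumptions:

```python
# Hypothetical fragment of an OpenSeq2Seq config file; only
# "save_checkpoint_steps" comes from this thread, the rest is assumed.
base_params = {
    "logdir": "experiments/my_model",  # assumed output directory
    "save_checkpoint_steps": 20001,    # write a checkpoint every 20001 steps
}
```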

borisgin commented 5 years ago

To address the disk space problem you can use this parameter:

'num_checkpoints': int, # maximum number of last checkpoints to keep

We also support another useful flag, which restores the best saved checkpoint:

'restore_best_checkpoint': bool, # if True, restore the best checkpoint instead of the latest checkpoint
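Combining the two, a minimal sketch of the relevant config fragment, assuming the same base_params layout as above (the values are illustrative; only the key names come from this thread):

```python
# Sketch: keep disk usage low while still being able to recover the best model.
base_params = {
    "save_checkpoint_steps": 20001,   # checkpoint frequency (value illustrative)
    "num_checkpoints": 1,             # keep only the most recent checkpoint
    "restore_best_checkpoint": True,  # when restoring, load the best checkpoint
                                      # instead of the latest one
}
```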

vsl9 commented 5 years ago

Currently, OpenSeq2Seq stores the N last checkpoints (saved at the frequency set by "save_checkpoint_steps", including the checkpoint from the last step) and the N best checkpoints (in the best_models directory). N is defined in the config file. For example, if you set "num_checkpoints": 1, it will keep only two checkpoints: the latest one and the best one.
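If you want to locate the best checkpoint manually, here is a hedged sketch using TensorFlow's standard checkpoint utility; the best_models directory name comes from this thread, while the logdir path is an assumption:

```python
# Sketch: find the newest checkpoint inside best_models.
# tf.train.latest_checkpoint reads the "checkpoint" index file in a directory
# and returns the path prefix of the most recent checkpoint there, or None.
import tensorflow as tf

best_ckpt = tf.train.latest_checkpoint("experiments/my_model/best_models")
print(best_ckpt)  # e.g. ".../best_models/model.ckpt-40002" (illustrative)
```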