I want to skip some epochs

facebookresearch / fairseq-lua

Facebook AI Research Sequence-to-Sequence Toolkit

I want to skip some epochs #118

Closed: travel-go closed this issue 7 years ago

travel-go commented 7 years ago

Hello, I am a newbie. While I was training the model, I accidentally killed the process. How can I skip the epochs that have already been trained?

travel-go commented 7 years ago

Training got as far as model_epoch13.th7, and I want to continue training from there.

jgehring commented 7 years ago

I assume the aborted process created some checkpoints? If you re-start the training with the same -savedir (and command-line arguments), it will automatically resume from the last checkpoint.

travel-go commented 7 years ago

This is my training command:

CUDA_VISIBLE_DEVICES=0,3 fairseq train -sourcelang en -targetlang de -datadir data-bin/news_bpe_2014 -model fconv -nenclayer 15 -nlayer 15 -fconv_nhids 512,512,512,512,512,512,512,512,512,512,768,768,768,2048,2048 -fconv_nlmhids 512,512,512,512,512,512,512,512,512,512,768,768,768,2048,2048 -fconv_kwidths 3,3,3,3,3,3,3,3,3,3,3,3,3,1,1 -fconv_klmwidths 3,3,3,3,3,3,3,3,3,3,3,3,3,1,1 -dropout 0.2 -optim nag -lr 0.25 -clip 0.1 -momentum 0.99 -timeavg -bptt 0 -savedir trainings/final -validbleu -batchsize 48 -maxbatch 1200 &

Should I re-use this command as is, or do I need to add other parameters?

jgehring commented 7 years ago

It should be fine to re-start it exactly like this. Right after startup, the program should print something like "Found existing state, attempting to resume training".

travel-go commented 7 years ago

These are my saved training files:

model_best.th7 model_best_opt.th7 state_last.th7
model_epoch1.th7 model_epoch2.th7 model_epoch3.th7 model_epoch4.th7 model_epoch5.th7
model_epoch6.th7 model_epoch7.th7 model_epoch8.th7 model_epoch9.th7 model_epoch10.th7
model_epoch11.th7 model_epoch12.th7 model_epoch13.th7
state_epoch1.th7 state_epoch2.th7 state_epoch3.th7 state_epoch4.th7 state_epoch5.th7
state_epoch6.th7 state_epoch7.th7 state_epoch8.th7 state_epoch9.th7 state_epoch10.th7
state_epoch11.th7 state_epoch12.th7 state_epoch13.th7

Thank you very much.

travel-go commented 7 years ago

Wow, that's great! Thanks for your reply.

jgehring commented 7 years ago

FYI, the crucial one for resuming is state_last.th7.
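
Since resuming hinges on that one file, it may be worth checking for it before re-launching the training command. The following is only a minimal sketch and not part of fairseq: the function name can_resume is hypothetical, and the only facts it relies on are the file name state_last.th7 and the -savedir value (trainings/final) mentioned in this thread.

```shell
# Hypothetical helper, not part of fairseq: reports whether a save directory
# contains state_last.th7, the file named above as crucial for resuming.
can_resume() {
    savedir="$1"
    if [ -f "$savedir/state_last.th7" ]; then
        echo "resumable"
    else
        echo "no state_last.th7 found"
        return 1
    fi
}
```

Calling can_resume trainings/final before re-running the original command would print "resumable" if the state file is present. If it is missing, the sketch prints "no state_last.th7 found" and returns a non-zero status, so it can also be used as a guard in a launch script.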