facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Basic question: when does translation training end? #1348

echan00 closed this issue 4 years ago

echan00 commented 4 years ago

Sorry for the basic question, but when does translation training end?

These are my parameters:

CUDA_VISIBLE_DEVICES=0 python3 train.py \
   data-bin/en_zh \
   --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
   --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
   --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
   --dropout 0.3 --weight-decay 0.0001 \
   --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
   --max-tokens 4096 \
   --fp16
echan00 commented 4 years ago

Until the learning rate is too small :)
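With the inverse_sqrt scheduler the learning rate just keeps decaying, so "too small" has to be made concrete. One way, assuming your fairseq version has the '--min-lr' option (newer releases renamed it '--stop-min-lr'), is to set a floor below which training stops. A minimal sketch based on the command above; the 1e-09 threshold is illustrative:

# same command as in the question, plus a learning-rate floor:
# training stops once the schedule decays the lr below 1e-09
CUDA_VISIBLE_DEVICES=0 python3 train.py \
   data-bin/en_zh \
   --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
   --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
   --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
   --min-lr 1e-09 \
   --dropout 0.3 --weight-decay 0.0001 \
   --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
   --max-tokens 4096 \
   --fp16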

pamin2222 commented 4 years ago

I think you can look at the validation loss. Stop the training manually when the validation loss stops decreasing.

You can also use '--max-epoch' or '--max-update' to force training to stop at a specified epoch or update count (see the sketch below): https://fairseq.readthedocs.io/en/latest/command_line_tools.html#fairseq-train
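For example, a sketch of the command from the question with a hard update budget; the 300000 figure is arbitrary, and if '--max-epoch' is also given, training stops at whichever limit is hit first:

# stop unconditionally after 300k optimizer updates
CUDA_VISIBLE_DEVICES=0 python3 train.py data-bin/en_zh \
   --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
   --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
   --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
   --dropout 0.3 --weight-decay 0.0001 \
   --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
   --max-tokens 4096 --fp16 \
   --max-update 300000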

echan00 commented 4 years ago

Just to be sure, which one is the validation loss?

| epoch 031:  53%|▌| 28980/55116 [2:01:41<1:45:30,  4.13it/s, loss=3.662, nll_loss=1.945, ppl=3.85, wps=13520, ups=4, wpb=3406.657, bsz=107.062, num_updates=1.68238e+06, lr=2.43802e-05, gnorm=3.772, clip=0.000, oom=0.000, loss_scale=2.000, wall=441405, train_wall=416601]
| epoch 031:  53%|▌| 28981/55116 [2:01:42<1:44:39,  4.16it/s, loss=3.662, nll_loss=1.945, ppl=3.85, wps=13520, ups=4, wpb=3406.643, bsz=107.062, num_updates=1.68238e+06, lr=2.43802e-05, gnorm=3.772, clip=0.000, oom=0.000, loss_scale=2.000, wall=441405, train_wall=416601]
| epoch 031:  53%|▌| 29398/55116 [2:03:27<1:55:04,  3.73it/s, loss=3.663, nll_loss=1.946, ppl=3.85, wps=13520, ups=4, wpb=3406.636, bsz=107.059, num_updates=1.6828e+06, lr=2.43772e-05, gnorm=3.772, clip=0.000, oom=0.000, loss_scale=2.000, wall=441510, train_wall=416704]
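(The lines above are the training progress bar, so the 'loss' shown there is the training loss. fairseq prints validation metrics separately at the end of each epoch, on the 'valid' subset. Assuming the training output was redirected to a file, say train.log (the filename here is hypothetical), those lines can be pulled out with something like:

# filter the per-epoch validation lines out of the training log
grep "valid on 'valid' subset" train.log

The validation loss is the 'loss' field on those lines.)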