I think your learning rate is too high; try 5e-4:
CUDA_VISIBLE_DEVICES=0 fairseq-train \
    data-bin/iwslt14.tokenized.de-en \
    --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096
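The BLEU4 number mentioned below can be obtained with fairseq-generate, which prints it after decoding the test set; a minimal sketch (the checkpoint path is an assumption, it is just fairseq's default --save-dir location):

# Sketch: decode the test set and report BLEU4.
# checkpoints/checkpoint_best.pt is an assumption (fairseq's default save dir).
fairseq-generate data-bin/iwslt14.tokenized.de-en \
    --path checkpoints/checkpoint_best.pt \
    --batch-size 128 --beam 5 --remove-bpe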
Problem solved. I re-trained the model with lr=5e-4, reaching BLEU4=9.04 at epoch 5, and I no longer see any weird results.
Hi,
I was trying to train an English-to-German model on the IWSLT dataset. I trained the model for about 80 epochs (the parameters are shown at the end), but during inference I obtained weird results as below:
I preprocessed the data and trained the model using the code below:
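A minimal sketch of what the preprocessing step typically looks like, assuming the standard fairseq IWSLT'14 recipe (the prepare script, TEXT path, and destdir are assumptions, not the exact commands from this issue):

# Sketch, assuming text prepared by examples/translation/prepare-iwslt14.sh
# and the SRC_LANG/TGT_LANG variables mentioned later in the question.
SRC_LANG=en
TGT_LANG=de
TEXT=examples/translation/iwslt14.tokenized.de-en
fairseq-preprocess --source-lang $SRC_LANG --target-lang $TGT_LANG \
    --trainpref $TEXT/train --validpref $TEXT/valid --testpref $TEXT/test \
    --destdir data-bin/iwslt14.tokenized.$SRC_LANG-$TGT_LANG \
    --workers 20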
Everything goes fine when I train the de2en model, i.e. SRC_LANG=de and TGT_LANG=en. However, when I train the en2de model, I get the same weird result for every sentence, as mentioned above. I am wondering if I have made a mistake in the parameter settings for training the reversed translation direction on the original dataset.
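For reference, one way to train the reversed direction without re-binarizing is to pass the language pair to fairseq-train explicitly; a sketch, assuming the fairseq translation task can resolve the swapped filename order in the existing data-bin (it checks both the {src}-{tgt} and {tgt}-{src} prefixes when loading):

# Sketch: reuse the de-en data-bin but set the direction explicitly.
CUDA_VISIBLE_DEVICES=0 fairseq-train data-bin/iwslt14.tokenized.de-en \
    --source-lang en --target-lang de \
    --arch transformer_iwslt_de_en --share-decoder-input-output-embed \
    --optimizer adam --adam-betas '(0.9, 0.98)' --clip-norm 0.0 \
    --lr 5e-4 --lr-scheduler inverse_sqrt --warmup-updates 4000 \
    --dropout 0.3 --weight-decay 0.0001 \
    --criterion label_smoothed_cross_entropy --label-smoothing 0.1 \
    --max-tokens 4096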