facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Can't replicate Simultaneous Machine Translation results [missing parameters & bugs] #4001

Open ereday opened 2 years ago

ereday commented 2 years ago

🐛 Bug

@xutaima Following other active/closed issues related to this model, I understand that I should contact you about this issue. I am unable to train the Multihead Monotonic Attention models (I tried both the IL and Hard variants). There are two main problems. First, the training command given in the official README seems wrong: it does not match the paper. After searching the old active/closed issues, I ended up using the following command to train the MMA-Hard variant:

       fairseq-train ~/wmt15_de_en/data_bin \
                    --simul-type hard_aligned --mass-preservation \
                    --criterion latency_augmented_label_smoothed_cross_entropy \
                    --latency-var-weight 0.1 --max-update 50000 \
                    --arch transformer_monotonic_vaswani_wmt_en_de_big \
                    --optimizer adam --adam-betas '(0.9, 0.98)' \
                    --lr-scheduler 'inverse_sqrt' --warmup-init-lr 1e-7  \
                    --warmup-updates 4000 --lr 5e-4 --stop-min-lr 1e-9 \
                    --clip-norm 0.0 --weight-decay 0.0001 --dropout 0.3 \
                    --label-smoothing 0.1 --max-tokens 3584 \
                    --left-pad-source --update-freq 8 --log-format json \
                    --log-file ./wmt15_de-en_mmahard.json --max-source-positions 100 \
                    --max-target-positions 100 \
                    --skip-invalid-size-inputs-valid-test \
                    --restore-file checkpoints/checkpoint_last.pt

Note that there are several important differences between the command above and the one in the README.

So, my first question is: Could you share the complete & correct training command, please?

More importantly, I would like to report a very critical bug in the label_smoothed_cross_entropy_latency_augmented loss function. In the current implementation [LINK], both the weighted average latency and the head divergence loss are weighted by the same coefficient, latency_avg_weight, which is set to 0.0 for the MMA-hard model. This means no latency regularisation term is used during training.
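To make the problem concrete, here is a minimal sketch of how the loss terms appear to be combined (assumed structure and variable names, not the actual fairseq implementation; `expected_delays_var` and the weight names follow the snippet quoted later in this issue):

```python
# Minimal sketch (assumed names, NOT the real fairseq criterion) of how the
# latency regularisers are added to the label-smoothed cross entropy.
def latency_augmented_loss(nll_loss, expected_latency, expected_delays_var,
                           latency_avg_weight, latency_var_weight):
    # Average-latency term, scaled by --latency-avg-weight.
    avg_loss = latency_avg_weight * expected_latency
    # Reported bug: this term is also scaled by latency_avg_weight instead of
    # latency_var_weight, so with --latency-avg-weight 0 (the MMA-hard setting)
    # the second regulariser silently disappears and latency_var_weight is
    # never used.
    var_loss = latency_avg_weight * expected_delays_var
    return nll_loss + avg_loss + var_loss
```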

I trained two models using the command above: one before and one after fixing the bug in the loss function locally.

The model trained with the buggy loss function ended up with an acceptable BLEU score on WMT15 de-en (still 1.5 points lower than what the paper reports, but that is fine for now). However, because it was trained without a latency regularisation term, the model has terrible latency scores:

| BLEU | AL    | AP | DAL   |
|------|-------|----|-------|
| 26.2 | 21.69 | 1  | 21.69 |

The model trained after the bug-fix* has much better latency scores. However, the BLEU score dropped a lot:

| lambda_var | BLEU | AL    | DAL   |
|------------|------|-------|-------|
| 0.01       | 25.4 | 20.38 | 21.56 |
| 0.02       | 23.8 | 16.77 | 21.15 |
| 0.05       | 18.9 | 20.55 | 21.02 |
| 0.1        | 14.1 | 2.9   | 7.06  |
| 0.4        | 18.5 | 2.32  | 5.66  |

As you can see, I tried several different lambda values and none of them worked well.

According to Table 6 in your paper, with lambda set to 0.1 I should be able to get a BLEU score of 28.5 and a DAL of 10.83. My second question is: could you please tell me what I need to do to get similar results?

*bug-fix: The only thing I did was replace `var_loss = self.latency_avg_weight * expected_delays_var` with `var_loss = self.latency_var_weight * expected_delays_var`.
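In code form, the one-line change looks like this (variable names copied from the line above; a sketch, not a patch against any particular fairseq revision):

```python
# Before (buggy): the variance term is scaled by the average-latency weight.
var_loss = self.latency_avg_weight * expected_delays_var
# After (fixed): the variance term is scaled by its own weight.
var_loss = self.latency_var_weight * expected_delays_var
```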

ereday commented 2 years ago

I would greatly appreciate it if you could let me know what needs to be done to train this model properly, @xutaima. Thanks in advance.

ereday commented 2 years ago

Looking at old issues related to SiMT, I see that you (@jmp84) may also be able to help with this. I would greatly appreciate it if you could let me know what needs to be done to train this model properly.

sathishreddy commented 2 years ago

@ereday ... I am trying to reproduce the results for "infinite_lookback" attention. Did you get any results with it? My BLEU scores are very low. I have fixed the bug you mentioned above, but I still see no improvement. Thanks in advance.

EricLina commented 2 years ago

> @ereday ... I am trying to reproduce the results for "infinite_lookback" attention. Did you get any results with it? My BLEU scores are very low. I have fixed the bug you mentioned above, but I still see no improvement. Thanks in advance.

Hello @sathishreddy, have you found a solution?