facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Potential bug in `cross_attentive_regularization` for joint-s2t #4845

Open · johntsi opened this issue 1 year ago

johntsi commented 1 year ago

🐛 Bug

I think there is a bug in how Cross-Attentive Regularization (CAR) is applied in the joint speech-to-text task.

According to equation (3) in the paper "Improving Speech Translation by Understanding and Learning from the Auxiliary Text Translation Task", the teacher states in CAR come from the text representations.

[image: equation (3) from the paper, defining the CAR loss with the text-side reconstruction as the gradient-detached teacher]
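
For illustration, here is a minimal PyTorch sketch of CAR with the text side as teacher, as in equation (3). The function name, tensor shapes, and the omission of temperature/length normalization are my own simplifications, not the paper's or fairseq's exact formulation:

```python
import torch
import torch.nn.functional as F

def car_loss(speech_states: torch.Tensor, text_states: torch.Tensor) -> torch.Tensor:
    """Cross-Attentive Regularization with teacher = text (simplified sketch).

    speech_states: (S, D) speech encoder outputs (student side)
    text_states:   (T, D) text encoder outputs (teacher side)
    """
    # Student: reconstruct a text-length sequence by attending over speech states.
    sim_st = text_states @ speech_states.t()                       # (T, S)
    student = F.softmax(sim_st, dim=-1) @ speech_states            # (T, D)

    # Teacher: self-reconstruction of the text states; detach() plays the
    # role of the stop-gradient in the paper, so only the speech side learns.
    sim_tt = text_states @ text_states.t()                         # (T, T)
    teacher = (F.softmax(sim_tt, dim=-1) @ text_states).detach()   # (T, D)

    # L2 distance between the student reconstruction and the teacher.
    return F.mse_loss(student, teacher)
```

For example, `car_loss(torch.randn(120, 256), torch.randn(30, 256))` returns a scalar loss.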

But in the code in s2t_dualinputtransformer.py, the default seems to be the opposite: the teacher states come from the speech representations.

[screenshot: the cross_attentive_loss call in s2t_dualinputtransformer.py, where teacher_states is taken from the speech encoder outputs by default]

The correct behavior (teacher = text) can be enabled by setting the argument cross_attentive_loss_reverse, but this is always set to False.

[screenshots: the call sites where cross_attentive_loss_reverse is always passed as False]

To induce the behavior described in the paper, swapping lines 529-534 with lines 536-541 should work, as sketched below.
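
To make the proposed swap concrete, here is a hedged paraphrase of that dispatch; the helper name `dispatch_car` is hypothetical, this is not the verbatim fairseq source, and the real call also passes the padding masks:

```python
def dispatch_car(model, speech_states, text_states):
    """Paraphrase of the branch around lines 529-541 of
    s2t_dualinputtransformer.py (hypothetical helper, simplified)."""
    if model.cross_attentive_loss_reverse:
        # reverse=True: teacher = text, student = speech -- equation (3).
        return model.cross_attentive_loss(
            teacher_states=text_states, student_states=speech_states
        )
    # Default (reverse=False, which is hard-coded): teacher = speech.
    # The proposed fix exchanges the two calls so that this default
    # branch uses the text states as the teacher instead.
    return model.cross_attentive_loss(
        teacher_states=speech_states, student_states=text_states
    )
```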

To Reproduce

I did not try to reproduce any error; I think both ways of applying CAR would work, but the one described in the paper is not the default.

johntsi commented 1 year ago

@yuntang Hey Yun, could you have a quick look at it?