Closed Mao-KU closed 2 years ago
The src tokens are indeed fed twice. This is done to stay compatible with the original fairseq code. In eager mode the same computation really is performed twice, but in graph mode the redundant part is optimized away automatically.
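For what it's worth, a minimal sketch (using a toy encoder standing in for the real fairseq model, so all names below are illustrative) confirms that in eager PyTorch both calls really execute:

```python
import torch
import torch.nn as nn

# Toy stand-in for the real fairseq encoder; names are illustrative only.
class TinyEncoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(8, 8)

    def forward(self, src_tokens):
        return self.proj(src_tokens)

encoder = TinyEncoder()
calls = {"n": 0}

def count_calls(module, inputs, output):
    calls["n"] += 1

encoder.register_forward_hook(count_calls)

src = torch.randn(2, 8)

# Mirrors the criterion: one pass through the full model (encoder inside)
# and a second, separate encoder pass for the contrastive term.
hidden_for_decoder = encoder(src)   # part of model(**sample["net_input"])
hidden_for_contrast = encoder(src)  # the extra encoder(...) call

print(calls["n"])  # 2 -- in eager mode the encoder really ran twice
print(torch.allclose(hidden_for_decoder, hidden_for_contrast))  # True
```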
Hi,
In the criterion script that constructs the contrastive loss, one line is: https://github.com/PANXiao1994/mRASP2/blob/36c17003dcd642affbe8290c8f26231fec77794a/mcolt/criterions/label_smoothed_cross_entropy_with_contrastive.py#L50
For this line, the flow is [src tokens -> encoder] -> decoder -> output.
Another line is: https://github.com/PANXiao1994/mRASP2/blob/36c17003dcd642affbe8290c8f26231fec77794a/mcolt/criterions/label_smoothed_cross_entropy_with_contrastive.py#L52
For this one, the flow is src tokens -> encoder -> encoder output, which duplicates the part in [ ] above.
It seems the src tokens are fed into the encoder twice. Although the loss computation is still correct, won't this reduce training efficiency?
Or is there something I missed? Thank you in advance.
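If the duplicate pass were ever a concern, one possible refactor (a sketch only, assuming a standard fairseq FairseqEncoderDecoderModel whose forward() simply chains model.encoder and model.decoder, and a standard fairseq batch dict) would be to run the encoder once and reuse its output for both the decoder and the contrastive loss:

```python
def forward_once(model, sample):
    """Single-encoder-pass alternative to calling model(...) and
    model.encoder(...) separately.

    Sketch under assumptions: `model` is a fairseq
    FairseqEncoderDecoderModel and `sample` is a standard fairseq batch.
    """
    net_input = sample["net_input"]

    # Encode once; fairseq encoders take (src_tokens, src_lengths).
    encoder_out = model.encoder(
        net_input["src_tokens"], src_lengths=net_input["src_lengths"]
    )

    # The decoder consumes the cached encoder states instead of
    # re-encoding the src tokens.
    net_output = model.decoder(
        net_input["prev_output_tokens"], encoder_out=encoder_out
    )

    # Reuse the same encoder_out for the contrastive similarity, so the
    # src tokens pass through the encoder only once per batch.
    return net_output, encoder_out
```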