Closed by LSimon95 10 months ago
@LSimon95 Do you know the value of n_heads for the MHA in the MRTE module used by ByteDance? I read the paper but could not find it stated anywhere. I see you use 2?
@Liujingxiu23 No, I couldn't find the exact value. I used the same value as the content encoder for convenience.
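For anyone reading along, here is a minimal sketch of what "same as the content encoder" could look like in a config. The class name, field names, and `hidden_dim=512` are hypothetical placeholders and are not taken from the repo or the paper. Only `n_heads=2` comes from this thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Hypothetical config; names are illustrative, not from the repo."""
    hidden_dim: int
    n_heads: int

    def __post_init__(self):
        # Multi-head attention splits hidden_dim evenly across heads,
        # so the head count must divide the hidden size.
        if self.hidden_dim % self.n_heads != 0:
            raise ValueError("hidden_dim must be divisible by n_heads")

    @property
    def head_dim(self) -> int:
        # Per-head feature size.
        return self.hidden_dim // self.n_heads


# Assumed values: n_heads=2 as mentioned above; hidden_dim=512 is a placeholder.
content_encoder_cfg = AttentionConfig(hidden_dim=512, n_heads=2)

# The MRTE attention reuses the content encoder's head count,
# since the paper does not state one.
mrte_mha_cfg = AttentionConfig(
    hidden_dim=content_encoder_cfg.hidden_dim,
    n_heads=content_encoder_cfg.n_heads,
)
```

Reading the head count from one place means that if the real value is published later, it only has to change once.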
The implementation differs from the paper in some modules because the early version of the paper omits some details. I will update the code as new information becomes available and train on a larger dataset. Some of the differences are shown below.