Hi, thanks for the impressive work. The paper states: "To retain information regarding the order of input sequences being supplied, we add the positional encodings [23] to the input of each attention layer." However, the released code only adds the positional encodings to the Multi-Head Attention in the decoder, not to the Multi-Head Attention in the encoder. Is it better not to apply positional encodings in the encoder?
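
For clarity, here is a minimal sketch of what I mean by adding the positional encodings to the encoder's Multi-Head Attention. This is purely illustrative PyTorch, not your released code; the class name `EncoderLayerWithPos` and the choice of adding `pos` to queries and keys only are my own assumptions.

```python
import torch
import torch.nn as nn

class EncoderLayerWithPos(nn.Module):
    """Hypothetical encoder layer that adds positional encodings to the
    attention input, as described in the paper (names are illustrative)."""

    def __init__(self, d_model: int, nhead: int):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, src: torch.Tensor, pos: torch.Tensor) -> torch.Tensor:
        # Add positional encodings to queries and keys (not values),
        # so the attention weights become position-aware.
        q = k = src + pos
        attn_out, _ = self.self_attn(q, k, src)
        return self.norm(src + attn_out)

# Example usage: src and pos have shape (seq_len, batch, d_model)
layer = EncoderLayerWithPos(d_model=256, nhead=8)
src = torch.randn(10, 2, 256)
pos = torch.randn(10, 2, 256)
out = layer(src, pos)  # -> (10, 2, 256)
```

Is something along these lines intentionally omitted from the encoder in the released code?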