In your paper, you mention that building S from the encoder output brings semantic information into the decoder. However, the paper seems to define S as a linear layer + ReLU layer + linear layer, while the code has only a single linear layer. Was this changed for some reason, or have I misunderstood something?
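To make sure we're talking about the same thing, here is a minimal pure-Python sketch of the two variants as I understand them. The names (`s_paper`, `s_code`, `random_layer`) and the dimensions are my own hypothetical placeholders, not taken from your repository.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]


def linear(x: Vector, w: Matrix, b: Vector) -> Vector:
    # Affine map y = W x + b; W is stored as out_dim rows of length in_dim.
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    # Element-wise max(0, v).
    return [max(0.0, v) for v in x]


def random_layer(in_dim: int, out_dim: int, rng: random.Random) -> Layer:
    # Hypothetical helper: random weights, zero bias (illustration only).
    w = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return w, b


def s_paper(h: Vector, layer1: Layer, layer2: Layer) -> Vector:
    # S as I read it in the paper: Linear -> ReLU -> Linear on the encoder output h.
    return linear(relu(linear(h, *layer1)), *layer2)


def s_code(h: Vector, layer: Layer) -> Vector:
    # S as I see it in the code: a single Linear layer on the encoder output h.
    return linear(h, *layer)


if __name__ == "__main__":
    rng = random.Random(0)
    enc_dim, hid_dim, out_dim = 4, 8, 3  # placeholder sizes
    h = [rng.uniform(-1.0, 1.0) for _ in range(enc_dim)]
    print(s_paper(h, random_layer(enc_dim, hid_dim, rng), random_layer(hid_dim, out_dim, rng)))
    print(s_code(h, random_layer(enc_dim, out_dim, rng)))
```

Both variants produce an output of the same size, but they are not equivalent in general: a single linear layer is an affine map of the encoder output, while Linear + ReLU + Linear can also represent non-linear mappings.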