Initializing the hidden layer for the decoder input

In your paper, you mention that building S from the encoder output brings semantic information into the decoder. However, in the paper S seems to be defined as a linear layer + ReLU + linear layer, while in the code there is only one linear layer. Was this changed for some reason, or am I misunderstanding something?
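
To make the comparison concrete, here is a minimal PyTorch sketch of the two variants I mean. The class names, the dimensions (300 and 512), and the batch size are made up for illustration; they are not taken from the repository or the paper.

```python
import torch
import torch.nn as nn


class SingleLinearInit(nn.Module):
    """What I seem to see in the code: S mapped by one linear layer."""

    def __init__(self, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.proj = nn.Linear(embed_dim, hidden_dim)

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.proj(s)


class MlpInit(nn.Module):
    """What I read in the paper: linear layer + ReLU + linear layer."""

    def __init__(self, embed_dim: int, hidden_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        return self.net(s)


# Hypothetical sizes: batch of 4 semantic vectors S with 300 features each.
s = torch.randn(4, 300)
h0_single = SingleLinearInit(300, 512)(s)  # initial decoder state, variant 1
h0_mlp = MlpInit(300, 512)(s)              # initial decoder state, variant 2
print(h0_single.shape, h0_mlp.shape)
```

Both variants produce an initial decoder state of the same shape, so they are interchangeable at the interface level; the difference is only the extra nonlinearity and second projection.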

Pay20Y / SEED

Initializing the hidden layer for the decoder input #31