Hey, I think the implementation differs from the original paper. According to the paper, the output attention is computed from the LSTM output at the *current* time step together with the attributes, but your implementation computes it from the LSTM output at the *previous* time step and the attributes.
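To make the difference concrete, here's a toy sketch (not your actual code — the attention form, dimensions, and names like `attention`, `W`, `attrs` are all assumptions for illustration) contrasting attending with `h_curr` (what I read in the paper) versus `h_prev` (what the code appears to do):

```python
import numpy as np

def attention(h, attrs, W):
    # Score each attribute embedding against the hidden state,
    # then softmax over the attributes.
    scores = attrs @ (W @ h)
    e = np.exp(scores - scores.max())
    return e / e.sum()

rng = np.random.default_rng(0)
attrs = rng.normal(size=(5, 8))   # 5 attribute embeddings, dim 8
W = rng.normal(size=(8, 4))       # projects hidden dim 4 into attr dim 8
h_prev = rng.normal(size=4)       # LSTM output at step t-1
h_curr = rng.normal(size=4)       # LSTM output at step t

# As I understand the paper: attend using the CURRENT hidden state h_t
alpha_paper = attention(h_curr, attrs, W)
# As in this implementation: attend using the PREVIOUS hidden state h_{t-1}
alpha_impl = attention(h_prev, attrs, W)
```

The two generally produce different attention weights, so the output context vector fed to the next layer would differ too. Am I misreading the code, or was this an intentional change from the paper?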