Hi, thanks for your great work.
I wonder whether the attention mechanism in your code has been changed.
The attention weights should have shape (batch, timestep, timestep), but according to your code, the self-attention output has shape (batch, timestep, hidden_size). I fixed the code as shown below. Please review it; I would appreciate your comments. Thank you.
```python
import torch.nn as nn

class Attention(nn.Module):
    def __init__(self, num_hidden, h=8):
        # h is the number of attention heads
        super(Attention, self).__init__()
```
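For reference, here is a minimal, self-contained sketch of single-head scaled dot-product self-attention that yields attention weights of shape (batch, timestep, timestep). The class name, the linear projections, and the toy dimensions are my own illustration for shape-checking, not the repository's actual code:

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledDotProductSelfAttention(nn.Module):
    """Single-head self-attention; the weight matrix is (batch, timestep, timestep)."""

    def __init__(self, num_hidden):
        super().__init__()
        self.query = nn.Linear(num_hidden, num_hidden)
        self.key = nn.Linear(num_hidden, num_hidden)
        self.value = nn.Linear(num_hidden, num_hidden)
        self.scale = math.sqrt(num_hidden)

    def forward(self, x):
        # x: (batch, timestep, num_hidden)
        q = self.query(x)  # (batch, timestep, num_hidden)
        k = self.key(x)    # (batch, timestep, num_hidden)
        v = self.value(x)  # (batch, timestep, num_hidden)
        # Q @ K^T gives one score per pair of timesteps: (batch, timestep, timestep)
        scores = torch.bmm(q, k.transpose(1, 2)) / self.scale
        attn = F.softmax(scores, dim=-1)  # attention weights: (batch, timestep, timestep)
        out = torch.bmm(attn, v)          # weighted sum of values: (batch, timestep, num_hidden)
        return out, attn

x = torch.randn(2, 5, 16)  # batch=2, timestep=5, num_hidden=16
out, attn = ScaledDotProductSelfAttention(16)(x)
print(attn.shape)  # torch.Size([2, 5, 5])
```

Running the snippet prints `torch.Size([2, 5, 5])`: the weights are (batch, timestep, timestep), while only the output of the weighted sum over the values is (batch, timestep, hidden_size).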