Open xiaolongwu0713 opened 3 years ago
Hi, I noticed that you calculate the alignment score by using `bmm` to multiply an `nn.Parameter` with tanh(encoder_output, decoder_hidden_state). However, as far as I can tell, the original paper does not need this extra `nn.Parameter`. It says:
So is there any reason for the multiplication?
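For concreteness, the computation in question can be sketched roughly as below. This is only a minimal illustration and not the repository's actual code: the class name `ConcatAttentionScore`, the `proj` layer, the parameter name `v`, and the tensor shapes are all assumptions made for the example.

```python
import torch
import torch.nn as nn


class ConcatAttentionScore(nn.Module):
    """Hypothetical sketch: score = v^T tanh(W [decoder_hidden; encoder_output])."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Projects the concatenated pair back down to hidden_size.
        self.proj = nn.Linear(2 * hidden_size, hidden_size)
        # The extra learnable vector the question is about (assumed shape).
        self.v = nn.Parameter(torch.rand(hidden_size))

    def forward(self, decoder_hidden, encoder_outputs):
        # decoder_hidden: (batch, hidden)
        # encoder_outputs: (batch, src_len, hidden)
        src_len = encoder_outputs.size(1)
        # Repeat the decoder state for every source position.
        expanded = decoder_hidden.unsqueeze(1).expand(-1, src_len, -1)
        # energy: (batch, src_len, hidden)
        energy = torch.tanh(self.proj(torch.cat([expanded, encoder_outputs], dim=2)))
        # v: (batch, hidden, 1) so it can be batch-multiplied with energy.
        v = self.v.expand(energy.size(0), -1).unsqueeze(2)
        # bmm collapses the hidden dimension: (batch, src_len, 1) -> (batch, src_len)
        return torch.bmm(energy, v).squeeze(2)
```

In this sketch, the `bmm` with `v` is the step that reduces each hidden-sized `tanh` output to a single scalar score per source position; without it, `energy` would still have shape `(batch, src_len, hidden)`.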