Closed: li-pengcheng closed this issue 5 years ago
I hope this slide helps your understanding. Sorry for the late comment. For formula (2), since we take a summation over alpha × f, not over alpha alone, F_att does not become 1.
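A minimal sketch of the point above, assuming formula (2) is a softmax-weighted sum F_att = Σ_i α_i · f_i over spatial positions (shapes and variable names here are illustrative, not taken from the actual implementation): the softmax weights α sum to 1, but the weighted sum of the features f does not collapse to 1.

```python
import numpy as np

np.random.seed(0)
n, c = 4, 3                       # n spatial positions, c feature channels (hypothetical)
scores = np.random.randn(n)       # raw attention scores
alpha = np.exp(scores) / np.exp(scores).sum()  # softmax weights alpha_i
f = np.random.randn(n, c)         # per-position features f_i

# The attention weights themselves sum to 1 ...
print(np.isclose(alpha.sum(), 1.0))   # True

# ... but formula (2) sums alpha_i * f_i, a weighted sum of features,
# so F_att is a c-dimensional feature vector, not the scalar 1.
F_att = (alpha[:, None] * f).sum(axis=0)
print(F_att.shape)                    # (3,)
```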
So the f_i in formula (2) can be understood as F, right?
Yes, you are right.
@Ugness Thanks for your work. I have some questions about Figure 2 (b) and formula (2) of the original paper. Could you tell me how you understand and implement the reshape operation in Figure 2 (b)? In my understanding, according to formula (2), if we sum the attention weights after the softmax, wouldn't the attention weight at each position become 1?