galsang / BiDAF-pytorch

Re-implementation of BiDAF (Bidirectional Attention Flow for Machine Comprehension, Minjoon Seo et al., ICLR 2017) in PyTorch.

In att_flow_layer of bidaf model #25

Open FayeXXX opened 4 years ago

FayeXXX commented 4 years ago

Hi, I have just started learning QA models, and thank you so much for sharing this. I found that the attention you implement is a little different from the original paper. On line 141 of model.py you compute s = self.att_weight_c(c).expand(-1, -1, q_len) + self.att_weight_q(q).permute(0, 2, 1).expand(-1, c_len, -1) + cq. However, the paper applies a single weight to the concatenation [h; u; h ◦ u], which is 6d, rather than summing three separately weighted terms as above. Does it make a difference?
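
For reference, here is a small self-contained sketch of what that line computes. The toy dimensions, the layer construction, and the loop used to build cq below are only illustrative, not the repo's exact code:

```python
# Sketch of the decomposed similarity matrix, assuming context c of shape
# (batch, c_len, 2d) and question q of shape (batch, q_len, 2d).
import torch
import torch.nn as nn

batch, c_len, q_len, d = 2, 5, 3, 4
c = torch.randn(batch, c_len, 2 * d)   # contextual context embedding (H in the paper)
q = torch.randn(batch, q_len, 2 * d)   # contextual question embedding (U in the paper)

att_weight_c = nn.Linear(2 * d, 1)     # scores each context vector h
att_weight_q = nn.Linear(2 * d, 1)     # scores each question vector u
att_weight_cq = nn.Linear(2 * d, 1)    # scores the elementwise product h ◦ u

# cq[:, i, j] = w_cq^T (c_i ◦ q_j), built one question position at a time
cq = []
for j in range(q_len):
    qj = q.select(1, j).unsqueeze(1)            # (batch, 1, 2d)
    cq.append(att_weight_cq(c * qj).squeeze(-1))  # (batch, c_len)
cq = torch.stack(cq, dim=-1)                    # (batch, c_len, q_len)

# line 141 of model.py: broadcast the per-context and per-question scores and add
s = att_weight_c(c).expand(-1, -1, q_len) + \
    att_weight_q(q).permute(0, 2, 1).expand(-1, c_len, -1) + \
    cq                                          # similarity matrix (batch, c_len, q_len)
print(s.shape)  # torch.Size([2, 5, 3])
```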

kushalj001 commented 4 years ago

This implementation breaks the learnable parameters apart. Instead of applying a single 6d trainable weight to the concatenation of the three tensors [h; u; h ◦ u], the author uses three separate 2d trainable weights. I am not sure how much of a difference this makes. I am currently re-implementing this paper using the 6d approach.
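
For comparison, here is a rough sketch of the paper's 6d formulation, with a check that splitting the single weight into three 2d pieces reproduces the summed form (so the two are algebraically equivalent, up to the bias terms the separate linear layers add). Names and shapes are illustrative, not code from either repo:

```python
# Paper formulation: α(h, u) = w_s^T [h; u; h ◦ u] with one 6d weight w_s.
import torch
import torch.nn as nn

batch, c_len, q_len, d = 2, 5, 3, 4
c = torch.randn(batch, c_len, 2 * d)              # H: (batch, c_len, 2d)
q = torch.randn(batch, q_len, 2 * d)              # U: (batch, q_len, 2d)

w_s = nn.Linear(6 * d, 1, bias=False)             # single 6d trainable weight

# Build [h; u; h ◦ u] for every (i, j) pair of context/question positions
h = c.unsqueeze(2).expand(-1, -1, q_len, -1)      # (batch, c_len, q_len, 2d)
u = q.unsqueeze(1).expand(-1, c_len, -1, -1)      # (batch, c_len, q_len, 2d)
hu = torch.cat([h, u, h * u], dim=-1)             # (batch, c_len, q_len, 6d)

s = w_s(hu).squeeze(-1)                           # similarity matrix (batch, c_len, q_len)
print(s.shape)                                    # torch.Size([2, 5, 3])

# Equivalence check: apply the three 2d slices of w_s separately and sum
w = w_s.weight.squeeze(0)                         # (6d,)
w_c, w_q, w_cq = w[:2 * d], w[2 * d:4 * d], w[4 * d:]
s_split = (c @ w_c).unsqueeze(2) + (q @ w_q).unsqueeze(1) + ((h * u) @ w_cq)
print(torch.allclose(s, s_split, atol=1e-6))      # True
```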