Open FayeXXX opened 4 years ago
This implementation breaks the learnable parameters apart. Instead of using a single 6d trainable weight applied after concatenating the three tensors [h; u; h∘u], it uses three separate trainable weights of dimension 2d each. I am not sure how much of a difference this makes. I am currently reimplementing the paper using the 6d approach.
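For what it's worth, the two forms should be algebraically identical (up to how bias terms are grouped): a 6d weight w over the concatenation splits into three 2d pieces, since w^T[h; u; h∘u] = w_h^T h + w_u^T u + w_hu^T (h∘u). A minimal sketch checking this numerically — all tensor sizes here are made-up placeholders, and the variable names (w_c, w_q, w_cq) are hypothetical stand-ins for the split weights, not the repo's actual parameters:

```python
import torch

torch.manual_seed(0)

d2 = 8        # "2d" in the paper's notation (arbitrary for this check)
T, J = 5, 7   # context length, question length
c = torch.randn(1, T, d2)   # context encodings H
q = torch.randn(1, J, d2)   # question encodings U

# A single 6d weight, as in the paper: S_tj = w^T [h; u; h∘u]
w = torch.randn(3 * d2)
w_c, w_q, w_cq = w.split(d2)

# Naive 6d computation: materialize the (T, J, 6d) concatenation
h = c.unsqueeze(2).expand(-1, -1, J, -1)        # (1, T, J, 2d)
u = q.unsqueeze(1).expand(-1, T, -1, -1)        # (1, T, J, 2d)
s_naive = torch.cat([h, u, h * u], dim=-1) @ w  # (1, T, J)

# Decomposed computation, as in the implementation: three 2d projections
s_c = (c @ w_c).unsqueeze(2)                       # (1, T, 1)
s_q = (q @ w_q).unsqueeze(1)                       # (1, 1, J)
s_cq = torch.einsum('btd,d,bjd->btj', c, w_cq, q)  # (1, T, J)
s_split = s_c + s_q + s_cq                         # broadcasts to (1, T, J)

print(torch.allclose(s_naive, s_split, atol=1e-5))  # True
```

So the split version is an efficiency trick that avoids building the T×J×6d concatenated tensor; with the same initialization the gradients and learned function should match. One small caveat: if each of the three projections is an `nn.Linear` with its own bias, the three biases just sum into one scalar offset, which is still equivalent to a single bias in the 6d form.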
Hi, I have just started learning QA models, and thank you so much for sharing this. I found that the attention you wrote is a little different from the original paper. On line 141 of model.py:

    s = self.att_weight_c(c).expand(-1, -1, q_len) + \
        self.att_weight_q(q).permute(0, 2, 1).expand(-1, c_len, -1) + \
        cq

However, the paper concatenates [h; u; h∘u] into a 6d vector, which is different from the decomposed sum above. Does it make a difference?