allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Apache License 2.0

Add Gated Attention Reader (pt. II) #187

Closed · nelson-liu closed this 7 years ago

nelson-liu commented 7 years ago

This PR won't work until the others are all merged, but this PR adds the gated attention reader model.

todo:

nelson-liu commented 7 years ago

> Also, I'm pretty concerned about how masking is treated in lines 120 through 126 - I think it's wrong. Can you just use MatrixAttention instead of the MaskedBatchDot and then MaskedSoftmax?

I tried using MatrixAttention before, but ran into memory issues from the tiling it does (hence why I swapped to MaskedBatchDot and MaskedSoftmax).
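For context on the memory point, here is a rough back-of-the-envelope sketch of why tiling hurts, using hypothetical shapes (not the actual config values) and assuming MatrixAttention materializes both tiled tensors before applying its similarity function:

```python
# Hypothetical sizes for a cloze-style dataset; not taken from the real config.
batch_size, doc_len, q_len, dim = 32, 1000, 30, 200
bytes_per_float = 4

# Tiling materializes both encodings at every (document word, question word)
# pair, i.e. two tensors of shape (batch, doc_len, q_len, dim).
tiled = 2 * batch_size * doc_len * q_len * dim * bytes_per_float

# A batch dot only materializes the similarity matrix, (batch, doc_len, q_len).
similarity = batch_size * doc_len * q_len * bytes_per_float

print("tiled tensors:     %.1f GB" % (tiled / 1e9))       # ~1.5 GB
print("similarity matrix: %.1f MB" % (similarity / 1e6))  # ~3.8 MB
```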

matt-gardner commented 7 years ago

The trouble with how you did it, though, is that MaskedBatchDot returns None for a mask, opting instead for zeros in the similarity matrix. Then you pass those zeros through MaskedSoftmax with no mask. It's not doing what you think it's doing. Instead, I'd recommend a BatchDot layer that handles the mask properly, and then you can pass the result into MaskedSoftmax to get the right behavior.
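For concreteness, a minimal sketch of the kind of layer being described here, written against the Keras 1.x masking API the library used at the time (`compute_mask`, `get_output_shape_for`); the class below is illustrative, not necessarily what ended up in the PR:

```python
from keras import backend as K
from keras.engine.topology import Layer


class BatchDot(Layer):
    """Batched dot product of two encoded matrices. Unlike masked_batch_dot,
    this does not zero out entries of the result; it returns a real output
    mask instead, so downstream masked layers can do the right thing."""

    def __init__(self, **kwargs):
        self.supports_masking = True
        super(BatchDot, self).__init__(**kwargs)

    def call(self, inputs, mask=None):
        # inputs: (batch, doc_len, dim) and (batch, q_len, dim);
        # output: (batch, doc_len, q_len) word-pair similarities.
        matrix_1, matrix_2 = inputs
        return K.batch_dot(matrix_1, matrix_2, axes=[2, 2])

    def compute_mask(self, inputs, mask=None):
        # For simplicity, assume either both inputs carry a mask or neither does.
        if mask is None or all(m is None for m in mask):
            return None
        mask_1, mask_2 = mask
        # Outer product of the (batch, doc_len) and (batch, q_len) masks gives
        # a (batch, doc_len, q_len) mask for the similarity matrix.
        mask_1 = K.cast(K.expand_dims(mask_1, 2), 'float32')
        mask_2 = K.cast(K.expand_dims(mask_2, 1), 'float32')
        return K.cast(K.batch_dot(mask_1, mask_2), 'uint8')

    def get_output_shape_for(self, input_shapes):
        shape_1, shape_2 = input_shapes
        return (shape_1[0], shape_1[1], shape_2[1])
```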

nelson-liu commented 7 years ago

I see, yeah, that makes sense. Instead of creating a BatchDot layer, could I just add a mask to the MaskedBatchDot layer?

matt-gardner commented 7 years ago

If you're returning a mask, there's no reason to use masked_batch_dot; you can just use K.batch_dot.

nelson-liu commented 7 years ago

> Just one high-level comment: you can just specify the default key in the seq2seq_encoder params, and give the right fallback behavior when you get the seq2seq encoder on lines 109 and 115. This would make it so you don't have to repeat yourself so much in the config file.

I was trying to favor being explicit over implicit, but yeah, I agree.
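Roughly the pattern being suggested, sketched in isolation; the parameter values and the helper function here are hypothetical, just to illustrate falling back to a shared "default" entry instead of repeating the same block for every encoder in the config:

```python
# Hypothetical config fragment: one shared "default" encoder spec, plus an
# override only where an encoder actually needs different parameters.
seq2seq_encoder_params = {
    "default": {"type": "gru", "units": 128, "bidirectional": True},
    "final_question_encoder": {"type": "gru", "units": 64},
}


def get_seq2seq_encoder_params(name):
    """Return the params for `name`, falling back to the "default" entry when
    the config has no encoder-specific block."""
    return seq2seq_encoder_params.get(name, seq2seq_encoder_params["default"])


# e.g. the per-hop encoders all pick up the default spec without repetition:
hop_2_params = get_seq2seq_encoder_params("hop_2_encoder")
```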

> If you're returning a mask, there's no reason to use masked_batch_dot; you can just use K.batch_dot.

That makes sense; I'll make a BatchDot layer that properly passes on a mask.

nelson-liu commented 7 years ago

Though I'm not sure how I feel about the naming: MaskedBatchDot doesn't pass a mask forward, while BatchDot does.

matt-gardner commented 7 years ago

Honestly, we probably don't need the MaskedBatchDot layer or function at all, because we should just be propagating the mask correctly. It's really unlikely that you'd use the output of a batch dot without some other mask-using operation first (like a softmax). We can leave them there, but it's probably a good idea to leave a note saying they probably aren't necessary.
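To make the propagation argument concrete, here is a short usage sketch building on the BatchDot class sketched above; `MaskedSoftmax` is the library's layer (import omitted), and the `Masking` layers just stand in for whatever real embedding/encoder masks the model produces:

```python
from keras.layers import Input, Masking

document_input = Input(shape=(1000, 200))   # hypothetical pre-encoded document
question_input = Input(shape=(30, 200))     # hypothetical pre-encoded question

# Masking attaches a (batch, length) mask to each input, standing in for the
# masks the real embedding and seq2seq encoder layers would produce.
document_encoding = Masking()(document_input)
question_encoding = Masking()(question_input)

# BatchDot returns a real (batch, doc_len, q_len) mask, so Keras passes it to
# MaskedSoftmax automatically; no manual zeroing of the similarity matrix.
similarity = BatchDot()([document_encoding, question_encoding])
attention = MaskedSoftmax()(similarity)
```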