allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Apache License 2.0

Retrieval framework #284

Closed: matt-gardner closed this 7 years ago

matt-gardner commented 7 years ago

Sorry, this took a bit longer than I was hoping, but it now works. It should now be pretty easy to plug in Nelson's model and try this out. So far I have just moved Pradeep's code around so that it is easy to extend. I also made the approximate nearest neighbor algorithm configurable, because Chandra told me about an implementation that he thought worked better than scikit-learn's LSH.
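To make "configurable nearest neighbor algorithm" concrete, here is a minimal sketch of the pattern, assuming a simple name-to-class registry chosen by a `type` key in the config. None of these names (`NEAREST_NEIGHBOR_ALGORITHMS`, `build_nn`, `BruteForceNN`, `RandomHyperplaneLSH`) come from this PR; they are hypothetical, and the LSH shown is a plain random-hyperplane scheme, not scikit-learn's implementation or the one Chandra suggested.

```python
import math
import random
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class BruteForceNN:
    """Hypothetical exact backend: scores every stored vector against the query."""

    def __init__(self, vectors: List[Vector]):
        self.vectors = vectors

    def _rank(self, q: Vector, indices: List[int], k: int) -> List[int]:
        return sorted(indices, key=lambda i: cosine(q, self.vectors[i]), reverse=True)[:k]

    def query(self, q: Vector, k: int) -> List[int]:
        return self._rank(q, list(range(len(self.vectors))), k)


class RandomHyperplaneLSH(BruteForceNN):
    """Hypothetical approximate backend: only scores vectors in the query's hash bucket."""

    def __init__(self, vectors: List[Vector], num_bits: int = 8, seed: int = 0):
        super().__init__(vectors)
        rng = random.Random(seed)
        dim = len(vectors[0])
        # Each random hyperplane contributes one bit: which side of it a vector falls on.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]
        self.buckets: Dict[tuple, List[int]] = {}
        for i, v in enumerate(vectors):
            self.buckets.setdefault(self._hash(v), []).append(i)

    def _hash(self, v: Vector) -> tuple:
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes)

    def query(self, q: Vector, k: int) -> List[int]:
        return self._rank(q, self.buckets.get(self._hash(q), []), k)


# Hypothetical registry: the config's "type" string selects the algorithm.
NEAREST_NEIGHBOR_ALGORITHMS = {
    "brute_force": BruteForceNN,
    "random_hyperplane_lsh": RandomHyperplaneLSH,
}


def build_nn(params: dict, vectors: List[Vector]) -> BruteForceNN:
    params = dict(params)  # copy so the caller's config is not mutated
    name = params.pop("type", "brute_force")
    return NEAREST_NEIGHBOR_ALGORITHMS[name](vectors, **params)
```

Swapping in a different implementation then only means registering another class under a new name; the retrieval code that calls `query` does not change.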

nelson-liu commented 7 years ago

Looks like there was a flaky Theano test, so I reran that one (TestThresholdTupleMatcher.test_general_case; we might want to add a flaky flag to it).

Do you want me to look at this? I recall you said you made some edits to the sentence selection model; were you going to add those changes here or in a separate PR?

matt-gardner commented 7 years ago

The change to that model will be in a separate PR; this PR is good to look at now.

pdasigi commented 7 years ago

Sorry, I did not notice this earlier. I get too much email from GitHub. It will be easier for me to notice things "assigned" to me, because they show up on the PR list page.

matt-gardner commented 7 years ago

Hmm, that's a good question about the IDF weighting. I was working from bow_lsh.py, moving things around and largely using the methods that I saw there. I must have started from the version before your IDF PR got merged... Oops. I'll add a TODO here to put that back in, but I didn't remove bow_lsh.py, so we can use that for the IDF experiments until the IDF code is merged into this framework. It shouldn't be hard, but we might need to add some kind of fit method to the retrieval encoders so they can set some parameters from the corpus.
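One way the proposed fit method could look, as a minimal sketch: the encoder learns IDF weights from the corpus in `fit`, then uses them to weight word vectors in `encode`. The class name `IdfWeightedBowEncoder`, the method names, and the smoothed IDF formula are all assumptions for illustration; this is not the IDF code from bow_lsh.py.

```python
import math
from collections import Counter
from typing import Dict, List


class IdfWeightedBowEncoder:
    """Hypothetical retrieval encoder: IDF-weighted average of word vectors.

    fit() sets corpus-level parameters (here, IDF weights) before any
    sentence is encoded.
    """

    def __init__(self, embeddings: Dict[str, List[float]]):
        self.embeddings = embeddings
        self.dim = len(next(iter(embeddings.values())))
        self.idf: Dict[str, float] = {}
        self.default_idf = 0.0

    def fit(self, corpus: List[List[str]]) -> "IdfWeightedBowEncoder":
        num_docs = len(corpus)
        # Document frequency: count each token at most once per document.
        doc_freq = Counter(token for doc in corpus for token in set(doc))
        # Smoothed IDF, log((1 + N) / (1 + df)) + 1, so every weight is positive.
        self.idf = {t: math.log((1 + num_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
        # Tokens never seen during fit are treated as having df = 0.
        self.default_idf = math.log(1 + num_docs) + 1
        return self

    def encode(self, tokens: List[str]) -> List[float]:
        if not self.idf:
            raise RuntimeError("call fit() on the corpus before encode()")
        total = [0.0] * self.dim
        weight_sum = 0.0
        for token in tokens:
            if token not in self.embeddings:
                continue  # skip tokens with no word vector
            weight = self.idf.get(token, self.default_idf)
            for j, value in enumerate(self.embeddings[token]):
                total[j] += weight * value
            weight_sum += weight
        # Weighted average; an input with no known tokens encodes to all zeros.
        return [x / weight_sum for x in total] if weight_sum else total
```

Returning `self` from `fit` lets a caller write `encoder = IdfWeightedBowEncoder(emb).fit(corpus)`; common words such as "the" then contribute less to the sentence vector than rarer, more informative ones.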