matt-gardner closed 7 years ago
Looks like there was a flaky Theano test, so I reran that one (TestThresholdTupleMatcher.test_general_case, might want to add a flaky flag to it).
Do you want me to look at this? I recall you said you made some edits to the sentence selection model; were you going to add those changes here or in a separate PR?
The change to that model will be in a separate PR; this PR is good to look at now.
Sorry, I did not notice this earlier. I get too much email from GitHub. It will be easier for me to notice things "assigned" to me because they show up on the PR list page.
Hmm, that's a good question about the IDF stuff. I was working from bow_lsh.py, moving stuff around and largely using the methods that I saw there. I must have started from the version before your IDF PR got merged... Oops. I'll add a TODO here to put that back in, but I didn't remove bow_lsh.py, so we can use that for the IDF experiments until it's merged into this. It shouldn't be hard, but we might need to add some kind of fit method to the retrieval encoders to allow them to set some parameters from the corpus.
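To make the fit idea above concrete, here is a rough sketch of what such a method could look like for IDF weighting. The class name IdfBagOfWordsEncoder, the encode method, and the whitespace tokenization are all assumptions for illustration; none of them come from this PR or from bow_lsh.py.

```python
import math
from collections import Counter
from typing import Dict, List


class IdfBagOfWordsEncoder:
    """Hypothetical retrieval encoder; not the class from this PR."""

    def __init__(self) -> None:
        # Populated by fit(); empty until the encoder has seen a corpus.
        self.idf: Dict[str, float] = {}

    def fit(self, corpus: List[str]) -> None:
        # Count how many documents contain each token (assumed whitespace tokenization).
        document_counts: Counter = Counter()
        for document in corpus:
            document_counts.update(set(document.lower().split()))
        num_documents = len(corpus)
        # Smoothed IDF, so tokens that appear in every document keep a positive weight.
        self.idf = {
            token: math.log((1 + num_documents) / (1 + count)) + 1.0
            for token, count in document_counts.items()
        }

    def encode(self, sentence: str) -> Dict[str, float]:
        # Sparse bag-of-words vector weighted by IDF; tokens unseen in fit() get weight 0.
        term_counts = Counter(sentence.lower().split())
        return {
            token: count * self.idf.get(token, 0.0)
            for token, count in term_counts.items()
        }
```

The point of the sketch is only that fit is called once on the corpus before any encoding, so encoders that need no corpus statistics could implement it as a no-op.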
Sorry, took a bit longer than I was hoping, but this now works. It should be pretty easy now to plug in Nelson's model to try this out. So far I just moved Pradeep's code around so that it was easy to extend. I also made the approximate nearest neighbor algorithm configurable, because Chandra told me about an implementation that he thought worked better than scikit-learn's LSH.