Open schwittlick opened 7 years ago
This module does text similarity analysis on blocks of text (documents) pretty well: https://github.com/chrisjmccormick/simsearch
Seems super valuable for training models on the txt data and finding similar sentences.
One interesting approach could be to generate hundreds of thousands of lines via an RNN with different seeds for different purposes. For example:
These pre-generated lines would be the corpus in which to find the most similar answer to the question typed into ECO. The similarity search via simsearch should be able to quickly find something that talks about a similar topic. It could be interesting to search the database of RNN-generated answers 50% of the time, and the other 50% of the time search the original sentences parsed from the PDFs for the most similar sentence.
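A rough sketch of the 50/50 lookup, just to make the idea concrete. It does not use simsearch's API; a small hand-rolled TF-IDF plus cosine similarity stands in for it, and all names here (`build_corpus`, `most_similar`, `answer`) and the sample sentences are made up for illustration.

```python
import math
import random
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on anything that is not a letter or digit."""
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


def build_corpus(lines):
    """Precompute smoothed IDF weights and one TF-IDF vector per line."""
    docs = [Counter(tokenize(line)) for line in lines]
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(doc.keys())
    n = len(docs)
    idf = {t: math.log((1 + n) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return lines, idf, vectors


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(question, corpus):
    """Return (line, score) for the corpus line closest to the question."""
    lines, idf, vectors = corpus
    # Words that never occur in the corpus are ignored.
    counts = Counter(tokenize(question))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(query, vec) for vec in vectors]
    best = max(range(len(lines)), key=scores.__getitem__)
    return lines[best], scores[best]


def answer(question, generated_corpus, original_corpus, rng=random):
    """Half of the time search the RNN lines, otherwise the PDF sentences."""
    corpus = generated_corpus if rng.random() < 0.5 else original_corpus
    line, _score = most_similar(question, corpus)
    return line


# Hypothetical sample data: in practice these would be the RNN output
# and the sentences parsed from the PDFs.
generated_lines = [
    "the ocean is deep and full of salt water",
    "machines dream about electric sheep",
    "money flows through the city like rain",
]
original_lines = [
    "salt water covers most of the planet",
    "the city never sleeps at night",
    "a sheep grazes on the green hill",
]
generated_corpus = build_corpus(generated_lines)
original_corpus = build_corpus(original_lines)

question = "tell me about salt water in the ocean"
print(answer(question, generated_corpus, original_corpus))
```

Both corpora are indexed once up front, so each question only costs one pass over the precomputed vectors. For hundreds of thousands of lines a proper index (which is what simsearch is for) would be the better choice.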
compare:
Here are a few links on doc2vec: #135