capreolus-ir / capreolus

A toolkit for end-to-end neural ad hoc retrieval
https://capreolus.ai
Apache License 2.0

Temp feature/pes20 #43

Closed ghazalehnt closed 4 years ago

ghazalehnt commented 4 years ago

- Adding a spaCy tokenizer (however, it is much slower than the Anserini tokenizer)
- Using gensim to download and load word embeddings
- Reading statistics from an Anserini-indexed corpus

It definitely needs further cleanup. For now, however, I want to commit this to the feature/PES20 branch from my fork, since I am adding new features (entitylinking) and it might become too much to review all at once.