- Adding a spaCy tokenizer (however, it's much slower than Anserini's)
- Using gensim for downloading and loading word embeddings
- Reading stats from an Anserini-indexed corpus
It definitely needs further cleaning.
However, I want to commit this to the feature/PES20 branch from my fork now, since I am adding new features (entitylinking) and it might otherwise become too much to review all at once.