ESA vectors computation based on queries

To compute ESA representations, we query an index and use the returned scores. This causes problems because the generated boolean query is limited to 1024 tokens, a limit we frequently reach with Wikipedia articles.

Besides that, the similarities are not computed well because the query uses a boolean representation rather than a weighted one. We have to switch to Apache Lucene's MoreLikeThis way of computing similarities between two documents.
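
To illustrate the idea, here is a minimal Python sketch of weighted term selection in the spirit of MoreLikeThis: rank the article's terms by a tf-idf weight and keep only the top ones, so the query stays under the limit and each term carries a weight instead of a plain boolean match. This is not Lucene's implementation and not WikiTailor code. The function name `top_weighted_terms`, the `doc_freq` and `num_docs` inputs, and the exact idf formula are assumptions made for the example.

```python
import math
from collections import Counter

# Assumed cap, chosen to match the 1024 limit mentioned above.
MAX_QUERY_TERMS = 1024


def top_weighted_terms(doc_tokens, doc_freq, num_docs, max_terms=MAX_QUERY_TERMS):
    """Return up to max_terms (term, weight) pairs, highest tf-idf first.

    doc_tokens: tokens of the article to represent (hypothetical input)
    doc_freq:   term -> number of indexed documents containing it
    num_docs:   total number of documents in the index
    """
    term_freq = Counter(doc_tokens)
    weights = {}
    for term, freq in term_freq.items():
        df = doc_freq.get(term, 0)
        if df == 0:
            continue  # term is absent from the index, so it cannot match
        idf = math.log(num_docs / df) + 1.0  # one common idf variant
        weights[term] = freq * idf
    # Sort by descending weight; break ties alphabetically for stable output.
    ranked = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:max_terms]
```

With a long Wikipedia article, only the most discriminative terms would end up in the query, and the weights could be used as per-term boosts rather than treating every term equally.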

cristinae / WikiTailor

ESA vectors computation based on queries #17