Stopwords like ли and по get matched in the corpus entities but don't match in the query, so the corresponding n-grams never get formed.
I've added loading of a space-separated stopword list and removal of those stopwords during the feature-convert phase, so entities aren't affected. @barzerman please take a look and give me a go-ahead :)
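The idea can be sketched roughly as follows (a minimal hypothetical illustration, not the actual implementation — `load_stopwords` and `to_features` are made-up names): stopwords are parsed from a space-separated string and stripped from query tokens before n-gram features are formed, while entity indexing is left untouched.

```python
# Hypothetical sketch of the stopword-filtering idea described above.

def load_stopwords(s):
    """Parse a space-separated stopword list, e.g. "ли по"."""
    return set(s.split())

def to_features(tokens, stopwords, n=2):
    """Drop stopwords, then form token n-grams from what remains."""
    kept = [t for t in tokens if t not in stopwords]
    return [tuple(kept[i:i + n]) for i in range(len(kept) - n + 1)]

stop = load_stopwords("ли по")
# With stopwords removed, the query's n-grams line up with the corpus
# entities, which never contained those words in their feature form.
print(to_features(["можно", "ли", "платить", "по", "карте"], stop))
# → [('можно', 'платить'), ('платить', 'карте')]
```

With the stopwords left in, the query would instead produce n-grams like `('можно', 'ли')` that have no counterpart on the entity side.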
Example query: http://eu.barzer.net/query/json?key=aRLsIvszISAReCoS6ktgviZxN0YlRpbs6DKH7vro&zurch=yes&flag=d&query=%D0%BC%D0%BE%D0%B6%D0%BD%D0%BE%20%D0%BB%D0%B8%20%D0%BF%D0%BB%D0%B0%D1%82%D0%B8%D1%82%D1%8C%20%D0%B7%D0%B0%20%D0%BF%D0%BE%D0%BA%D1%83%D0%BF%D0%BA%D0%B8%20%D0%BA%D0%B0%D1%82%D1%80%D0%BE%D0%B9%20%D0%B2%D0%B0%D1%88%D0%B5%D0%B3%D0%BE%20%D0%B1%D0%B0%D0%BD%D0%BA%D0%B0?
The correct document scores only 1.5, which is quite low.