apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0

Switch KNN classifier to use BM25 similarity [LUCENE-7776] #8827

Closed asfimport closed 7 years ago

asfimport commented 7 years ago

It'd be good to use BM25 as the default Similarity for the KNN classifier. I ran some tests on the 20newsgroups dataset, and they showed improved F1 (between 0.10 and 0.15).


Migrated from LUCENE-7776 by Tommaso Teofili (@tteofili), resolved Apr 11 2017
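
For readers unfamiliar with BM25, here is a minimal, dependency-free sketch of the textbook Okapi BM25 term score. It is not the code from this issue and not Lucene's implementation, which differs in details such as how document lengths are encoded. The class name `Bm25Sketch` and its method names are hypothetical; the `k1 = 1.2` and `b = 0.75` values match the defaults of Lucene's `BM25Similarity`.

```java
public final class Bm25Sketch {
    // Hypothetical illustration class; constants mirror BM25Similarity's defaults.
    static final double K1 = 1.2;  // controls term-frequency saturation
    static final double B = 0.75;  // controls document-length normalization

    /** Inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    /** Textbook BM25 contribution of a single term to a single document's score. */
    static double score(double tf, long docFreq, long docCount,
                        double docLen, double avgDocLen) {
        // Length normalization: longer-than-average documents are penalized.
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        // The tf factor saturates: it approaches (K1 + 1) as tf grows.
        return idf(docCount, docFreq) * tf * (K1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        // Made-up corpus statistics, for illustration only.
        long docCount = 1000;
        long docFreq = 50;
        double avgDocLen = 100.0;
        for (double tf : new double[] {1, 2, 5, 20, 100}) {
            System.out.printf("tf=%.0f score=%.4f%n",
                tf, score(tf, docFreq, docCount, 100.0, avgDocLen));
        }
    }
}
```

Unlike a classic TF-IDF weight, where the term-frequency factor keeps growing, the BM25 term score is bounded by `idf * (K1 + 1)`, so a few very repetitive documents are less likely to dominate the nearest-neighbour vote.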

asfimport commented 7 years ago

ASF subversion and git services (migrated from JIRA)

Commit 0f60c4233ca2ae4bf3bd5a6cc395766e84119cd9 in lucene-solr's branch refs/heads/master from @tteofili https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0f60c42

LUCENE-7776 - use bm25 for knn classifier

asfimport commented 7 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1bf36962285110c4ac2d1f468de3cc7fde379c0e in lucene-solr's branch refs/heads/master from @cpoerschke https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1bf3696

LUCENE-7776: remove unused import

asfimport commented 7 years ago

Adrien Grand (@jpountz) (migrated from JIRA)

+1!

asfimport commented 7 years ago

ASF subversion and git services (migrated from JIRA)

Commit 9c00fc6795228f8938fe1601697835b5acdd8290 in lucene-solr's branch refs/heads/master from @tteofili https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9c00fc6

LUCENE-7776 - visualize diff btwn BytesRef values in ClassificationTestBase

asfimport commented 7 years ago

ASF subversion and git services (migrated from JIRA)

Commit 5c5254341e4158c24f3fc6ef3a54f6da6f667120 in lucene-solr's branch refs/heads/master from @cpoerschke https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5c52543

LUCENE-7776: change javadocs default mention from Classic to BM25

(Also kinda added missing javadoc for new method to fix 'ant precommit'.)

asfimport commented 7 years ago

ASF subversion and git services (migrated from JIRA)

Commit 7fde878ae4780f1837189e4e8c531b373bc87c07 in lucene-solr's branch refs/heads/master from @tteofili https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=7fde878

LUCENE-7776 - adjusted failing tests in solr due to switching to bm25 in knn

asfimport commented 7 years ago

Alessandro Benedetti (@alessandrobenedetti) (migrated from JIRA)

Good one, Tommaso! I have recently been working on this:

8549

The modification itself is not big, but part of the task has been a consistent refactor and the introduction of tests for the More Like This component (which is heavily used by the KNN classifiers). I understand the patch will be quite big (and probably boring to review), but if we finalize it, it will open up the possibility of easily extending and improving More Like This.

I will update the Jira issue with a pull request and details about what is in it and its benefits in the next few days. Feel free to review it.

asfimport commented 7 years ago

Tommaso Teofili (@tteofili) (migrated from JIRA)

Sure, Alessandro, thanks for sharing info about your work. I'll have a look once you open the PR.