apache / lucene

Apache Lucene open-source search software
https://lucene.apache.org/
Apache License 2.0
2.48k stars 985 forks source link

Remove IndexSearcher#search(Query,Collector) in favor of IndexSearcher#search(Query,CollectorManager) [LUCENE-10002] #11041

Open asfimport opened 3 years ago

asfimport commented 3 years ago

It's a bit trappy that you can create an IndexSearcher with an executor, but that it would always search on the caller thread when calling IndexSearcher#search(Query,Collector).

 Let's remove IndexSearcher#search(Query,Collector), point our users to IndexSearcher#search(Query,CollectorManager) instead, and change factory methods of our main collectors (e.g. TopScoreDocCollector#create) to return a CollectorManager instead of a Collector?


Migrated from LUCENE-10002 by Adrien Grand (@jpountz), updated Apr 06 2022 Pull requests: https://github.com/apache/lucene/pull/240

asfimport commented 3 years ago

Greg Miller (@gsmiller) (migrated from JIRA)

+1. I also don't love that users can setup an IndexSearcher with an Executor and then not leverage it since they're not providing a CollectorManager. We have the same mixed pattern in DrillSideways as well. If we end up moving forward with this change in IndexSearcher, I'll create another issue to make the same update in DrillSideways for consistency.

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

Hi @jpountz @gsmiller, I have created a PR for this to deprecate the collector API in favor of the collector manager API, as well as some initial refactoring to some tests and the parts in DrillSideways that use TopScoreDocCollector & TopFieldCollector to use the latter API. I plan to submit more PRs afterward for other areas in the codebase.

Please note that I did first try to remove the collector API entirely, but that ended up resulting in way too many changes than I'm comfortable with in a single PR, and I also feel this API is such a commonly used one that users may prefer a more gradual deprecation / transition period. Hence I reverted my previous effort and adopted a phased approach.

asfimport commented 2 years ago

Greg Miller (@gsmiller) (migrated from JIRA)

Nice @zacharymorn! Quite a large change for sure! I took a look at the DrillSideways changes and they appear correct to me at first glance, but I'll see if I can spend more time going through the whole PR in the next couple of days.

In the meantime, I went ahead and spun off #11088 to track making a similar API change to DrillSideways.

asfimport commented 2 years ago

Zach Chen (@zacharymorn) (migrated from JIRA)

Nice @zacharymorn! Quite a large change for sure! I took a look at the DrillSideways changes and they appear correct to me at first glance, but I'll see if I can spend more time going through the whole PR in the next couple of days.

In the meantime, I went ahead and spun off #11088 to track making a similar API change to DrillSideways.

Sounds great, thanks Greg!

asfimport commented 2 years ago

Mark Robert Miller (@markrmiller) (migrated from JIRA)

I did this some time ago. Read in Mike's blog you can just toss in an executor, tossed it, looked for my that was easy button.

+1 on making it obvious that I literally had changed nothing. Unfortunately, would not have saved me from the complications around implementing the "map reduce" collectors, all hopefully efficiently, and working out when it makes sense to use them vs the std search method, given the overhead and the index size and number of segments needed to make the tradeoff worth it, or at least not a performance hit. Or working out how to deal with changing segment count makeup, what kind of merge control or shenanigans make sense when or how and when and why to manage the segment slices and how they are grouped. Etc. Etc.

Went from, "drop in an executor? sweet" to ...

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 11006fba59d1d4cad28c7c4c917c9288f5206fc5 in lucene's branch refs/heads/main from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=11006fb

LUCENE-10002: Replace simple usages of TotalHitCountCollector with IndexSearcher#count (#612)

In case only number of documents are collected, IndexSearcher#search(Query, Collector) is commonly used, which does not use the executor that's been eventually set to the searcher. Calling IndexSearcher#count(Query) makes the code more concise and is also more correct as it honours the executor that's been set to the searcher instance.

Co-authored-by: Adrien Grand <jpountz@gmail.com>

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 794f941f52f77737633ea23cd1892d330ce030a7 in lucene's branch refs/heads/branch_9x from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=794f941

LUCENE-10002: Replace simple usages of TotalHitCountCollector with IndexSearcher#count (#612)

In case only number of documents are collected, IndexSearcher#search(Query, Collector) is commonly used, which does not use the executor that's been eventually set to the searcher. Calling IndexSearcher#count(Query) makes the code more concise and is also more correct as it honours the executor that's been set to the searcher instance.

Co-authored-by: Adrien Grand <jpountz@gmail.com>

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit ee7a8d6918717e41e77afcde2c898655daa2173a in lucene's branch refs/heads/main from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=ee7a8d6

LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests (#639)

Also use LuceneTestCase#newSearcher

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit dba9b180bfaad1351e0c333b252a36bd33e61dbd in lucene's branch refs/heads/branch_9x from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=dba9b18

LUCENE-10002: Replace some IndexSearcher#search(Collector, Query) in tests (#639)

Also use LuceneTestCase#newSearcher

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1b083ea03936469aeb47f1c76294ec2b706c3638 in lucene's branch refs/heads/main from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=1b083ea

LUCENE-10002: Replace test usages of TopScoreDocCollector with a corresponding collector manager (#716)

In the effort or replacing usages of IndexSearcher#search(Query, Collector) with IndexSearcher#search(Query, CollectorManager), this commit replaces many test usages of TopScoreDocCollector with its corresponding CollectorManager created by calling TopScoreDocCollector#createSharedManager.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit bfe7096565eb2ecaa8c8c2419bdcc18c9e36d49e in lucene's branch refs/heads/branch_9x from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=bfe7096

LUCENE-10002: Replace test usages of TopScoreDocCollector with a corresponding collector manager (#716)

In the effort or replacing usages of IndexSearcher#search(Query, Collector) with IndexSearcher#search(Query, CollectorManager), this commit replaces many test usages of TopScoreDocCollector with its corresponding CollectorManager created by calling TopScoreDocCollector#createSharedManager.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit bff4246476d860942e1b20dae2540b5caae2eda9 in lucene's branch refs/heads/main from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene.git;h=bff4246

LUCENE-10002: Fix test failure.

When IndexSearcher is created with a threadpool it becomes impossible to assert on the number of evaluated hits overall.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 2a6b2ca1435ddb719bf0834d035ec38b7401c931 in lucene's branch refs/heads/branch_9x from Adrien Grand https://gitbox.apache.org/repos/asf?p=lucene.git;h=2a6b2ca

LUCENE-10002: Fix test failure.

When IndexSearcher is created with a threadpool it becomes impossible to assert on the number of evaluated hits overall.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 66bbc9558619ca1714bc2c44c836ee40325eeaa5 in lucene's branch refs/heads/main from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=66bbc95

LUCENE-10002: Add FixedBitSetCollector and corresponding collector manager to test framework (#766)

Some tests collect matching docs in a FixedBitSet. In the effort of moving such tests to using IndexSearcher#search(Query, CollectorManager) as part of LUCENE-10002, this commit adds a new FixedBitSetCollector class that exposes this functionality as well as a createManager method that returns a corresponding CollectorManager.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 3bd6e32a96897fd19fabce49a8876b54a3788c42 in lucene's branch refs/heads/branch_9x from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=3bd6e32

LUCENE-10002: Add FixedBitSetCollector and corresponding collector manager to test framework (#766)

Some tests collect matching docs in a FixedBitSet. In the effort of moving such tests to using IndexSearcher#search(Query, CollectorManager) as part of LUCENE-10002, this commit adds a new FixedBitSetCollector class that exposes this functionality as well as a createManager method that returns a corresponding CollectorManager.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit e7f9f2c50d2e6844b6d91fcbc14e97629cbfe90b in lucene's branch refs/heads/main from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=e7f9f2c50d2

LUCENE-10002: Replaces usages of FacetsCollector with FacetsCollectorManager (#764)

In the effort of decreasing usages of IndexSearcher#search(query, Collector) by using the corresponding method that accepts a collector manager, this commit replaces many usages of FacetsCollector with its corresponding existing collector manager.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 7ed0f3d7adb5654e5bc1be024b47f589bae1ed55 in lucene's branch refs/heads/main from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=7ed0f3d7adb

11520: Add support for concurrent facets random sampling (#765)

This commit adds a new createManager static method to RandomSamplingFacetsCollector that allows users to perform random sampling concurrently. The returned collector manager is very similar to the existing FacetsCollectorManager but it exposes a specialized reduced RandomSamplingFacetsCollector.

This relates to LUCENE-10002. It allows users to use a collector manager instead of a collector when doing random sampling, in the effort of reducing usages of IndexSearcher#search(Query, Collector).

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 395126a35b977fece701845db9cba716d18e59bb in lucene's branch refs/heads/branch_9x from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=395126a35b9

LUCENE-10002: Replaces usages of FacetsCollector with FacetsCollectorManager (#764)

In the effort of decreasing usages of IndexSearcher#search(query, Collector) by using the corresponding method that accepts a collector manager, this commit replaces many usages of FacetsCollector with its corresponding existing collector manager.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit b0ecba4fb243d24d083325c7482f9725bccc57db in lucene's branch refs/heads/branch_9x from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=b0ecba4fb24

11520: Add support for concurrent facets random sampling (#765)

This commit adds a new createManager static method to RandomSamplingFacetsCollector that allows users to perform random sampling concurrently. The returned collector manager is very similar to the existing FacetsCollectorManager but it exposes a specialized reduced RandomSamplingFacetsCollector.

This relates to LUCENE-10002. It allows users to use a collector manager instead of a collector when doing random sampling, in the effort of reducing usages of IndexSearcher#search(Query, Collector).

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 74e9716aec74e862b3073e01d3ccbccb199b41e0 in lucene's branch refs/heads/main from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=74e9716aec7

LUCENE-10002: move MemoryIndex to search(Query, CollectorManager) (#785)

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 1cf1b301af050c9aaedec6bfcbaaebafa6fa3241 in lucene's branch refs/heads/main from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=1cf1b301af0

LUCENE-10002: replace more usages of search(Query, Collector) in tests (#787)

This commit replaces more usages of search(Query, Collector) with calling the corresponding search(Query, CollectorManager) instead. This round focuses on tests that implement custom collector, that need a corresponding collector manager.

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit 37434ffb1fcaf5e7a9096b13204fd640a9c8113e in lucene's branch refs/heads/branch_9x from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=37434ffb1fc

LUCENE-10002: move MemoryIndex to search(Query, CollectorManager) (#785)

asfimport commented 2 years ago

ASF subversion and git services (migrated from JIRA)

Commit ccd21fa5d9df7f2a30cd81784f49b3f08116c300 in lucene's branch refs/heads/branch_9x from Luca Cavanna https://gitbox.apache.org/repos/asf?p=lucene.git;h=ccd21fa5d9d

LUCENE-10002: replace more usages of search(Query, Collector) in tests (#787)

This commit replaces more usages of search(Query, Collector) with calling the corresponding search(Query, CollectorManager) instead. This round focuses on tests that implement custom collector, that need a corresponding collector manager.