elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
69.6k stars 24.63k forks source link

Clean up `shortcutTotalHitCount` using the new `Weight#count` API #81034

Open jpountz opened 2 years ago

jpountz commented 2 years ago

Elasticsearch has a special shortcutTotalHitCount hack in TopDocsCollectorContext.java that it uses to avoid collecting all matches only to get the hit count when the hit count can otherwise be inferred from index statistics.

Lucene introduced a new API that does exactly that and that should support more queries in the near future. We should cut over usage of shortcutTotalHitCount to Weight#count instead.

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

OlivierCavadenti commented 2 years ago

I take

OlivierCavadenti commented 2 years ago

The PR in Lucene that add Weight#count : https://github.com/apache/lucene/pull/242

nik9000 commented 2 years ago

@OlivierCavadenti you may be interested in https://github.com/elastic/elasticsearch/pull/81322 - some of the plumbing I had to to there is the same as you'd have to do for this.

OlivierCavadenti commented 2 years ago

@OlivierCavadenti you may be interested in #81322 - some of the plumbing I had to to there is the same as you'd have to do for this.

Thanks ! I think I am near something but I have two tests that fail (testCountWithoutDeletions => "should not collect more than 0 doc per segment, got 1", it seems we have a special case with hasDeletions == false ? ) and testNumericSortOptimization (java.lang.AssertionError: Expected :GREATER_THAN_OR_EQUAL_TO Actual : EQUAL_TO).

I will investigate this week-end.

javanna commented 1 year ago

I have reverted #89047, which means that we still have our own shortcutTotalHitCount which is independent from Weight#count in Lucene. This is something we will want to address. One way I considered was creating a throw-away Weight, but that requires duplicating the Weight which may have a cost. Ideally, the mechanism is brought to Lucene entirely, which is something that I will look into.

elasticsearchmachine commented 2 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)