elastic / elasticsearch

Free and Open, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Support dynamic pruning in the `composite` aggregation #88185

Open jpountz opened 2 years ago

jpountz commented 2 years ago

Description

Queries sorted by field have improved a lot over the years when it comes to dynamic pruning:

  1. Ancient versions of Elasticsearch would always collect all the matches to return a single page of data. This performed terribly when paging through all hits, since the total cost was essentially quadratic in the number of documents in a shard.
  2. Then we introduced index sorting, and queries whose sort order is congruent with the index sort could skip irrelevant data.
  3. Then we introduced dynamic pruning when the sort field is indexed with points, by leveraging the index to skip hits that cannot possibly make it to the page that we are retrieving. This yielded major speedups when paginating through all hits. This is the current state.
  4. In the future, we should look into supporting dynamic pruning when sorting on keyword fields too.
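
To make the difference between stage 1 and stage 3 concrete, here is a minimal toy sketch in Python. It is not how Lucene implements pruning (Lucene uses the points index and updates a competitive iterator during collection); it only models the cost difference. The function names, the `after` cursor, and the use of a pre-sorted list as a stand-in for the points index are all assumptions made for illustration.

```python
import bisect
import heapq


def page_collect_all(values, after, size):
    """Stage 1 model: visit every document to build one page.

    `values[doc_id]` is the sort value of each document. `after` is the
    (value, doc_id) pair of the last hit on the previous page, or None.
    Returns (page, number_of_documents_visited).
    """
    candidates = [
        (value, doc_id)
        for doc_id, value in enumerate(values)
        if after is None or (value, doc_id) > after
    ]
    return heapq.nsmallest(size, candidates), len(values)


def page_with_pruning(sorted_index, after, size):
    """Stage 3 model: use a sorted structure to skip non-competitive docs.

    `sorted_index` is a list of (value, doc_id) pairs sorted ascending; it is
    a hypothetical stand-in for the points index, not a real Lucene structure.
    """
    start = 0 if after is None else bisect.bisect_right(sorted_index, after)
    page = sorted_index[start:start + size]
    return page, len(page)


def paginate(fetch_page, size):
    """Page through all hits, summing the documents visited by each request."""
    hits, visited, after = [], 0, None
    while True:
        page, cost = fetch_page(after, size)
        visited += cost
        if not page:
            break
        hits.extend(page)
        after = page[-1]
    return hits, visited
```

In this model, paging through everything with `page_collect_all` visits every document once per page, while `page_with_pruning` visits each document roughly once overall, which is the gap between stages 1 and 3 described above.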

The composite aggregation is very similar to sorted queries, yet with regard to dynamic pruning it is still at stage 2 of the evolution described above. Unless you are aggregating on the primary index sort field, computing a single page of data requires collecting every document that matches the query.

Can we add dynamic pruning support to the composite aggregation so that computing a single page of results wouldn't need to look at all matches? Ideally it would reuse the same logic that we are using for sorted queries via the LeafFieldComparator#competitiveIterator and LeafCollector#competitiveIterator APIs.
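
As a rough illustration of what this could buy, the following Python sketch contrasts computing one page of a single-source composite aggregation by collecting every match with computing it by walking a sorted structure and stopping once the page is full. This is a toy model under stated assumptions, not a proposed implementation: `sorted_index`, `matching_docs`, and both function names are hypothetical, and a real implementation would intersect a competitive iterator with the query rather than test membership in a set.

```python
import bisect
from collections import Counter


def composite_page_collect_all(keys, matching_docs, after_key, size):
    """Current behavior (model): collect every matching doc, then cut a page.

    `keys[doc_id]` is the bucket key of each document. Returns the page as a
    list of (key, doc_count) pairs plus the number of documents visited.
    """
    counts = Counter()
    visited = 0
    for doc_id in sorted(matching_docs):
        visited += 1
        key = keys[doc_id]
        if after_key is None or key > after_key:
            counts[key] += 1
    return sorted(counts.items())[:size], visited


def composite_page_pruned(sorted_index, matching_docs, after_key, size):
    """With pruning (model): walk keys in order, stop when the page is full.

    `sorted_index` is a list of (key, doc_id) pairs sorted ascending, a
    hypothetical stand-in for an index structure that supports skipping.
    """
    if after_key is None:
        start = 0
    else:
        # Skip every entry whose key is <= after_key.
        start = bisect.bisect_right(sorted_index, (after_key, float("inf")))
    page = []  # list of [key, doc_count], in key order
    visited = 0
    for i in range(start, len(sorted_index)):
        key, doc_id = sorted_index[i]
        is_new_key = not page or page[-1][0] != key
        if is_new_key and len(page) == size:
            break  # every remaining key sorts after the page: not competitive
        visited += 1
        if doc_id not in matching_docs:
            continue
        if is_new_key:
            page.append([key, 1])
        else:
            page[-1][1] += 1
    return [tuple(bucket) for bucket in page], visited
```

Both functions return the same buckets; the pruned variant only touches index entries whose keys fall inside the requested page, so its cost scales with the page rather than with the total number of matches.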

Relates to #85759.

elasticmachine commented 2 years ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

jpountz commented 2 years ago

We would probably need to figure out dynamic pruning on keyword fields for the composite aggregation to benefit, since it is often used in combination with keyword fields. I opened LUCENE-10633.

wchaparro commented 1 year ago

Added to: https://github.com/elastic/elasticsearch/issues/65019