elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Improve performance to collapse on keyword fields #84564

Open ywelsch opened 2 years ago

ywelsch commented 2 years ago

We've seen users experience slow query performance when collapsing on keyword fields. It manifests as follows in hot_threads:

   100.0% [cpu=77.8%, other=22.2%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[XXX][search][T#6]'
     8/10 snapshots sharing following 34 elements
       app//org.apache.lucene.util.compress.LZ4.decompress(LZ4.java:137)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict.decompressBlock(Lucene80DocValuesProducer.java:1298)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict.next(Lucene80DocValuesProducer.java:1153)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$TermsDict.seekExact(Lucene80DocValuesProducer.java:1185)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$BaseSortedDocValues.lookupOrd(Lucene80DocValuesProducer.java:1049)
       app//org.apache.lucene.search.grouping.CollapsingDocValuesSource$Keyword.currentValue(CollapsingDocValuesSource.java:166)
       app//org.apache.lucene.search.grouping.CollapsingDocValuesSource$Keyword.currentValue(CollapsingDocValuesSource.java:141)
       app//org.apache.lucene.search.grouping.FirstPassGroupingCollector.collect(FirstPassGroupingCollector.java:200)
       app//org.apache.lucene.search.grouping.CollapsingTopDocsCollector.collect(CollapsingTopDocsCollector.java:126)
       app//org.apache.lucene.search.Weight$DefaultBulkScorer.scoreRange(Weight.java:258)
       app//org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:245)
       app//org.elasticsearch.search.internal.CancellableBulkScorer.score(CancellableBulkScorer.java:45)
       app//org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
       app//org.elasticsearch.search.internal.ContextIndexSearcher.searchLeaf(ContextIndexSearcher.java:194)
       app//org.elasticsearch.search.internal.ContextIndexSearcher.search(ContextIndexSearcher.java:167)
       app//org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:443)
       app//org.elasticsearch.search.query.QueryPhase.searchWithCollector(QueryPhase.java:255)
       app//org.elasticsearch.search.query.QueryPhase.executeInternal(QueryPhase.java:212)
       app//org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:98)
       app//org.elasticsearch.search.SearchService.loadOrExecuteQueryPhase(SearchService.java:458)
       app//org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:622)
       app//org.elasticsearch.search.SearchService.lambda$executeQueryPhase$2(SearchService.java:483)
       app//org.elasticsearch.search.SearchService$$Lambda$6246/0x0000000801c1b508.get(Unknown Source)
       app//org.elasticsearch.search.SearchService$$Lambda$6247/0x0000000801c1b730.get(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable.lambda$supply$0(ActionRunnable.java:47)
       app//org.elasticsearch.action.ActionRunnable$$Lambda$6249/0x0000000801c1b958.accept(Unknown Source)
       app//org.elasticsearch.action.ActionRunnable$2.doRun(ActionRunnable.java:62)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       app//org.elasticsearch.common.util.concurrent.TimedRunnable.doRun(TimedRunnable.java:33)
       app//org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:777)
       app//org.elasticsearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:26)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
       java.base@17.0.1/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
       java.base@17.0.1/java.lang.Thread.run(Thread.java:833)

We repeatedly decompress the doc values terms dictionary to look up values, and we do this lookup for every document we collect, just to find the right group in SinglePassGroupingCollector: https://github.com/elastic/elasticsearch/blob/f31c36482ec3481a261733fc44962a96eb214b01/server/src/main/java/org/elasticsearch/lucene/grouping/SinglePassGroupingCollector.java#L301 which calls https://github.com/elastic/elasticsearch/blob/f31c36482ec3481a261733fc44962a96eb214b01/server/src/main/java/org/elasticsearch/lucene/grouping/GroupingDocValuesSelector.java#L163

As an alternative, we could either build global ordinals and use them as the group key, or use segment-level ordinals and remap them when we move to the next leaf reader. TermGroupSelector does something similar, but it caches all resolved term values, which could have serious memory implications for high-cardinality keyword fields.
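
To make the segment-level ordinal idea concrete, here is a rough, self-contained sketch. It does not use Lucene or Elasticsearch classes: `Segment`, `collapseByValue`, `collapseByOrdinal` and the sample data are all hypothetical stand-ins, and `lookupOrd` / `lookupTerm` only mimic the shape of the corresponding `SortedDocValues` calls. It is also simplified in that it keeps every group rather than only the top N, so it illustrates the lookup pattern and not the memory behaviour a real collector would need.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OrdinalCollapseSketch {

    /** Hypothetical stand-in for one segment's sorted doc values (not a Lucene class). */
    static final class Segment {
        final String[] dict;    // sorted, unique terms: ordinal -> term
        final int[] docOrds;    // per-document ordinal into dict
        final double[] scores;  // per-document score
        int ordLookups = 0;     // counts simulated "expensive" dictionary reads

        Segment(String[] dict, int[] docOrds, double[] scores) {
            this.dict = dict;
            this.docOrds = docOrds;
            this.scores = scores;
        }

        // Stands in for the ordinal -> term read that decompresses dictionary blocks.
        String lookupOrd(int ord) {
            ordLookups++;
            return dict[ord];
        }

        // Term -> ordinal in this segment; negative if the term is absent.
        int lookupTerm(String term) {
            return Arrays.binarySearch(dict, term);
        }
    }

    /** Pattern described in the issue: resolve the term for every collected document. */
    static Map<String, Double> collapseByValue(List<Segment> segments) {
        Map<String, Double> best = new HashMap<>();
        for (Segment seg : segments) {
            for (int doc = 0; doc < seg.docOrds.length; doc++) {
                String group = seg.lookupOrd(seg.docOrds[doc]);
                best.merge(group, seg.scores[doc], Double::max);
            }
        }
        return best;
    }

    /** Alternative: key groups by segment ordinal, remap known groups per segment. */
    static Map<String, Double> collapseByOrdinal(List<Segment> segments) {
        Map<String, Double> best = new HashMap<>();
        for (Segment seg : segments) {
            // Remap step: find each known group's ordinal in the new segment.
            Map<Integer, String> ordToGroup = new HashMap<>();
            for (String term : best.keySet()) {
                int ord = seg.lookupTerm(term);
                if (ord >= 0) {
                    ordToGroup.put(ord, term);
                }
            }
            for (int doc = 0; doc < seg.docOrds.length; doc++) {
                int ord = seg.docOrds[doc];
                String group = ordToGroup.get(ord);
                if (group == null) {
                    // First time this ordinal is seen in this segment: resolve once.
                    group = seg.lookupOrd(ord);
                    ordToGroup.put(ord, group);
                }
                best.merge(group, seg.scores[doc], Double::max);
            }
        }
        return best;
    }

    /** Made-up sample data: two segments with overlapping terms. */
    static List<Segment> sample() {
        return Arrays.asList(
            new Segment(new String[] {"a", "b", "c"},
                        new int[] {0, 1, 0, 2, 0},
                        new double[] {1, 2, 3, 4, 5}),
            new Segment(new String[] {"b", "c", "d"},
                        new int[] {0, 0, 2, 1},
                        new double[] {9, 1, 7, 2}));
    }

    static int totalLookups(List<Segment> segments) {
        int total = 0;
        for (Segment seg : segments) {
            total += seg.ordLookups;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Segment> a = sample();
        List<Segment> b = sample();
        System.out.println(collapseByValue(a) + " lookups=" + totalLookups(a));
        System.out.println(collapseByOrdinal(b) + " lookups=" + totalLookups(b));
    }
}
```

On this sample data both methods produce the same groups, but the per-value path does one dictionary read per document (9), while the ordinal path does one per newly seen ordinal per segment (4). The trade-off is the remap step, which costs one term seek per retained group at every segment boundary; with a bounded top-N set of groups that should stay small.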

elasticmachine commented 2 years ago

Pinging @elastic/es-search (Team:Search)

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-search-foundations (Team:Search Foundations)