We've seen users experience slow query performance when collapsing on keyword fields. It manifests as follows in hot_threads:
We are repeatedly decompressing the doc values terms dictionary in order to look up values, and we do this lookup on each document we collect just to find the right group in SinglePassGroupingCollector: https://github.com/elastic/elasticsearch/blob/f31c36482ec3481a261733fc44962a96eb214b01/server/src/main/java/org/elasticsearch/lucene/grouping/SinglePassGroupingCollector.java#L301 which calls https://github.com/elastic/elasticsearch/blob/f31c36482ec3481a261733fc44962a96eb214b01/server/src/main/java/org/elasticsearch/lucene/grouping/GroupingDocValuesSelector.java#L163
As an alternative, we could either build global ordinals and use them as the group key, or we could use segment-level ordinals and remap those when we move to the next leaf reader. TermGroupSelector does a similar thing, but caches all resolved term values, which could have serious memory implications on high-cardinality keyword fields.
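To make the segment-level ordinal idea concrete, here is a rough, self-contained sketch. It is not Elasticsearch or Lucene code: `OrdinalGroupingSketch`, `Segment`, and the `dictionaryLookups` counter are hypothetical names, and a sorted `String[]` stands in for the compressed terms dictionary. The two lookup methods are only loosely modelled on `lookupOrd` and `lookupTerm` from Lucene's `SortedDocValues`. The point is that the per-document path only touches an integer ordinal, and the dictionary is consulted once per new group plus once per known group when the segment changes.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: group by segment ordinal, remap known groups per segment.
final class OrdinalGroupingSketch {

    // Stand-in for one segment's sorted doc values (assumption, not a Lucene class).
    static final class Segment {
        final String[] sortedTerms; // terms dictionary; ordinal == array index
        final int[] docToOrd;       // ordinal of each document's value
        int dictionaryLookups = 0;  // counts the simulated expensive lookups

        Segment(String[] sortedTerms, int[] docToOrd) {
            this.sortedTerms = sortedTerms;
            this.docToOrd = docToOrd;
        }

        // Ordinal -> term (the costly decompression in the real system).
        String lookupOrd(int ord) {
            dictionaryLookups++;
            return sortedTerms[ord];
        }

        // Term -> ordinal in this segment, or -1 if the term is absent.
        int lookupTerm(String term) {
            dictionaryLookups++;
            int idx = Arrays.binarySearch(sortedTerms, term);
            return idx >= 0 ? idx : -1;
        }
    }

    // Resolved group value -> number of collected documents.
    private final Map<String, Integer> groupCounts = new HashMap<>();
    // Ordinal in the current segment -> resolved group value.
    private Map<Integer, String> ordToGroup = new HashMap<>();
    private Segment current;

    // Called when moving to the next segment: remap every known group once.
    void setNextSegment(Segment segment) {
        current = segment;
        ordToGroup = new HashMap<>();
        for (String term : groupCounts.keySet()) {
            int ord = segment.lookupTerm(term);
            if (ord >= 0) {
                ordToGroup.put(ord, term);
            }
        }
    }

    // Per-document path: resolve the term only for ordinals not seen before.
    void collect(int doc) {
        int ord = current.docToOrd[doc];
        String term = ordToGroup.get(ord);
        if (term == null) {
            term = current.lookupOrd(ord);
            ordToGroup.put(ord, term);
        }
        groupCounts.merge(term, 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return groupCounts;
    }
}
```

This sketch keeps every group it encounters, so on its own it does not address the memory concern. A collector that retains only its competitive top groups would need to remap only those groups on each segment change, which bounds both the cached values and the number of `lookupTerm` calls.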