The `TimeSeriesAggregator` used to process data in (`_tsid`, `@timestamp`) order. Since the introduction of `_tsid` hashing, it processes data in (`_tsid` hash, `@timestamp`) order instead. Because of this, we need to re-sort data in `TimeSeriesAggregator#buildAggregations`. The re-sort is required because the `collect` method assumes that a bucket is exhausted when the `_tsid` hash changes, so buckets are produced in `_tsid` hash order. Sorting by `_tsid` and sorting by `_tsid` hash can yield different orders, because hashing does not preserve the ordering of `_tsid` values. This is not ideal for performance: the extra in-memory sort slows down time series aggregations.
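The following minimal, self-contained Java sketch makes the ordering problem concrete. It is not the Elasticsearch implementation. `TsidResortSketch`, `Doc`, `Bucket`, `collect` and `buildAggregations` are hypothetical stand-ins. `_tsid` values are modeled as short `String`s rather than encoded bytes, and `String#hashCode` stands in for the actual `_tsid` hash. The sketch groups documents in (hash, timestamp) order, closing a bucket whenever the hash changes, and then performs the extra in-memory sort by `_tsid`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class TsidResortSketch {

    // Hypothetical stand-ins: a real _tsid is encoded bytes, not a String.
    record Doc(String tsid, long timestamp) {}
    record Bucket(String tsid, long docCount) {}

    // Assumption: String#hashCode stands in for the real _tsid hash function.
    static int tsidHash(String tsid) {
        return tsid.hashCode();
    }

    // Mimics collect(): docs arrive in (_tsid hash, @timestamp) order, and a bucket is
    // considered exhausted as soon as the hash changes (hash collisions are ignored here).
    static List<Bucket> collect(List<Doc> docs) {
        List<Doc> ordered = new ArrayList<>(docs);
        ordered.sort(Comparator.comparingInt((Doc d) -> tsidHash(d.tsid()))
            .thenComparingLong(Doc::timestamp));
        List<Bucket> buckets = new ArrayList<>();
        String current = null;
        long count = 0;
        for (Doc doc : ordered) {
            if (current != null && tsidHash(doc.tsid()) != tsidHash(current)) {
                buckets.add(new Bucket(current, count)); // hash changed: close the bucket
                count = 0;
            }
            current = doc.tsid();
            count++;
        }
        if (current != null) {
            buckets.add(new Bucket(current, count));
        }
        return buckets;
    }

    // Mimics the extra in-memory sort in buildAggregations(): restore _tsid order.
    static List<Bucket> buildAggregations(List<Bucket> collected) {
        List<Bucket> out = new ArrayList<>(collected);
        out.sort(Comparator.comparing(Bucket::tsid));
        return out;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("aa", 1), new Doc("b", 2), new Doc("c", 1), new Doc("aa", 2), new Doc("b", 1));
        List<Bucket> collected = collect(docs);
        System.out.println(collected.stream().map(Bucket::tsid).toList()); // prints [b, c, aa]
        System.out.println(buildAggregations(collected).stream().map(Bucket::tsid).toList()); // prints [aa, b, c]
    }
}
```

With these inputs, `collect` emits buckets in hash order (`"b"` = 98, `"c"` = 99, `"aa"` = 3104), and only the second sort restores `_tsid` order. That second step is the kind of extra work the issue would like to eliminate.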
Ideally, we would like to avoid re-sorting in the aggregator. One option might be to move the doc values access that currently lives in `TimeSeriesAggregator#getLeafCollector` up into `TimeSeriesAggregator#buildAggregations`, which would avoid re-sorting the buckets. That, however, requires us to keep track of ordinals and the segment they belong to, so that we can read doc values correctly and fill the aggregation result with the correct dimension values.
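The bookkeeping this proposal implies can be sketched as follows. This is a hypothetical model, not real Lucene or Elasticsearch code. In Lucene, ordinals of sorted doc values are local to a segment. An ordinal alone therefore cannot identify a dimension value once collection has moved on to another segment.

In the sketch, the collector remembers a (segment, ordinal) pair per bucket (`BucketRef`) and resolves the dimension values only in `buildAggregations`. `Segment`, `BucketRef` and `ResolvedBucket` are illustrative names, and a `List<String>` stands in for per-segment doc values. The sketch does not attempt to show how the re-sort itself would be eliminated.

```java
import java.util.ArrayList;
import java.util.List;

public class DeferredDimensionsSketch {

    // Hypothetical stand-in for one segment's doc values: ordinal -> dimension value.
    record Segment(List<String> dimensionValuesByOrd) {
        String lookup(int ord) {
            return dimensionValuesByOrd.get(ord);
        }
    }

    // What the collector would remember per bucket instead of resolved values:
    // the segment the ordinal belongs to, the ordinal itself, and the doc count.
    record BucketRef(int segmentOrd, int ord, long docCount) {}

    record ResolvedBucket(String dimensions, long docCount) {}

    // Resolve dimension values at build time. An ordinal is only meaningful together
    // with its segment, which is why both must be tracked during collection.
    static List<ResolvedBucket> buildAggregations(List<Segment> segments, List<BucketRef> refs) {
        List<ResolvedBucket> out = new ArrayList<>(refs.size());
        for (BucketRef ref : refs) {
            String dims = segments.get(ref.segmentOrd()).lookup(ref.ord());
            out.add(new ResolvedBucket(dims, ref.docCount()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Segment> segments = List.of(
            new Segment(List.of("host=a", "host=b")),
            new Segment(List.of("host=b", "host=c")));
        // Ordinal 0 refers to different values in segment 0 and segment 1.
        List<BucketRef> refs = List.of(
            new BucketRef(0, 0, 3), new BucketRef(1, 0, 2), new BucketRef(1, 1, 4));
        buildAggregations(segments, refs).forEach(System.out::println);
    }
}
```

If the segment were dropped from `BucketRef`, ordinal 0 would be ambiguous between `host=a` and `host=b`. That ambiguity is the reason the proposal needs to carry the segment along with each ordinal.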
Steps to Reproduce
Just run a time series aggregation on a time series index.
Elasticsearch Version
8.13 and above
Installed Plugins
No response
Java Version
bundled
OS Version
All
Logs (if relevant)
No response