[TSDB] Discuss difference of timeseries rate API results vs regular rate API

dej611 commented 1 year ago

Description

As discussed offline, in Kibana we're exploring migrating to the new time series rate API for counter field types, but we're running into some issues with the results it returns.

Kibana work: https://github.com/elastic/kibana/issues/152537

Given two fields, one of type long (raw_field) and one of type counter (counter_field), with exactly 2 documents in every hour but at different times within the hour, these are the results of the ES rate vs. the time series rate API, computed for an hourly bucket:

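| timestamp | items_sold | total_items_sold | es rate (raw_field) | ts es rate (counter_field) |
|---|---|---|---|---|
| 10:01 | 1 | 1 | | |
| 10:59 | 1 | 2 | 2 | 1.034482759 |
| 11:01 | 1 | 3 | | |
| 11:03 | 1 | 4 | 2 | 30 |
| 12:58 | 1 | 5 | | |
| 12:59 | 1 | 6 | 2 | 1.034482759 |
| 13:01 | 1 | 7 | | |
| 13:02 | 1 | 8 | 2 | 40 |
| 14:01 | 1 | 9 | | |
| 14:05 | 1 | 10 | 2 | 1.904761905 |
| 15:01 | 1 | 11 | | |
| 15:59 | 1 | 12 | 2 | 1.052631579 |
| 16:01 | 1 | 13 | | |
| 16:02 | 1 | 14 | 2 | 40 |
| 17:58 | 1 | 15 | | |
| 17:59 | 1 | 16 | 2 | 1.025641026 |
| 18:58 | 1 | 17 | | |
| 18:59 | 1 | 18 | 2 | 2 |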

From our understanding, the two algorithms diverge depending on when the events occur within the bucket. As shown here, the results diverge quite a lot for this use case, but any use case with irregular updates is subject to such differences.
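
For reference, the constant 2 in the regular rate column is what you get by summing items_sold inside each hourly bucket, regardless of where in the hour the documents land. A minimal Python sketch of that reading follows; the helper name `bucket_sum_rate` is made up for illustration and this is not Elasticsearch code:

```python
from collections import defaultdict

# (timestamp "HH:MM", items_sold) pairs copied from the table above
DOCS = [
    ("10:01", 1), ("10:59", 1), ("11:01", 1), ("11:03", 1),
    ("12:58", 1), ("12:59", 1), ("13:01", 1), ("13:02", 1),
    ("14:01", 1), ("14:05", 1), ("15:01", 1), ("15:59", 1),
    ("16:01", 1), ("16:02", 1), ("17:58", 1), ("17:59", 1),
    ("18:58", 1), ("18:59", 1),
]


def bucket_sum_rate(docs):
    """Sum the raw field per hourly bucket.

    With an hourly bucket and an hourly unit, the sum is the rate,
    so document timing inside the bucket has no effect.
    """
    totals = defaultdict(int)
    for ts, value in docs:
        hour = ts.split(":")[0]  # bucket key, e.g. "10"
        totals[hour] += value
    return dict(totals)


rates = bucket_sum_rate(DOCS)
for hour, rate in rates.items():
    print(f"{hour}:00  {rate}")
```

Every bucket comes out the same here because only the count of documents per hour matters, not their spacing.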

To discuss:

- What does the time series rate result represent?
- Should it be used on its own, or does it need to be composed with other aggs to provide meaningful results?
- Should it return more metadata with the result, e.g. the timespan used for each bucket to compute the rate?

cc @ppisljar @timductive

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

salvatore-campagna commented 1 year ago

A few links showing how the calculation of rate is in general an issue for other time series databases too:

Here we see Prometheus and VictoriaMetrics taking different approaches, including interpolation, extrapolation, and extending the time window. As you will see, our solution (the rate aggregation specific to the time series database) uses a similar approach, with the difference that we always extend the time window to take into account the last value of the previous (date histogram) bucket. This prevents the issue of having no data when the date histogram calendar_interval is less than 2 * scrape_interval. Note, however, that the window is extended only if the query contains no filter that excludes the sample from the previous (date histogram) bucket.
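
The values in the table from the issue description are consistent with this description: each one equals the counter delta between the last sample of the previous bucket and the last sample of the current bucket, divided by the time between those two samples and scaled to one hour. For the first bucket, which has no predecessor, the first sample in the bucket serves as the starting point. The Python sketch below reproduces that arithmetic. It is an assumption inferred from the table, not the actual Elasticsearch implementation; it ignores counter resets, and `extended_window_rate` is a hypothetical name.

```python
# (timestamp "HH:MM", total_items_sold) pairs copied from the table in the issue
SAMPLES = [
    ("10:01", 1), ("10:59", 2), ("11:01", 3), ("11:03", 4),
    ("12:58", 5), ("12:59", 6), ("13:01", 7), ("13:02", 8),
    ("14:01", 9), ("14:05", 10), ("15:01", 11), ("15:59", 12),
    ("16:01", 13), ("16:02", 14), ("17:58", 15), ("17:59", 16),
    ("18:58", 17), ("18:59", 18),
]


def minutes(ts):
    """Convert "HH:MM" to minutes since midnight."""
    h, m = ts.split(":")
    return int(h) * 60 + int(m)


def extended_window_rate(samples):
    """Per-hour rate for each hourly bucket, with the window extended
    back to the last sample of the previous bucket (assumed behaviour).
    Expects samples sorted by time.
    """
    buckets = {}
    for ts, value in samples:
        buckets.setdefault(ts[:2], []).append((minutes(ts), value))

    rates = {}
    prev_last = None
    for hour, points in buckets.items():
        # First bucket has no predecessor: start from its own first sample.
        start = prev_last if prev_last is not None else points[0]
        end = points[-1]
        elapsed = end[0] - start[0]  # minutes between the two samples
        rates[hour] = (end[1] - start[1]) * 60 / elapsed if elapsed else None
        prev_last = end
    return rates


ts_rates = extended_window_rate(SAMPLES)
for hour, rate in ts_rates.items():
    print(f"{hour}:00  {rate:.9f}")
```

Under this reading, the 12:00 bucket gets (6 - 4) over the 116 minutes between 11:03 and 12:59, which gives 1.034482759 per hour, while the 13:00 bucket gets (8 - 6) over the 3 minutes between 12:59 and 13:02, which gives 40 per hour. The rate therefore reflects the spacing of the two boundary samples rather than the bucket length.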

elasticsearchmachine commented 8 months ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)

elastic / elasticsearch

[TSDB] Discuss difference of timeseries rate API results vs regular rate API #100055
