elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

[TSDB] Discuss difference of timeseries rate API results vs regular rate API #100055

Open dej611 opened 1 year ago

dej611 commented 1 year ago

Description

As discussed offline, in Kibana we're exploring migrating to the new time series rate API for counter field types, but we're running into some issues with the results it returns.

Kibana work: https://github.com/elastic/kibana/issues/152537

Given two fields, one of type long (raw_field) and one of type counter (counter_field), with exactly 2 documents within every hour but at irregular times, these are the results of the ES rate vs the time series rate API, computed per hourly bucket:

timestamp  items_sold  total_items_sold  es rate (raw_field)  ts es rate (counter_field)
10:01      1           1
10:59      1           2                 2                    1.034482759
11:01      1           3
11:03      1           4                 2                    30
12:58      1           5
12:59      1           6                 2                    1.034482759
13:01      1           7
13:02      1           8                 2                    40
14:01      1           9
14:05      1           10                2                    1.904761905
15:01      1           11
15:59      1           12                2                    1.052631579
16:01      1           13
16:02      1           14                2                    40
17:58      1           15
17:59      1           16                2                    1.025641026
18:58      1           17
18:59      1           18                2                    2

From our understanding, the two algorithms diverge depending on when the events happen within each bucket. As shown here, the results differ considerably for this use case, but any use case with irregular updates is subject to the same difference.

To discuss:

cc @ppisljar @timductive

elasticsearchmachine commented 1 year ago

Pinging @elastic/es-analytics-geo (Team:Analytics)

salvatore-campagna commented 1 year ago

A few links showing how the calculation of rate is in general an issue for other time series databases too:

Here we see Prometheus and VictoriaMetrics taking different approaches, including interpolation, extrapolation, and extending the time window. As you will see, our solution (the rate aggregation specific to the time series database) uses a similar approach, with the difference that we always extend the time window to take into account the last value of the previous (date histogram) bucket. This prevents the issue of having no data when the date histogram calendar_interval is less than 2 * scrape_interval. Note, however, that the extension only happens if no filter in the query excludes the sample from the previous (date histogram) bucket.
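
To make the window extension concrete, here is a minimal Python sketch of the two calculations applied to the sample data from the issue description. It is only one reading of the behaviour described above, not the actual Elasticsearch implementation: the function names are hypothetical, the first bucket is assumed to fall back to its own first sample, and counter resets are ignored. Under those assumptions it reproduces both rate columns of the table.

```python
from datetime import datetime, timedelta

# Sample data copied from the table in the issue description:
# (timestamp, items_sold, total_items_sold)
SAMPLES = [
    ("10:01", 1, 1), ("10:59", 1, 2),
    ("11:01", 1, 3), ("11:03", 1, 4),
    ("12:58", 1, 5), ("12:59", 1, 6),
    ("13:01", 1, 7), ("13:02", 1, 8),
    ("14:01", 1, 9), ("14:05", 1, 10),
    ("15:01", 1, 11), ("15:59", 1, 12),
    ("16:01", 1, 13), ("16:02", 1, 14),
    ("17:58", 1, 15), ("17:59", 1, 16),
    ("18:58", 1, 17), ("18:59", 1, 18),
]


def hourly_buckets(samples):
    """Group samples by hour, keeping them in timestamp order."""
    buckets = {}
    for hhmm, raw, counter in samples:
        t = datetime.strptime(hhmm, "%H:%M")
        buckets.setdefault(t.hour, []).append((t, raw, counter))
    return buckets


def regular_rate(samples):
    """Regular rate on the raw field: sum of items_sold per hourly bucket."""
    return {
        hour: sum(raw for _, raw, _ in points)
        for hour, points in hourly_buckets(samples).items()
    }


def extended_window_rate(samples, unit=timedelta(hours=1)):
    """Counter rate with the window extended to the previous bucket's last sample.

    Assumption: the first bucket has no predecessor, so it uses its own
    first sample as the start of the window. Counter resets are not handled.
    """
    rates = {}
    prev_last = None
    for hour, points in hourly_buckets(samples).items():
        start = prev_last if prev_last is not None else points[0]
        end = points[-1]
        elapsed = end[0] - start[0]
        if elapsed > timedelta(0):
            # Counter increase divided by elapsed time, scaled to one hour.
            rates[hour] = (end[2] - start[2]) / (elapsed / unit)
        else:
            rates[hour] = None  # a single sample gives no slope
        prev_last = end
    return rates


if __name__ == "__main__":
    regular = regular_rate(SAMPLES)
    extended = extended_window_rate(SAMPLES)
    for hour in regular:
        print(f"{hour}:00  regular={regular[hour]}  extended={extended[hour]:.9f}")
```

For the 11:00 bucket, for example, the window runs from 10:59 (value 2) to 11:03 (value 4), so the counter grows by 2 in 4 minutes, which scales to 30 per hour. The regular rate only sums the increments inside the bucket and is therefore 2 regardless of when the documents arrive.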

elasticsearchmachine commented 8 months ago

Pinging @elastic/es-storage-engine (Team:StorageEngine)