dej611 opened 1 year ago
Pinging @elastic/es-analytics-geo (Team:Analytics)
A few links showing that calculating `rate` is, in general, an issue for other time series databases too:
Here we see Prometheus and VictoriaMetrics taking different approaches, including interpolation, extrapolation, and extending the time window. Our solution (a `rate` aggregation specific to the time series database) uses a similar approach, with the difference that we always extend the time window to include the last value of the previous (date histogram) bucket. When the date histogram `calendar_interval` is less than 2 * `scrape_interval`, a bucket may contain fewer than two samples, from which no rate can be computed; extending the window prevents these buckets from having no data. Note, however, that this works only if the query has no filter that excludes the sample from the previous (date histogram) bucket.
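A minimal Python sketch of the window-extension idea described above, under a simplified model that is an assumption and not the Elasticsearch implementation: samples are bucketed by `ts // interval`, any drop in the counter is treated as a reset, and the rate divides the increase by the time spanned by the samples used. `Sample`, `increase` and `bucket_rates` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    ts: float     # timestamp in seconds
    value: float  # monotonically increasing counter value (may reset to 0)


def increase(window):
    """Total counter increase across consecutive samples; a drop is treated as a reset."""
    total = 0.0
    for prev, cur in zip(window, window[1:]):
        delta = cur.value - prev.value
        # After a reset the counter restarts from zero, so the new value is the increase.
        total += cur.value if delta < 0 else delta
    return total


def bucket_rates(samples, interval, extend=True):
    """Per-second rate for each date-histogram-style bucket (hypothetical model)."""
    buckets = {}
    for s in sorted(samples, key=lambda s: s.ts):
        buckets.setdefault(int(s.ts // interval), []).append(s)

    rates = {}
    for key in sorted(buckets):
        window = buckets[key]
        previous = buckets.get(key - 1)
        if extend and previous:
            # Prepend the last sample of the previous bucket to widen the window.
            window = [previous[-1]] + window
        if len(window) < 2:
            rates[key] = None  # a single sample cannot produce a rate
        else:
            rates[key] = increase(window) / (window[-1].ts - window[0].ts)
    return rates


# Scrape every 60 s into 60 s buckets: each bucket holds exactly one sample.
samples = [Sample(t, float(t)) for t in (0, 60, 120, 180)]
print(bucket_rates(samples, 60, extend=False))
print(bucket_rates(samples, 60))
```

Without the extension every bucket yields `None`; with it, every bucket except the first yields 1.0 per second. If a query filter removed the previous bucket's sample, `buckets.get(key - 1)` would find nothing and the bucket would fall back to `None`, which mirrors the caveat above.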
Pinging @elastic/es-storage-engine (Team:StorageEngine)
Description
As discussed offline, in Kibana we're exploring the possibility of migrating to the new time series `rate` API for `counter` field types, but we're encountering some issues with the results.
Kibana work: https://github.com/elastic/kibana/issues/152537
Given two fields, one of type `long` (`raw_field`) and one of type `counter` (`counter_field`), with exactly 2 documents within every hour but at different times, the results of the ES `rate` vs the time series `rate` API computed for an hourly bucket are:
From our understanding, the two algorithms diverge depending on when the events happen. As shown here, the results diverge quite a lot for this use case, and any use case with irregular updates is subject to such differences.
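To illustrate how sample timing alone can make two rate definitions disagree, here is a minimal Python sketch comparing two hypothetical models; neither is claimed to be what Elasticsearch or Kibana actually computes. Model A takes the max counter value per hourly bucket and subtracts the previous bucket's max. Model B, in the spirit of the extended-window approach described earlier in this thread, measures the increase from the previous bucket's last sample to the current bucket's last sample and scales it to the bucket length. Timestamps are in minutes, counter resets are ignored for brevity, and the function names are made up.

```python
def max_diff_rate(samples, interval):
    """Model A (hypothetical): max value per bucket minus the previous bucket's max."""
    maxes = {}
    for ts, value in samples:
        key = ts // interval
        maxes[key] = max(value, maxes.get(key, value))
    return {k: maxes[k] - maxes[k - 1] for k in sorted(maxes) if k - 1 in maxes}


def extended_window_rate(samples, interval):
    """Model B (hypothetical): increase from the previous bucket's last sample to this
    bucket's last sample, scaled from the time actually spanned to one bucket interval."""
    buckets = {}
    for ts, value in sorted(samples):
        buckets.setdefault(ts // interval, []).append((ts, value))
    rates = {}
    for k in sorted(buckets):
        if k - 1 not in buckets:
            continue  # no previous sample to extend the window with
        window = [buckets[k - 1][-1]] + buckets[k]
        (t0, v0), (t1, v1) = window[0], window[-1]
        rates[k] = (v1 - v0) * interval / (t1 - t0)
    return rates


# Same counter values, 2 documents per hour, different timings (minutes, value).
regular = [(0, 0), (30, 10), (60, 100), (90, 110)]
irregular = [(10, 0), (20, 10), (65, 100), (115, 110)]

for name, data in (("regular", regular), ("irregular", irregular)):
    print(name, max_diff_rate(data, 60), extended_window_rate(data, 60))
```

With the same four counter values, Model A reports 100 per hour for both timings, while Model B reports 100 for the evenly spaced samples and about 63.2 (6000 / 95) for the irregular ones, because its window spans 95 minutes instead of 60.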
To discuss:
cc @ppisljar @timductive