elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.23k stars 24.85k forks source link

ESQL: Support "Look around" "Buckets" #116207

Open BenB196 opened 1 week ago

BenB196 commented 1 week ago

Description

There are use-cases with data where it would be nice to be able to get either the previous or next row in a query to do additional analysis (sparse data sets)

Example:

If I have data like (Note: The @timestamp values):

@timestamp index num_of_docs
2024-11-04T00:00:00.000Z index_1 0
2024-11-04T00:00:10.000Z index_1 1
2024-11-04T00:10:00.000Z index_1 3
2024-11-04T01:00:00.000Z index_1 6
2024-11-04T10:00:00.000Z index_1 12

Now, I would like to compute the rate (difference over time) for the num_of_docs in index over @timestamp.

However, I want to keep my bucket size at ~5 minutes.

Ideally, I'd like an output like:

@timestamp index num_of_docs num_of_docs_diff
2024-11-04T00:00:00.000Z index_1 0 null/0
2024-11-04T00:05:00.000Z index_1 1 1
2024-11-04T00:10:00.000Z index_1 3 2
2024-11-04T00:15:00.000Z index_1 3 null/0
... ... ... ...
2024-11-04T00:55:00.000Z index_1 3 null/0
2024-11-04T01:00:00.000Z index_1 6 3
2024-11-04T01:05:00.000Z index_1 6 null/0
... ... ... ...
2024-11-04T09:55:00.000Z index_1 6 null/0
2024-11-04T10:00:00.000Z index_1 12 6
2024-11-04T10:05:00.000Z index_1 12 null/0

My ES|QL query today would look something like:

FROM metrics-*
| STATS max = MAX(`num_of_docs`), min = MIN(`num_of_docs`)
    BY index, BUCKET(@timestamp, 5 minute)
| EVAL num_of_docs_diff = max - min

This won't work in our case, as the bucket interval is too small to be able to reliably capture the previous event in the data set as it is outside of the bucket window.

Being able to have a way to tell ES|QL to have two different "BUCKETS" would be useful here, where one of the "BUCKETS" would be the "normal" 5 minute one for actual grouping, but then having a second "BUCKET" be somewhat like a "look around" bucket, where a STATS function would check outside the first BUCKETs range for additional values.

(I'm sure there are better words to use for this type of feature, so feel free to update/change)

elasticsearchmachine commented 1 week ago

Pinging @elastic/es-analytical-engine (Team:Analytics)