elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.12k stars 24.83k forks

ESQL: have BUCKET produce empty buckets #111483

Open bpintea opened 3 months ago

bpintea commented 3 months ago

Description

BUCKET currently produces buckets only for those ranges in which there are documents. It can be useful to also produce empty buckets (with an appropriate agg value, such as 0 or null) for those ranges that fall within the provided interval but contain no data. This would be similar to what extended_bounds does for histograms in _search.
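For reference, the existing _search behaviour mentioned above can be sketched as a request body. This is only an illustration: the field name `@timestamp`, the aggregation name `over_time`, the helper name, and the dates are assumptions, not taken from this issue. It relies on `min_doc_count: 0` together with `extended_bounds` so that a `date_histogram` also returns buckets that contain no documents.

```python
import json


def date_histogram_with_empty_buckets(field, interval, lower, upper):
    """Build a _search body whose date_histogram also returns empty buckets.

    The helper name and all argument values are illustrative assumptions.
    """
    return {
        "size": 0,  # only the aggregation result is of interest
        "aggs": {
            "over_time": {
                "date_histogram": {
                    "field": field,
                    "fixed_interval": interval,
                    # Keep buckets with zero documents...
                    "min_doc_count": 0,
                    # ...and extend the range beyond the first/last document.
                    "extended_bounds": {"min": lower, "max": upper},
                }
            }
        },
    }


body = date_histogram_with_empty_buckets(
    "@timestamp", "1d", "2024-07-01", "2024-07-07"
)
print(json.dumps(body, indent=2))
```

The request here is for BUCKET in ES|QL to offer an equivalent option, so that a query over the same range yields one row per bucket whether or not it holds data.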

elasticsearchmachine commented 3 months ago

Pinging @elastic/es-analytical-engine (Team:Analytics)

teresaalvarezsoler commented 3 weeks ago

This feature has been identified as critical for converting all Lens visualizations to ES|QL, because all time series in Lens are built with this feature ON by default. I have already discussed with @tylerperk that it would be good to prioritize this one soon. Thanks.


ppisljar commented 1 week ago

We also need to know, in the response, the interval used for each column produced by a BUCKET operation.

For example, in a simple date histogram bar chart we need to know not just the time when a bar starts but also when it ends. This cannot be inferred from the data: if I ask for 1 week of data and get a single bucket back because there are only a few datapoints in the last week, I can't tell whether that bucket covers 1 week, 1 day, 1 hour, and so on.

For Kibana it would be hard to derive the actual interval directly from the query, as the query could be rather complicated (possibly the actual bucket size depends on the data?), but it needs this information to render the charts correctly.

I propose adding information about the interval used to the meta information of every column that was produced by a BUCKET operation.

This information would also allow us to calculate all the empty bucket boundaries on the Kibana side, so we wouldn't need Elasticsearch to return them as part of the response when "include empty rows" is set, which would result in smaller payloads being sent over the wire.
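A minimal sketch of that client-side calculation follows. It assumes the interval is a fixed duration exposed through some column metadata that does not exist today, and that the requested range starts on a bucket boundary. The function name, the row shape, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def fill_empty_buckets(rows, start, end, interval):
    """Return (bucket_start, bucket_end, value) for every bucket in [start, end).

    rows: dict mapping bucket start -> aggregated value (non-empty buckets only).
    interval: fixed bucket width, assumed to come from hypothetical column meta.
    """
    filled = []
    cursor = start
    while cursor < end:
        # Buckets missing from the response are treated as empty (value 0).
        filled.append((cursor, cursor + interval, rows.get(cursor, 0)))
        cursor += interval
    return filled


# Hypothetical response: one week requested, a single non-empty bucket returned.
start = datetime(2024, 7, 1, tzinfo=timezone.utc)
end = start + timedelta(days=7)
rows = {datetime(2024, 7, 6, tzinfo=timezone.utc): 3}

buckets = fill_empty_buckets(rows, start, end, timedelta(days=1))
for bucket_start, bucket_end, value in buckets:
    print(bucket_start.date(), bucket_end.date(), value)
```

With the interval known, both the end of each bar and the missing buckets can be derived without extra rows in the payload. Calendar-based intervals (months, years) are not a fixed duration, so this sketch would not cover them as written.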

PS: this information is provided in the auto_date_histogram agg response: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-autodatehistogram-aggregation.html