Open dgieselaar opened 11 months ago
Pinging @elastic/es-ql (Team:QL)
Pinging @elastic/elasticsearch-esql (:Query Languages/ES|QL)
Do you have any examples of things? I can guess, but it'd be nice to have an example of the kinds of STATS you expect.
One problem with the STATS here is that ESQL allows a lot more slicing that _search does so it'd be easier to put the query into a state where it wouldn't have the data. I'm kind of imagining something like FROM foo | WHERE hostname = 'blah' | STATS PERCENTILES(bytes_out)
where hostname is a field that got removed in a downsampling operation. I suppose that thing's just not supported. I guess we'd get it for free by the field just not being there. Though maybe the error message should be different? I dunno.
@nik9000 AVG, SUM, MIN, MAX, Pxx. I'm not sure if I follow your example?
Pinging @elastic/es-analytics-geo (Team:Analytics)
I think first of all we'll have to support histogram field type (read and output at least). Since a histogram field is practically an object containing two arrays, I can imagine it returned as a JSON. Supporting new field types has some cost by itself and is not trivial.
After that, we can start defining the behavior for the single agg functions, starting from min
, max
, count
and avg
.
I guess it won't be much different from how _search implements them, eg. for the sample data reported here
PUT my-index-000001
{
"mappings" : {
"properties" : {
"my_histogram" : {
"type" : "histogram"
},
"my_text" : {
"type" : "keyword"
}
}
}
}
PUT my-index-000001/_doc/1
{
"my_text" : "histogram_1",
"my_histogram" : {
"values" : [0.1, 0.2, 0.3, 0.4, 0.5],
"counts" : [3, 7, 23, 12, 6]
}
}
PUT my-index-000001/_doc/2
{
"my_text" : "histogram_2",
"my_histogram" : {
"values" : [0.1, 0.25, 0.35, 0.4, 0.45, 0.5],
"counts" : [8, 17, 8, 7, 6, 2]
}
}
I guess the ESQL usage will be something like:
from my-index* | stats max = max(my_histogram), count = count(my_histogram) by my_text;
my_text | max | count
histogram_1 | 0.5 | 51
histogram_2 | 0.5 | 48
where max(my_histogram)
is calculated on the "values", while count(my_histogram)
is the sum of the "counts".
We will have to define the behavior of each single aggregation function, but at a first look it seems pretty natural at least for the basic aggs, and we can start from this as a guideline.
Wondering if it makes sense to allow histogram
fields in other commands apart from STATS. Maybe they can be used in EVAL for simple assignment (no manipulation, at least in a first phase) and KEEP/DROP, but it's hard for me to imagine how to use them in commands like SORT, ENRICH and so on.
Pinging @elastic/es-analytical-engine (Team:Analytics)
I suggest we wait to implement histogram support until we encode the algorithm in the field (see https://github.com/elastic/elasticsearch/issues/108208). This will let us choose the appropriate sketch for percentiles against the histogram, at a minimum, and may influence the implementation of other aggregations.
Hi,
I would like to suggest adding histogram aggregation on histogram field. There is already an issue for visualization of this, but the implementators decided to wait for ES|QL support. https://github.com/elastic/kibana/issues/112390#issuecomment-2009822801
As a customer we see cases for it in Observability area as we have histograms via OpenTelemetry.
Description
Currently, running an ES|QL aggregation command on a histogram field results in an error. Aggregations should be supported on histogram fields, similar to how _search supports aggregations on histogram fields.
Use cases
In APM we use histograms to store latency distribution data (in
transaction.duration.histogram
). The aggregations we currently run on this field are: avg, pxx, sum, value_count.