Open not-napoleon opened 4 years ago
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
Pinging @elastic/es-core-infra (:Core/Infra/Scripting)
I think #47469 will complicate this somewhat.
I think #47469 will complicate this somewhat.
Can you elaborate? I don't think the doc value formatter gets applied to script outputs or inputs (other than missing values, but missing already doesn't work with ranges in general, in part because of that issue).
I thought there was some issue where we assumed that the value of a field in the response matched the value of the field on disk. I thought this would change the value of a range to a date or number. But maybe it won't do that anyway.
That's...possible, I'm not sure. I do remember I had to do some jank to get DateHistogram
over ranges to format the bucket keys correctly, and this might hit something similar. My instinct is we should be able to work around that without fully solving the formatter issue though. Or, at least, that's my hope. I'm worried that #47469 is going to require a query syntax change and thus will need some deprecation plan, and I really don't want to block scripted ranges on that. But if we have to, we have to. 🤞 it doesn't come to that...
This is intended to be a starting discussion about the options and possibilities for range scripting. While working on aggregation support for range fields, we've encountered a couple of situations that seem better served by scripting than by new aggregations or aggregation parameters. A couple of examples:
We would like to provide numerical metric aggregations, such as max and min, on ranges, but there is no natural ordering so we would need a way for users to specify how to order (e.g. by start point, end point, range width, etc). To support this strictly in aggs, we'd either need range specific metric aggregations, or to add a new parameter to select what mode to use. Neither is terribly appealing. Instead, we'd like to let users send a script that would extract whatever single number the user wants for the metric to operate on.
Open ended ranges (or any very large range, really) currently generate too many buckets and trip a circuit breaker when aggregating (see #50109). Scripting could offer a recourse for this situation, allowing users to either rewrite large ranges (in the linked issue, they want to replace their unbounded endpoint with
now()
, for example) or just omitting ranges that will produce too many buckets.To support this, several things need to happen:
Objects
in Java, and can resolve to a couple of different types depending on the type of ranges. See https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/mapper/RangeType.java for the list of supported types.ValueType
orValuesSourceType
support on the aggregation side. I'm not sure what's required on the scripting side.ValuesSource
logic currently assumes that a Value Script will yield the same type as its input field, which means we can't use that to extract a single numeric value from a range. We probably need to make some changes aggregation side for this.