Open not-napoleon opened 5 years ago
Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)
As per conversation with @jpountz , while we want to split this out for aggregations, we don't want to make a change at the field mapping level.
Adding another use of docvalues formats for context: they're used to format docvalues_fields
in the search response. It's easy to forget this use case and we have some related bugs (#53246, #55993).
@jpountz @jtibshirani I would like to start addressing this issue on the aggs level. Just to keep things simple, I would like to narrow the scope of this particular issue to depreciating the format parameter in aggregations and introducing two separate input_format
and output_format
with first one to be used for handling missing parameter and the second one to be used for handling the formatting of the keys in bucket aggs and string representation of values in metrics aggs.
In other words we will not address the issues raised by Julie nor change the way format in field mapping works, we will just start with ability to have 2 separate format strings. If you are ok with this I will rename the issue accordingly.
@imotov Internally distinguishing the input format and the output format makes sense to me. I know of use-cases for configuring the output format on aggregations, e.g. making sure dates are formatted as millis since epoch rather than the format that is configured on the field. However I'm not sure I know of a use-case for configuring the input format, do you know of one?
Yes, we are using it while parsing string values specified by the user in the missing
parameter, for example.
FieldType
defines a methoddocValueFormat(String format, ZoneId timezone)
which specific field types can override to provide support for formatters in aggregations. The resultingDocValueFormat
, however, has multiple meanings which are sometimes incompatible. There are at least three different ways in which this formatter is used:missing
parameterThere may be other, as yet unidentified, roles this formatter is being used for.
This creates a number of challenges, most notably when the input type and output type of an aggregation differ, as the same formatter will not make sense for both the input and the output. Most aggregations to this point have not encountered this issue because the input and output types line up, but as we try to expand aggregation support to more field types, that will stop being true. Especially after the planned values source refactor (see #42949), which will enable dynamically registering type support for aggregations, relying on inputs and outputs using the same format breaks down.
Please use this issue to discuss possible solutions and record issues we are running into with the current implementation.