Separation of Concerns in DocValueFormat

not-napoleon commented 5 years ago

FieldType defines a method docValueFormat(String format, ZoneId timezone) which specific field types can override to provide support for formatters in aggregations. The resulting DocValueFormat, however, has multiple meanings which are sometimes incompatible. There are at least three different ways in which this formatter is used:

Parsing string values from the user supplied missing parameter
Formatting the key (i.e. bucket id) on bucket aggregations
Formatting the value on metric aggregations

There may be other, as yet unidentified, roles this formatter is being used for.

This creates a number of challenges, most notably when the input type and output type of an aggregation differ, as the same formatter will not make sense for both the input and the output. Most aggregations to this point have not encountered this issue because the input and output types line up, but as we try to expand aggregation support to more field types, that will stop being true. Especially after the planned values source refactor (see #42949), which will enable dynamically registering type support for aggregations, relying on inputs and outputs using the same format breaks down.

Please use this issue to discuss possible solutions and record issues we are running into with the current implementation.

elasticmachine commented 5 years ago

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

not-napoleon commented 4 years ago

As per conversation with @jpountz , while we want to split this out for aggregations, we don't want to make a change at the field mapping level.

jtibshirani commented 4 years ago

Adding another use of docvalues formats for context: they're used to format docvalues_fields in the search response. It's easy to forget this use case and we have some related bugs (#53246, #55993).

imotov commented 3 years ago

@jpountz @jtibshirani I would like to start addressing this issue on the aggs level. Just to keep things simple, I would like to narrow the scope of this particular issue to depreciating the format parameter in aggregations and introducing two separate input_format and output_format with first one to be used for handling missing parameter and the second one to be used for handling the formatting of the keys in bucket aggs and string representation of values in metrics aggs.

In other words we will not address the issues raised by Julie nor change the way format in field mapping works, we will just start with ability to have 2 separate format strings. If you are ok with this I will rename the issue accordingly.

jpountz commented 3 years ago

@imotov Internally distinguishing the input format and the output format makes sense to me. I know of use-cases for configuring the output format on aggregations, e.g. making sure dates are formatted as millis since epoch rather than the format that is configured on the field. However I'm not sure I know of a use-case for configuring the input format, do you know of one?

imotov commented 3 years ago

Yes, we are using it while parsing string values specified by the user in the missing parameter, for example.

elastic / elasticsearch

Separation of Concerns in DocValueFormat #47469