elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch
Other
1.27k stars 24.86k forks source link

Separation of Concerns in DocValueFormat #47469

Open not-napoleon opened 5 years ago

not-napoleon commented 5 years ago

FieldType defines a method docValueFormat(String format, ZoneId timezone) which specific field types can override to provide support for formatters in aggregations. The resulting DocValueFormat, however, has multiple meanings which are sometimes incompatible. There are at least three different ways in which this formatter is used:

There may be other, as yet unidentified, roles this formatter is being used for.

This creates a number of challenges, most notably when the input type and output type of an aggregation differ, as the same formatter will not make sense for both the input and the output. Most aggregations to this point have not encountered this issue because the input and output types line up, but as we try to expand aggregation support to more field types, that will stop being true. Especially after the planned values source refactor (see #42949), which will enable dynamically registering type support for aggregations, relying on inputs and outputs using the same format breaks down.

Please use this issue to discuss possible solutions and record issues we are running into with the current implementation.

elasticmachine commented 5 years ago

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

not-napoleon commented 4 years ago

As per conversation with @jpountz , while we want to split this out for aggregations, we don't want to make a change at the field mapping level.

jtibshirani commented 4 years ago

Adding another use of docvalues formats for context: they're used to format docvalues_fields in the search response. It's easy to forget this use case and we have some related bugs (#53246, #55993).

imotov commented 3 years ago

@jpountz @jtibshirani I would like to start addressing this issue on the aggs level. Just to keep things simple, I would like to narrow the scope of this particular issue to depreciating the format parameter in aggregations and introducing two separate input_format and output_format with first one to be used for handling missing parameter and the second one to be used for handling the formatting of the keys in bucket aggs and string representation of values in metrics aggs.

In other words we will not address the issues raised by Julie nor change the way format in field mapping works, we will just start with ability to have 2 separate format strings. If you are ok with this I will rename the issue accordingly.

jpountz commented 3 years ago

@imotov Internally distinguishing the input format and the output format makes sense to me. I know of use-cases for configuring the output format on aggregations, e.g. making sure dates are formatted as millis since epoch rather than the format that is configured on the field. However I'm not sure I know of a use-case for configuring the input format, do you know of one?

imotov commented 3 years ago

Yes, we are using it while parsing string values specified by the user in the missing parameter, for example.