Closed mikeh-elastic closed 6 years ago
@jpountz what do you think?
Actually I was also thinking about it, but for a different use-case: telling the query parser that it should split on whitespace, for instance. So you could have a keyword field with a whitespace `search_analyzer`, and this would make the query parser parse `foo bar` as `my_keyword:foo OR my_keyword:bar` rather than as a single `foo bar` token. I guess search-time synonyms could make sense too.
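The idea above could be sketched as a mapping like the following. This is hypothetical: `keyword` fields do not currently accept a `search_analyzer`, and `my_index`/`my_keyword` are made-up names used only for illustration.

```json
PUT my_index
{
  "mappings": {
    "properties": {
      "my_keyword": {
        "type": "keyword",
        "search_analyzer": "whitespace"
      }
    }
  }
}
```

With such a mapping, a query string of `foo bar` against `my_keyword` would be tokenized at search time into `foo` and `bar`, while the indexed values remain single untokenized terms.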
With `split_on_whitespace` disabled in release 6, allowing a `keyword` `search_analyzer` is critical. Otherwise I would need to index both a `text` and a `keyword` sub-field for fields that would otherwise work well as `keyword` fields alone (to enable both `terms` aggregations and search on these fields when the query string is more than one word).
The new `split_on_whitespace` behavior prevents me from upgrading to ES 6.
Any interest/feedback on this? There are several reasons I think this would be useful:
1) Removes the need to build a custom DSL (using jison, PEG, or regexes) to properly split strings into single tokens, which is currently required to search on `keyword` fields using the Query DSL.
2) Removes the need to add a `keyword` sub-field for every single-token `text` field, which:
   1) Reduces index size: I have indices that can be on the order of 120 GB per job, with N jobs; `best_compression` mitigates this somewhat.
   2) Reduces mapping complexity.
3) Re-uses an existing feature that maps naturally to this cognitive domain: `search_analyzer` is already meant to change the query-time behavior of an input (query) string relative to the indexed input strings, i.e. to apply transformations other than the `analyzer`'s to the query string.
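The sub-field workaround mentioned in 2) uses Elasticsearch's multi-fields feature; a minimal sketch (field names are illustrative) looks like this:

```json
PUT my_index
{
  "mappings": {
    "properties": {
      "status": {
        "type": "text",
        "fields": {
          "raw": { "type": "keyword" }
        }
      }
    }
  }
}
```

Here `status` is analyzed for full-text search, while `status.raw` stores the exact value for `terms` aggregations and exact matching, so each value is effectively indexed twice.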
cc @elastic/es-search-aggs Relates #29051
We discussed during FixItFriday and agreed that adding a `search_analyzer` to `keyword` can be trappy.
Instead, users should define two fields: one for aggregations with the `keyword` type and `index` disabled, and one for search with the `text` type and a custom `search_analyzer`. This solution should be documented to help users that need this kind of flexibility. I'll open a separate issue for the documentation.
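The recommended two-field setup could look like the following sketch. The field names are illustrative, and the choice of `keyword` as the index-time analyzer (so the whole value is indexed as one token) with `whitespace` at search time is one plausible configuration, not a prescribed one:

```json
PUT my_index
{
  "mappings": {
    "properties": {
      "label": {
        "type": "keyword",
        "index": false
      },
      "label_search": {
        "type": "text",
        "analyzer": "keyword",
        "search_analyzer": "whitespace"
      }
    }
  }
}
```

`label` supports `terms` aggregations without paying for an inverted index, while `label_search` handles queries, with the search-time analyzer free to differ from the index-time one.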
Allowing a `search_analyzer` on `keyword` fields could enable search-time synonyms, and would let the same character and token filters used in a normalizer be applied to queries against normalized keywords.