elastic / elasticsearch

support search_analyzer on keyword fields for search time synonyms #23336

Closed mikeh-elastic closed 6 years ago

mikeh-elastic commented 7 years ago

Allowing a search_analyzer on keyword fields would enable search-time synonyms, and would let the same character and token filters used in a normalizer be applied to queries against normalized keywords.

clintongormley commented 7 years ago

@jpountz what do you think?

jpountz commented 7 years ago

Actually I was also thinking about it, but for a different use-case: telling the query-parser that it should split on whitespace for instance. So you could have a keyword field with a whitespace search_analyzer, and this would make query parser parse foo bar as my_keyword:foo OR my_keyword:bar rather than a single foo bar token. I guess search-time synonyms could make sense too.
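
The difference described above can be illustrated with a small pure-Python emulation. This is only a sketch: `keyword_analyze`, `whitespace_analyze`, and `to_query` are hypothetical stand-ins, not Elasticsearch or Lucene APIs, and the Lucene-style query string they produce is for illustration only.

```python
from typing import Callable, List


def keyword_analyze(text: str) -> List[str]:
    """Emulate keyword analysis: the whole input becomes one token."""
    return [text]


def whitespace_analyze(text: str) -> List[str]:
    """Emulate a whitespace analyzer: split on runs of whitespace."""
    return text.split()


def to_query(field: str, text: str, analyze: Callable[[str], List[str]]) -> str:
    """Build an illustrative query string with one OR-ed clause per token."""
    clauses = []
    for token in analyze(text):
        # Quote tokens containing whitespace so they stay a single term.
        term = f'"{token}"' if any(c.isspace() for c in token) else token
        clauses.append(f"{field}:{term}")
    return " OR ".join(clauses)


# Hypothetical field name taken from the comment above.
single = to_query("my_keyword", "foo bar", keyword_analyze)
split = to_query("my_keyword", "foo bar", whitespace_analyze)
print(single)
print(split)
```

With the keyword-style analysis the input stays one term, while the whitespace-style analysis yields one clause per word, which is the behavior a whitespace search_analyzer on a keyword field would have provided.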

akotlar commented 6 years ago

With split_on_whitespace disabled in release 6, allowing a search_analyzer on keyword fields is critical. Otherwise I would need to index both a text and a keyword subfield for fields that would otherwise work well as keyword fields alone (to enable both terms aggregations and search on these fields when the query string is more than one word).

The new split_on_whitespace behavior prevents me from upgrading to ES 6.
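
For readers unfamiliar with the workaround mentioned here, the following is a rough sketch of a keyword field with a text sub-field, written as Python dictionaries that serialize to request bodies. The field name `tag`, the sub-field name `text`, and the aggregation name `by_tag` are made up for illustration, and the typeless `mappings` layout assumes a 7.x-style index (6.x expects a type name under `mappings`).

```python
import json

# Hypothetical index body: "tag" is a keyword field used for terms
# aggregations; "tag.text" is a text sub-field used for multi-word search.
index_body = {
    "mappings": {
        "properties": {
            "tag": {
                "type": "keyword",
                "fields": {
                    # Built-in whitespace analyzer: one token per word.
                    "text": {"type": "text", "analyzer": "whitespace"}
                },
            }
        }
    }
}

# Hypothetical search body: match on the text sub-field, aggregate on the
# keyword field. The match query ORs the analyzed terms by default.
search_body = {
    "query": {"match": {"tag.text": "foo bar"}},
    "aggs": {"by_tag": {"terms": {"field": "tag"}}},
}

print(json.dumps(index_body, indent=2))
print(json.dumps(search_body, indent=2))
```

The cost raised in the comment is visible here: every such field is indexed twice, once as keyword and once as text.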

akotlar commented 6 years ago

Any interest/feedback on this? There are several reasons I think this would be useful:

1) It removes the need to build a custom DSL (using jison, peg, or regex) to properly split strings into single tokens, which is now required to search on keyword fields using the Query DSL.
2) It removes the need to build a keyword sub-field for every single-token text field, which:
   1) Reduces index size: I have indices that can be on the order of 120GB per job, with N jobs. best_compression mitigates this somewhat.
   2) Reduces mapping complexity.
3) It re-uses existing features that map naturally to this cognitive domain: search_analyzer is already meant to change the query-time behavior of an input (query) string relative to the indexed input strings (that is, apply something other than the analyzer transformations to the query string).

jpountz commented 6 years ago

cc @elastic/es-search-aggs Relates #29051

jimczi commented 6 years ago

We discussed this during FixItFriday and agreed that adding a search_analyzer to keyword fields can be trappy. Instead, users should define two fields: one for aggregations, with the keyword type and index disabled, and one for search, with the text type and a custom search_analyzer. This solution should be documented to help users who need this kind of flexibility. I'll open a new issue for the documentation.
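
A minimal sketch of what the two-field setup might look like, written as a Python dictionary that serializes to an index-creation body. Everything named here is an assumption for illustration: the field names `tag` and `tag_search`, the analyzer and filter names, the sample synonym rule, and the choice of a keyword tokenizer at index time. The typeless `mappings` layout assumes a 7.x-style index (6.x expects a type name under `mappings`).

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical search-time synonym rule.
                "my_synonyms": {
                    "type": "synonym",
                    "synonyms": ["laptop, notebook"],
                }
            },
            "analyzer": {
                # Index time: keep the whole value as one lowercased token.
                "my_index_analyzer": {
                    "type": "custom",
                    "tokenizer": "keyword",
                    "filter": ["lowercase"],
                },
                # Search time: split on whitespace, lowercase, expand synonyms.
                "my_search_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase", "my_synonyms"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            # Aggregations only: not searchable, doc values stay enabled by default.
            "tag": {"type": "keyword", "index": False},
            # Search only: text field with a separate search-time analyzer.
            "tag_search": {
                "type": "text",
                "analyzer": "my_index_analyzer",
                "search_analyzer": "my_search_analyzer",
            },
        }
    },
}

print(json.dumps(index_body, indent=2))
```

In this sketch the application is assumed to send the same value in both `tag` and `tag_search` when indexing a document; queries would target `tag_search` and aggregations would target `tag`.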