Open jimczi opened 7 years ago
I've been thinking about ways of doing this for keyword normalizers. One way would be to have a specialised AttributeSource that throws an exception when a token filter adds a PositionLengthAttribute.
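To make the idea concrete, here is a toy model in Python. It is only a sketch of the approach, not Lucene code: `RestrictedAttributeSource`, `ForbiddenAttributeError`, `check_chain` and the two example filters are all hypothetical names, and attributes are represented by plain strings rather than by Lucene attribute classes.

```python
class ForbiddenAttributeError(Exception):
    """Raised when a filter registers an attribute the chain must not use."""


class RestrictedAttributeSource:
    """Toy attribute registry (hypothetical; this is not the Lucene class)."""

    def __init__(self, forbidden):
        self._forbidden = frozenset(forbidden)
        self._attributes = {}

    def add_attribute(self, name):
        # Reject the attribute at registration time, before any token is produced.
        if name in self._forbidden:
            raise ForbiddenAttributeError(f"{name} is not allowed in this chain")
        return self._attributes.setdefault(name, {})


def lowercase_filter(source):
    # A simple filter that only needs the term text.
    source.add_attribute("CharTermAttribute")


def graph_filter(source):
    # A graph-producing filter also registers a position length.
    source.add_attribute("CharTermAttribute")
    source.add_attribute("PositionLengthAttribute")


def check_chain(filters):
    """Return True if no filter in the chain registers a forbidden attribute."""
    source = RestrictedAttributeSource({"PositionLengthAttribute"})
    try:
        for build in filters:
            build(source)
    except ForbiddenAttributeError:
        return False
    return True
```

With this model, `check_chain([lowercase_filter])` passes, while a chain containing `graph_filter` is rejected as soon as it is built.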
cc @elastic/es-search-aggs
This is related to work that @cbuescher is doing regarding index vs query-time analysis chains.
Pinging @elastic/es-search-relevance (Team:Search Relevance)
Currently it is possible to set a `synonym_graph` or a `word_delimiter_graph` token filter in an analyzer that is used at index time. However, these filters can produce side paths that break the positions in the index and make phrase query matching impossible on the field. The `flatten_graph` token filter is supposed to handle this situation, but it can only flatten the graph, which is also a lossy operation. So whether or not the user adds a `flatten_graph` filter at the end of the analyzer, the positions of the terms in the index will not be accurate. Instead, we could try to detect these situations and fail the mapping if a graph filter is used in an index analyzer. This would allow us to remove the `flatten_graph` filter and also help users not to shoot themselves in the foot. Here is a hopefully exhaustive list of token filters that should be impacted by this:

- `synonym_graph_filter`
- `word_delimiter_graph_filter`
- `shingles` (only when `output_unigram:true` or `min_size` < `max_size`)
- `cjk` (only when `output_unigram:true`)
- `ngram` tokenizer when `min_gram < max_gram`
- `common_gram`
- `kuromoji_tokenizer` when (`nbest_cost` or `nbest_example` > 1).
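The detection rule described above could be sketched roughly as follows. This is an illustration in Python, not Elasticsearch code: `produces_graph` and `validate_index_analyzer` are hypothetical helpers, the component and setting names are copied from the list as written, and the default values are assumptions. The kuromoji condition is read loosely as "either setting has a non-empty, non-zero value".

```python
# Components from the list that are flagged regardless of their settings.
ALWAYS_GRAPH = {"synonym_graph_filter", "word_delimiter_graph_filter", "common_gram"}


def produces_graph(component, settings=None):
    """Return True if the component may emit side paths under the listed rules."""
    s = settings or {}
    if component in ALWAYS_GRAPH:
        return True
    if component == "shingles":
        # Assumed defaults: unigrams on, min_size == max_size == 2.
        return bool(s.get("output_unigram", True)) or s.get("min_size", 2) < s.get("max_size", 2)
    if component == "cjk":
        # Assumed default: unigrams off.
        return bool(s.get("output_unigram", False))
    if component == "ngram":
        # Assumed defaults: min_gram 1, max_gram 2.
        return s.get("min_gram", 1) < s.get("max_gram", 2)
    if component == "kuromoji_tokenizer":
        # Loose reading of the condition: any n-best setting in use.
        return bool(s.get("nbest_cost")) or bool(s.get("nbest_example"))
    return False


def validate_index_analyzer(components):
    """Raise ValueError if any (name, settings) pair would produce a graph."""
    offenders = [name for name, settings in components if produces_graph(name, settings)]
    if offenders:
        raise ValueError(
            "graph-producing components are not allowed in an index analyzer: "
            + ", ".join(offenders)
        )
```

For example, `validate_index_analyzer([("lowercase", {}), ("synonym_graph_filter", {})])` raises, while an analyzer that uses `shingles` with unigrams disabled and equal `min_size` and `max_size` is accepted.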