elastic / elasticsearch

Free and Open Source, Distributed, RESTful Search Engine
https://www.elastic.co/products/elasticsearch

Fail index analyzer that contains a graph token filter #24396

Open jimczi opened 7 years ago

jimczi commented 7 years ago

Currently it is possible to set a synonym_graph or a word_delimiter_graph token filter in an analyzer that is used at index time. However, these filters can produce side paths that break the positions in the index and make phrase query matching impossible on the field. The flatten_graph token filter is supposed to handle this situation, but it can only flatten the graph, which is itself a lossy operation. So whether or not the user adds a flatten_graph filter at the end of the analyzer, the positions of the terms in the index will not be accurate.

Instead, we could try to detect these situations and fail the mapping if a graph filter is used in an index analyzer. This would allow us to remove the flatten_graph filter and also help users avoid shooting themselves in the foot.

Here is a hopefully exhaustive list of token filters that should be impacted by this:
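
As a rough illustration of the proposed check, the sketch below walks an `analysis` settings dictionary and rejects an index analyzer whose filter chain contains a graph filter. It is not Elasticsearch code: `GRAPH_FILTER_TYPES`, `find_graph_filters` and `validate_index_analyzer` are hypothetical names, and the set only contains the two filter types named above, so it should not be read as the exhaustive list.

```python
# Hypothetical validation sketch; not Elasticsearch's actual implementation.
# Only the two graph filter types named in this issue are listed (assumption).
GRAPH_FILTER_TYPES = {"synonym_graph", "word_delimiter_graph"}


def find_graph_filters(analysis, analyzer_name):
    """Return the names of graph token filters used by the named analyzer."""
    analyzer = analysis.get("analyzer", {}).get(analyzer_name, {})
    custom_filters = analysis.get("filter", {})
    offenders = []
    for name in analyzer.get("filter", []):
        # A filter is referenced either by a custom name defined under
        # "filter" or directly by its built-in type name.
        filter_type = custom_filters.get(name, {}).get("type", name)
        if filter_type in GRAPH_FILTER_TYPES:
            offenders.append(name)
    return offenders


def validate_index_analyzer(analysis, analyzer_name):
    """Raise ValueError if the analyzer would emit a token graph at index time."""
    offenders = find_graph_filters(analysis, analyzer_name)
    if offenders:
        raise ValueError(
            f"analyzer [{analyzer_name}] uses graph token filter(s) {offenders} "
            "and cannot be used as an index analyzer"
        )


# Example settings: "index_side" uses a synonym_graph filter, "plain" does not.
analysis = {
    "filter": {
        "my_synonyms": {"type": "synonym_graph", "synonyms": ["ny, new york"]},
    },
    "analyzer": {
        "index_side": {"tokenizer": "standard", "filter": ["lowercase", "my_synonyms"]},
        "plain": {"tokenizer": "standard", "filter": ["lowercase"]},
    },
}
```

With these settings, `validate_index_analyzer(analysis, "plain")` returns normally, while `validate_index_analyzer(analysis, "index_side")` raises a `ValueError`. The same analyzer could still be accepted as a search analyzer, since the check is only meant for the index side.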

romseygeek commented 6 years ago

I've been thinking about ways of doing this for keyword normalizers. One way would be to have a specialised AttributeSource that throws an exception when a token filter adds a PositionLengthAttribute.
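
To make the idea concrete, here is a toy Python model of it. Lucene's real AttributeSource is a Java class, so everything below is a simplified stand-in: `add_attribute`, `NoGraphAttributeSource` and the two attribute classes are hypothetical names chosen for illustration, and the only behaviour modelled is "one attribute instance per class, refuse the position-length one".

```python
class CharTermAttribute:
    """Stand-in for a term attribute; holds the token text."""

    def __init__(self):
        self.term = ""


class PositionLengthAttribute:
    """Stand-in for the attribute that graph token filters rely on."""

    def __init__(self):
        self.position_length = 1


class AttributeSource:
    """Toy attribute registry: at most one instance per attribute class."""

    def __init__(self):
        self._attributes = {}

    def add_attribute(self, attribute_class):
        # Return the existing instance if present, otherwise create one.
        if attribute_class not in self._attributes:
            self._attributes[attribute_class] = attribute_class()
        return self._attributes[attribute_class]


class NoGraphAttributeSource(AttributeSource):
    """Rejects any component that asks for a position-length attribute."""

    def add_attribute(self, attribute_class):
        if attribute_class is PositionLengthAttribute:
            raise ValueError(
                "graph token filters are not allowed in this analysis chain"
            )
        return super().add_attribute(attribute_class)
```

A filter that only needs `CharTermAttribute` works unchanged against `NoGraphAttributeSource`, whereas one that requests `PositionLengthAttribute` fails as soon as it registers the attribute, which is before any token is produced.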

cc @elastic/es-search-aggs

romseygeek commented 5 years ago

This is related to work that @cbuescher is doing regarding index vs query-time analysis chains.

elasticsearchmachine commented 4 months ago

Pinging @elastic/es-search-relevance (Team:Search Relevance)