Closed: alexgarel closed this issue 8 years ago
Maybe a "see also" https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-shingle-tokenfilter.html would be useful !
Hi @alexgarel
This docs issues list is only for issues with the docs build process. Issues like this one should be opened on the elasticsearch repo instead. That said, I'm working on a rewrite of the token filter docs regardless, so I'll deal with this issue when I get there anyway.
thanks
OK @clintongormley, that's cool. And sorry for the misuse.
I think the (lack of) documentation for the ngram token filter is misleading. I was expecting this filter to create ngrams of consecutive tokens, not ngrams of the characters contained in each token.
I propose to add:
A token filter of type nGram. It creates ngrams from the sequences of characters contained in each token.
We could maybe add an example as follows:
With the whitespace tokenizer and an nGram token filter with min_gram=2 and max_gram=3, "the house" gives: [th, he, the] [ho, ou, us, se, hou, ous, use]. Note that the ngrams of a given word share that word's position in the phrase, so the expression above would match a match_phrase query on "th ous".
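The behaviour described above can be sketched in plain Python (an illustration only, not Lucene's implementation; the ordering of grams here may differ from what Elasticsearch actually emits):

```python
def char_ngrams(token: str, min_gram: int = 2, max_gram: int = 3) -> list[str]:
    """All character n-grams of `token` with lengths min_gram..max_gram.

    Grams are grouped by length for readability; the real filter also
    tracks each gram's position so phrase queries still work.
    """
    return [
        token[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(token) - n + 1)
    ]

# Whitespace-tokenize first, then n-gram each token independently:
tokens = "the house".split()
print([char_ngrams(t) for t in tokens])
# [['th', 'he', 'the'], ['ho', 'ou', 'us', 'se', 'hou', 'ous', 'use']]
```

This makes the distinction explicit: the filter operates on characters *within* each token, whereas the shingle token filter linked above is the one that combines consecutive tokens.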