Closed Simerax closed 1 month ago
@Simerax Multiple things are in play here -

- With the `en` analyzer over your text, the `to_lower` token filter is applied and the `stemmer_en_snowball` filter stems words down to their root forms before indexing. So for `Security`, its root word `secur` is what gets indexed.
- Now to your query: `fuzzy` is what we call a "non-analytic" query, meaning no analysis is applied to it and what you search for is looked up as is. Case is preserved, and there is no stemming.
- The `_all` field: I believe your index mapping is already handling this automatically, so all good here.

So here are some recommendations you can choose from -
- Use the `standard` analyzer so no stemming is done as part of the analysis. Case will still be lowered, so your search terms would all need to be lower case (since you're only using `fuzziness=1`).
- Use a `custom` analyzer that does unicode segmentation (tokenizing) but has no `to_lower` token filter, so you index terms as-is, respecting the case.

Thank you for the thorough reply!
My plan is to build a simple documentation search based on bleve. I thought the "fuzzy" query would be the best fit for end users in a search prompt.
I think I did not understand the implications of the fuzzy query on a stemmed index.
The `MatchQuery` seems to be a much better fit here.
I will play around with different combinations of analyzers and query types to see what works best.
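
As a rough illustration of why an analyzed query behaves differently from a non-analyzed one, here is a minimal Go sketch that does not use bleve at all. The names `analyze`, `buildIndex`, `matchLookup`, and `termLookup` are hypothetical, and the lowercasing tokenizer is only a simplified stand-in for a real analyzer (it does no stemming).

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// analyze is a simplified stand-in for an analyzer (an assumption, not
// bleve's implementation): it splits on non-letter characters and
// lowercases each token. No stemming is done.
func analyze(text string) []string {
	tokens := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r)
	})
	for i, t := range tokens {
		tokens[i] = strings.ToLower(t)
	}
	return tokens
}

// buildIndex stores the analyzed terms of a document in a set.
func buildIndex(text string) map[string]bool {
	index := make(map[string]bool)
	for _, term := range analyze(text) {
		index[term] = true
	}
	return index
}

// matchLookup analyzes the query text first, the way an analyzed query would,
// and reports whether any resulting term is in the index.
func matchLookup(index map[string]bool, query string) bool {
	for _, term := range analyze(query) {
		if index[term] {
			return true
		}
	}
	return false
}

// termLookup looks the query up exactly as typed, with no analysis.
func termLookup(index map[string]bool, query string) bool {
	return index[query]
}

func main() {
	index := buildIndex("Security and Burma")
	for _, q := range []string{"Security", "security", "Burma"} {
		fmt.Printf("%-8s analyzed=%v as-is=%v\n",
			q, matchLookup(index, q), termLookup(index, q))
	}
}
```

The point of the sketch is only that the index holds analyzed terms, so a query that skips analysis has to supply text that already looks like those terms.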
I'm a little confused on why certain words don't match.
In particular, I noticed that the word "Security" is not found in a simple fuzzy query, and I don't understand why. I used `benchmark_data.txt` as document content.

Output:
As you can see, when I use a fuzzy query the exact match `Security` is not matched. Even very close "fuzzy" words do not match (such as `Securit`). However, it does match `securi`. Another random word, `Burma`, is matched exactly.

When running a match query it matches `Security` and `Burma`, as expected.

I don't quite understand why the fuzzy query only matches `securi` and not the "more exact" words.

Is this a bug or am I doing something wrong?