episerver / EPiServer.Labs.Find.Toolbox

Find Toolbox features an improved synonym implementation, MinimumShouldMatch, MatchPhrase, MatchPrefixPhrase, FuzzyQuery, WildcardQuery and custom CMS search providers
Apache License 2.0

WildcardMatch & FuzzyMatch strange behavior #14

Open soria2020 opened 8 months ago

soria2020 commented 8 months ago

I'm experiencing an issue with Episerver.Labs.Find.Toolbox (2.0.2), particularly with the FuzzyMatch() function in my search queries. During testing, I've noticed inconsistent results:

  1. When I search for "sjuksköterskeutbil", it returns no results, even though there is a program named "Sjuksköterskeutbildning."
  2. Searching for "förskol" returns 50 hits, "förskollärar" returns 33 hits, and "förskollärare" returns 55 hits. Why are there more results for "förskollärare" instead of fewer, given that it's a more specific query?

I can replicate the behavior with my default index.

This is what the search query looks like; I am using both WildcardMatch and FuzzyMatch: query = query.For(queryString).WildcardMatch(p => p.Title).FuzzyMatch(p => p.Title);

Versions: EPiServer.Labs.Find.Toolbox 2.0.2, EPiServer.Find.Cms 16.0.2, EPiServer.CMS 12.27.1

dada81 commented 8 months ago

Hi @soria2020

  1. My FuzzyMatch and WildcardMatch implementations ignore terms longer than 16 characters. "sjuksköterskeutbil" is 18 characters, so it is skipped. I will adjust this in the next release.

  2. It's difficult to say without a deeper investigation, but there are many things happening here.

    • The free-text search (setting Wildcard and Fuzzy aside) might be matching on the word stem (using the stemmer), which can give you hits you might not expect. Förskola might match Förskolan, and Förskollärare might match förskollärarna.
    • FuzzyMatch might return Förskola as a candidate when searching for Förskol.

    Anyway, this is not unexpected, even though it can be confusing :) I suggest running three searches: one with only FuzzyMatch, one with only WildcardMatch, and one with neither, to understand what each produces on its own.
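A minimal Python sketch of the length cutoff described in point 1. It assumes the rule is simply "skip any query term longer than 16 characters". The constant `MAX_TERM_LENGTH` and the function `terms_eligible_for_expansion` are hypothetical names, not Toolbox code. The sketch also prints the character counts of the queries from the issue:

```python
# Hypothetical sketch of the 16-character cutoff described for Toolbox 2.0.2.
# Names are illustrative assumptions, not the Toolbox's actual implementation.

MAX_TERM_LENGTH = 16  # terms longer than this are assumed to be skipped


def terms_eligible_for_expansion(query: str) -> list[str]:
    """Return the lowercase query terms that wildcard/fuzzy expansion would still process."""
    return [t for t in query.lower().split() if len(t) <= MAX_TERM_LENGTH]


for q in ["sjuksköterskeutbil", "förskol", "förskollärar", "förskollärare"]:
    # Print each query, its length in characters, and the terms that survive the cutoff.
    print(q, len(q), terms_eligible_for_expansion(q))
```

Under this assumption, "sjuksköterskeutbil" gets no wildcard or fuzzy expansion at all. Only the plain free-text search would remain, which would explain the empty result.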
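The three-search experiment suggested above can be mimicked in a toy, language-neutral Python sketch. It makes several assumptions:

- The wildcard match behaves like prefix matching (`term*`) on each title word.
- The fuzzy match accepts any title word within a Levenshtein distance of 2. Lucene's fuzzy queries also cap edits at 2.
- The two clauses are combined with OR.

The titles and the helpers `wildcard_hits` and `fuzzy_hits` are hypothetical, and stemming in the free-text search is not modeled:

```python
# Toy simulation: run a "wildcard" rule and a "fuzzy" rule separately, then combined.
# The matching rules and titles are illustrative assumptions, not EPiServer Find behavior.


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (0 if characters match)
            ))
        prev = curr
    return prev[-1]


def wildcard_hits(term: str, titles: list[str]) -> set[str]:
    # Assumed wildcard rule: some word in the title starts with the term.
    return {t for t in titles if any(w.startswith(term) for w in t.lower().split())}


def fuzzy_hits(term: str, titles: list[str], max_edits: int = 2) -> set[str]:
    # Assumed fuzzy rule: some word in the title is within max_edits edits of the term.
    return {t for t in titles
            if any(levenshtein(term, w) <= max_edits for w in t.lower().split())}


TITLES = ["Förskola", "Förskolan i centrum", "Förskollärare",
          "Förskollärarna", "Sjuksköterskeutbildning"]

for term in ["förskol", "förskollärare"]:
    w = wildcard_hits(term, TITLES)
    f = fuzzy_hits(term, TITLES)
    print(term)
    print("  wildcard only:", sorted(w))
    print("  fuzzy only:   ", sorted(f))
    print("  combined (OR):", sorted(w | f))
```

In this toy corpus, "förskollärare" matches only "Förskollärare" as a wildcard but also picks up "Förskollärarna" as a fuzzy candidate. Each clause therefore contributes hits the other misses, and the combined result is their union rather than their intersection.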