eklem / search-index-cookbook

A collection of recipes and how-tos for interesting use cases with search-index
MIT License

Fuzzy search with Levenshtein distance #49

Open eklem opened 4 years ago

eklem commented 4 years ago

https://en.wikipedia.org/wiki/Levenshtein_distance

Could use the leven package. First, when indexing, build a separate array of all words used (or check whether search-index can provide this). Then, when searching, find the dictionary words within a Levenshtein distance of 1 or 2 of each query word and do an OR-search on the words you get back.
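For reference, the distance itself is small enough to sketch by hand. This is a plain two-row dynamic-programming version, not the leven package's implementation; the function name `levenshtein` is just an illustrative choice.

```javascript
// Levenshtein distance between two strings, keeping only two rows of the table.
function levenshtein(a, b) {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  // Row 0: turning the empty string into the first j chars of b takes j insertions.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    // Column 0: turning the first i chars of a into the empty string takes i deletions.
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}
```

Note that a swapped pair of letters (e.g. "serach" for "search") counts as two edits here, which is one reason to allow a distance of 2 and not only 1.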

eklem commented 4 years ago

The word array could just be stored in a file or in IndexedDB and read into memory when the page is loaded.

Just do the IndexedDB example. What's needed:

Do this after stopwords are removed, so there is less processing. Show a slider to turn fuzzy search on/off, along with the extra words searched on?

eklem commented 4 years ago

So, search-index has everything I need.

Did this in the browser console in the demo:

db.DICTIONARY().then(resultsDir)

And got all the words in the index back out again.

[Screenshot 2020-05-14 at 20:11:36]

This means you do this once each time the app/page loads and populate an array. It stays in memory until the page is reloaded, at which point you need to call DICTIONARY again. So check whether the array exists, and if not, build it.
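The "check if exists, if not do it" step could look roughly like this. It is a sketch under assumptions: `getDictionary` and `loadDictionary` are hypothetical names, and the loader (for example `() => db.DICTIONARY()`) is assumed to resolve to an array of words.

```javascript
// Cached promise for the dictionary; null until the first request after page load.
let dictionaryPromise = null;

// `loadDictionary` is a hypothetical loader, e.g. () => db.DICTIONARY(),
// assumed here to resolve to an array of words.
function getDictionary(loadDictionary) {
  if (dictionaryPromise === null) {
    dictionaryPromise = Promise.resolve(loadDictionary()).catch((err) => {
      // Forget a failed load so the next call can try again.
      dictionaryPromise = null;
      throw err;
    });
  }
  return dictionaryPromise;
}
```

Caching the promise instead of the array means two searches fired right after page load share one dictionary read.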

Then loop through each word in the query and do a Levenshtein if-check for each word in the dictionary.
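A minimal sketch of that loop, assuming the dictionary is already an in-memory array of lowercase words. `expandQuery` and `editDistance` are illustrative names, and the length check is an added shortcut, not something the notes above require.

```javascript
// Compact Levenshtein distance (two-row dynamic programming).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// For each query word, collect the dictionary words within maxDistance edits.
// Returns one array of candidates per query word, in query order.
function expandQuery(queryWords, dictionary, maxDistance = 1) {
  return queryWords.map((word) =>
    dictionary.filter(
      (candidate) =>
        // Words whose lengths differ by more than maxDistance cannot match,
        // so skip the costlier distance computation for them.
        Math.abs(candidate.length - word.length) <= maxDistance &&
        editDistance(word, candidate) <= maxDistance
    )
  );
}

const dictionary = ['search', 'seared', 'index', 'indexes', 'fuzzy'];
const groups = expandQuery(['serch', 'index'], dictionary, 1);
// groups[0] holds the candidates for 'serch', groups[1] those for 'index'.
```

Each inner array is the set of words to OR together for that query word; how those groups are passed to search-index is left out here, since it depends on the query API of the version in use.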

Should make sure all words in the index are lowercased, so the dictionary array holds no case variants of the same word and stays small.
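If lowercasing at index time is not an option, the dictionary array could be normalised after it is read instead. A small sketch, with `normalizeDictionary` as a hypothetical helper name:

```javascript
// Lowercase every word and drop duplicates; a Set keeps first-seen order.
function normalizeDictionary(words) {
  return [...new Set(words.map((word) => word.toLowerCase()))];
}

const normalized = normalizeDictionary(['Search', 'search', 'INDEX', 'index', 'fuzzy']);
// normalized is ['search', 'index', 'fuzzy']
```

Query words would need the same lowercasing before the distance check, otherwise a capital letter counts as an edit.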