Closed kevinslin closed 2 years ago
Some issues with the fuzzy matching in lookup led me to look up (hehe) some algorithms online.
I did a bit of searching for fuzzy string matching because I had to do a lot of fuzzy matching and scoring on not-so-clean street addresses for real estate applications in the past, and I found complex algorithms were a bit of a pain to work with sometimes because there are a lot of thresholds and parameters to tweak to get what you want. After a while you realize that's not really what you want and you keep having to tweak it.
Today I stumbled upon this article : https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/ It's a dead simple fuzzy match inspired by the one in Sublime text and has got feedback from Jon Skinner himself. Do you think it's worth looking into a simpler algorithm for fuzzy string matching on lookup? A quick look into the code it seems we are using fusejs, which uses a modified bitap. I remember quickly giving up on using this(bitap, not fusejs) because it was so fiddly for my use case.
Maybe it's not working so well with our hierachy? Just a thought.
There are a few assumptions I would like to establish before going too deep.
My biggest issue with fusejs is that it can't match an exact word inside a long hierarchy. (i.e. it doesn't work for a very simple use case.)
Depending on what we are trying to achieve, we could use a simple solution that splits the search into multiple sub-search strategies.
This would be fuzzy enough for 99% of cases. Would be accurate and could be implemented quickly using regex. (I would be more cautious if we had a lot of words to search. I would expect this to work well up to 500_000 words.)
Please check the fuzzThreshold
configuration option for optimizing fuzziness of the lookup.
Context
Dendron lookup is currently quite strict. It does fuzzy search for missing characters but will turn up empty if characters are out of order.
Proposal
References