Tune Lookup Fuzzyness - Githubissues

kevinslin commented 4 years ago

Context

Dendron lookup is currently quite strict. It does fuzzy search for missing characters but will turn up empty if characters are out of order.

Proposal

Create a fuzzy search toggle/slider that makes Dendron's search more fuzzy
Figure out what the optimal default fuzziness should be

References

sublime text fuzziness

hikchoi commented 3 years ago

Some issues with the fuzzy matching in lookup led me to look up (hehe) some algorithms online.

I did a bit of searching for fuzzy string matching because I had to do a lot of fuzzy matching and scoring on not-so-clean street addresses for real estate applications in the past, and I found complex algorithms were a bit of a pain to work with sometimes because there are a lot of thresholds and parameters to tweak to get what you want. After a while you realize that's not really what you want and you keep having to tweak it.

Today I stumbled upon this article : https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/ It's a dead simple fuzzy match inspired by the one in Sublime text and has got feedback from Jon Skinner himself. Do you think it's worth looking into a simpler algorithm for fuzzy string matching on lookup? A quick look into the code it seems we are using fusejs, which uses a modified bitap. I remember quickly giving up on using this(bitap, not fusejs) because it was so fiddly for my use case.

Maybe it's not working so well with our hierachy? Just a thought.

Orcomp commented 3 years ago

There are a few assumptions I would like to establish before going too deep.

fusejs is used in lookup only
lookup is used to search for filenames and schemas

My biggest issue with fusejs is that it can't match an exact word inside a long hierarchy. (i.e. it doesn't work for a very simple use case.)

Depending on what we are trying to achieve, we could use a simple solution that splits the search into multiple sub-search strategies.

Split a hierarchy on the dot to obtain multiple words
Look for exact matches starting from the beginning of a word
Look for exact matches anywhere inside a word
Split the search term on each character to see if it matches the first character of any of the words

This would be fuzzy enough for 99% of cases. Would be accurate and could be implemented quickly using regex. (I would be more cautious if we had a lot of words to search. I would expect this to work well up to 500_000 words.)

aleksey-rowan commented 2 years ago

Please check the fuzzThreshold configuration option for optimizing fuzziness of the lookup.

dendronhq / dendron

Tune Lookup Fuzzyness #136

Context

Proposal

References