dendronhq / dendron

The personal knowledge management (PKM) tool that grows as you do!
https://wiki.dendron.so
Apache License 2.0
6.69k stars 252 forks source link

Tune Lookup Fuzzyness #136

Closed kevinslin closed 2 years ago

kevinslin commented 4 years ago

Context

Dendron lookup is currently quite strict. It does fuzzy search for missing characters but will turn up empty if characters are out of order.

Proposal

  1. Create a fuzzy search toggle/slider that makes Dendron's search more fuzzy
  2. Figure out what the optimal default fuzziness should be

References

hikchoi commented 3 years ago

Some issues with the fuzzy matching in lookup led me to look up (hehe) some algorithms online.

I did a bit of searching for fuzzy string matching because I had to do a lot of fuzzy matching and scoring on not-so-clean street addresses for real estate applications in the past, and I found complex algorithms were a bit of a pain to work with sometimes because there are a lot of thresholds and parameters to tweak to get what you want. After a while you realize that's not really what you want and you keep having to tweak it.

Today I stumbled upon this article : https://www.forrestthewoods.com/blog/reverse_engineering_sublime_texts_fuzzy_match/ It's a dead simple fuzzy match inspired by the one in Sublime text and has got feedback from Jon Skinner himself. Do you think it's worth looking into a simpler algorithm for fuzzy string matching on lookup? A quick look into the code it seems we are using fusejs, which uses a modified bitap. I remember quickly giving up on using this(bitap, not fusejs) because it was so fiddly for my use case.

Maybe it's not working so well with our hierachy? Just a thought.

Orcomp commented 3 years ago

There are a few assumptions I would like to establish before going too deep.

My biggest issue with fusejs is that it can't match an exact word inside a long hierarchy. (i.e. it doesn't work for a very simple use case.)

Depending on what we are trying to achieve, we could use a simple solution that splits the search into multiple sub-search strategies.

This would be fuzzy enough for 99% of cases. Would be accurate and could be implemented quickly using regex. (I would be more cautious if we had a lot of words to search. I would expect this to work well up to 500_000 words.)

aleksey-rowan commented 2 years ago

Please check the fuzzThreshold configuration option for optimizing fuzziness of the lookup.