atom / fuzzaldrin

Fuzzy filtering and string scoring
MIT License

Improve fuzzy finder (long term solution) #23

Open plrenaudin opened 9 years ago

plrenaudin commented 9 years ago

We have a quite nice PR, #22, which improves the existing state of the fuzzy finder. But it doesn't feel as relevant and snappy as the Chrome dev tools or Sublime Text finders. A good next step after that PR might be to try to implement the same behaviour as the Chrome dev tools one. Thankfully it's open source, so we can check the code here: https://chromium.googlesource.com/chromium/blink/+/master/Source/devtools/front_end/sources

Especially in those files:

FilePathScoreFunction.js
FilteredItemSelectionDialog.js

Any suggestion / objections?

walles commented 9 years ago

Just an hour ago the performance of PR #22 improved by a factor of 2x; that might help.

Just wanted to make sure you've tested with the improved version.

plrenaudin commented 9 years ago

I pulled the changes, yes. Don't get me wrong, it's a great PR and it already improves this package a lot. This issue is just to start the discussion about a long-term solution.

jeancroy commented 9 years ago

It is an O(m·n) algorithm: it builds an m×n dynamic programming table inside a single vector. `_singleCharScore: function(query, data, i, j)` really looks like our `scoreChar` function.
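To illustrate the structure being described (not Blink's actual code, and `dpScore` is a hypothetical name): an m×n dynamic-programming score table can be flattened into a single vector, here computing a plain longest-common-subsequence score between query and candidate.

```javascript
// Minimal sketch of an O(m*n) DP scorer whose m-by-n table lives in one
// flat vector. score[i*n + j] holds the best score for matching
// query[0..i] against data[0..j].
function dpScore(query, data) {
  const m = query.length, n = data.length;
  if (m === 0 || n === 0) return 0;
  const score = new Array(m * n).fill(0);
  for (let i = 0; i < m; i++) {
    for (let j = 0; j < n; j++) {
      const charScore =
        query[i].toLowerCase() === data[j].toLowerCase() ? 1 : 0;
      const diag = i > 0 && j > 0 ? score[(i - 1) * n + (j - 1)] : 0;
      const left = j > 0 ? score[i * n + (j - 1)] : 0;
      const up = i > 0 ? score[(i - 1) * n + j] : 0;
      // Extend a match diagonally, or carry the best score forward.
      score[i * n + j] = Math.max(diag + charScore, left, up);
    }
  }
  return score[m * n - 1]; // full-query score against the whole string
}
```

A real scorer (Blink's or fuzzaldrin's) layers per-character bonuses on top of this skeleton, but the table layout is the same.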

I don't expect much difference that isn't accountable to tuning of the weights.

walles commented 9 years ago

There are things that can be done outside of this project; sometimes responsiveness can be faked.

When I test my Atom right now, I get the feeling that it recomputes the list on every keypress. Waiting 200ms (or whatever) before recomputing the list could make typing snappier. And given that people typing quickly don't have time to look at the list anyway, it probably wouldn't affect usability negatively either. But that would be outside of fuzzaldrin.
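A minimal sketch of the debounce being suggested (outside fuzzaldrin itself); `recomputeList` is a hypothetical stand-in for whatever refreshes the fuzzy-finder results:

```javascript
// Delay a function until `waitMs` of quiet have passed, so fast typing
// triggers only one recomputation instead of one per keystroke.
function debounce(fn, waitMs) {
  let timer = null;
  return function (...args) {
    clearTimeout(timer);
    timer = setTimeout(() => fn.apply(this, args), waitMs);
  };
}

// Hypothetical usage:
// const onKeystroke = debounce(recomputeList, 200);
```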

Another thing that could be done would be to use different algorithms depending on whether we have a lot of hits or just a few. A fast (but not as good) fallback would be to simply sort hits by length (primarily) and alphabetically (secondarily); doing that when there are lots of matches would improve things as well.
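The fast fallback described above is a one-liner; `cheapSort` is a hypothetical name for the sketch:

```javascript
// When there are too many matches to score properly, skip full scoring
// and order hits by length first, then alphabetically.
function cheapSort(matches) {
  return matches.slice().sort(
    (a, b) => a.length - b.length || a.localeCompare(b)
  );
}
```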

jeancroy commented 9 years ago

The benchmark uses 66k elements. What is a normal dataset size? Indeed, a debounce would go a long way toward not computing something that gets immediately discarded.

plrenaudin commented 9 years ago

I'm testing it on a 35k-file project and I can say that it's really quick already! Maybe there is no need for a "long term solution" after all :+1:

jeancroy commented 9 years ago

Because we do search-as-you-type and we allow no typo, one thing that is possible is to get matches for query "xyz" by filtering only matches for query "xy".
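A sketch of that incremental idea, under the stated assumptions (search-as-you-type, no typos allowed); `isSubsequence` is a hypothetical stand-in for the real match test:

```javascript
// Since the query grows one character at a time and typos are not
// allowed, every match for "xyz" was already a match for "xy", so we
// only need to re-test the previous survivors.
function isSubsequence(query, candidate) {
  const q = query.toLowerCase(), c = candidate.toLowerCase();
  let i = 0;
  for (let j = 0; j < c.length && i < q.length; j++) {
    if (c[j] === q[i]) i++;
  }
  return i === q.length;
}

function refine(previousMatches, query) {
  // Filter the shorter query's matches rather than the whole dataset.
  return previousMatches.filter((c) => isSubsequence(query, c));
}
```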

But thank you for finding that project; it's likely there will be a process of tuning weights.

jeancroy commented 9 years ago

There's one thing to note about speed: right now it assumes most of the entries are not what you are looking for. E.g. the benchmark for "node" instead of "index" is 6x slower. So one improvement could be for the exact-match bonus to be a bypass rather than a bonus applied at the end.

Edit: Fixed this scenario
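A minimal sketch of the bypass idea described above; `scoreWithBypass`, `fullScore`, and the constant 1000 are hypothetical, not fuzzaldrin's actual API:

```javascript
// Detect an exact substring match up front and return a high score
// immediately, instead of running the full O(m*n) scorer and adding an
// exact-match bonus at the end.
function scoreWithBypass(query, candidate, fullScore) {
  const idx = candidate.toLowerCase().indexOf(query.toLowerCase());
  if (idx !== -1) {
    // Exact substring: skip the expensive scorer; earlier matches rank higher.
    return 1000 - idx;
  }
  return fullScore(query, candidate);
}
```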

plrenaudin commented 9 years ago

PR #22 is good enough now; I will close this issue as soon as it is merged.

jeancroy commented 9 years ago

With that being said, I've found that their approach (a bonus that grows with the number of consecutive characters in sequence, from FilePathScoreFunction.js) worked better than gap-open penalties at producing intuitive matches. So I'm now using something built on that idea :)
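To illustrate the idea (a simplified sketch, not the FilePathScoreFunction.js code or jeancroy's implementation; `runBonusScore` is a hypothetical name):

```javascript
// A per-character bonus that grows with the length of the current
// consecutive run, so "abc" found contiguously beats the same letters
// scattered across the candidate.
function runBonusScore(query, candidate) {
  const q = query.toLowerCase(), c = candidate.toLowerCase();
  let score = 0, run = 0, i = 0;
  for (let j = 0; j < c.length && i < q.length; j++) {
    if (c[j] === q[i]) {
      run++;        // extend the consecutive run
      score += run; // bonus grows with run length: 1, 2, 3, ...
      i++;
    } else {
      run = 0;      // a gap resets the run
    }
  }
  return i === q.length ? score : 0; // 0 if the query didn't fully match
}
```

With this scheme, a contiguous match like "abc" in "abcde" scores strictly higher than "abc" spread out in "axbxc", which is the intuition the gap-open-penalty approach was trying to capture.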

It's good to see alternative solutions to a problem.

plrenaudin commented 9 years ago

That's brilliant, good job @jeancroy! Really looking forward to the merge of this PR...