atom / fuzzaldrin

Fuzzy filtering and string scoring
MIT License

Consider more advanced string scoring #5

Closed hkdobrev closed 10 years ago

hkdobrev commented 10 years ago

Here is a simple test taken from reverse-engineering the Sublime Text matching:

expect(score('foobar/138_abc_zyx', 'az')).toBeLessThan(score('foobar/lolololol/abc_zyx', 'az'))

The reasoning is simple: a match that starts at the beginning of the basename should score higher than a match further into the basename, even when that puts the match later in the full string.

Unfortunately stringScore() would not pass this test.

Scores for az:

Path                        Sublime   Atom
foobar/138_abc_zyx          145       0.11111111111111112
foobar/lolololol/abc_zyx    151       0.10833333333333334

You could create a PR with the test here if needed: https://github.com/hkdobrev/fuzzaldrin/compare/atom:master...hkdobrev:failing-test-for-basename-scoring


I hope this issue is the first of many cases I can find and define by studying the Sublime Text implementation. Its Go To File feature is a killer: even after months and months of use, it still surprises you how well it works. Atom should really make enhancing the Fuzzy Finder a key priority.

malmckay commented 10 years ago

+1

My example is this:

expect(score('lib/exportable.rb', 'table')).toBeLessThan(score('app/models/table.rb', 'table'))

corbanbrook commented 10 years ago

I believe that fuzzaldrin needs to decide if it does string matching or filepath matching. Perhaps it needs a filepath mode which uses different sorting behaviour.

I agree with the above example that table.rb should be scored higher than exportable.rb even though table.rb is in a deeper path. A filepath mode could use concepts like path depth, directory match vs. filename match, and whether the directory or filename starts with the search term or merely contains it.
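To make the filepath-mode idea concrete, here is a minimal sketch of such a scorer. The function name pathScore and every weight in it are hypothetical, chosen only for illustration; nothing here is fuzzaldrin's actual API or behaviour, and it only handles contiguous substring matches rather than fuzzy ones.

```javascript
// Hypothetical filepath-aware scorer. The name and all weights are
// illustrative assumptions, not part of fuzzaldrin.
function pathScore(path, query) {
  const q = query.toLowerCase();
  const segments = path.toLowerCase().split('/');
  const basename = segments[segments.length - 1];
  const directories = segments.slice(0, -1);

  let score = 0;

  // A filename match outranks a directory match, and a filename that
  // starts with the query outranks one that merely contains it.
  if (basename.startsWith(q)) {
    score += 100;
  } else if (basename.includes(q)) {
    score += 50;
  }

  // Directory matches still count, but for less.
  for (const dir of directories) {
    if (dir.startsWith(q)) {
      score += 20;
    } else if (dir.includes(q)) {
      score += 10;
    }
  }

  // One point off per directory level, so depth only breaks near-ties.
  score -= directories.length;

  return score;
}
```

With these assumed weights, app/models/table.rb outranks lib/exportable.rb for the query table, because the prefix match on the filename outweighs the extra directory level.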

kevinsawicki commented 10 years ago

@malmckay The scoring API is different from the filtering API. Filtering takes basename matches into account, so your example would pass a spec in which app/models/table.rb is sorted highest.

I added it to the specs in commit 0e4d74d72e2d5174d567f7be12810d7aa73f8c30
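One way to picture the difference between scoring and filtering is the sketch below. It is not fuzzaldrin's implementation: rankPaths and lengthRatioScore are hypothetical names, and the stand-in scorer is a deliberately naive assumption so that the example is self-contained.

```javascript
// Naive stand-in scorer (an assumption, not fuzzaldrin's score()):
// the fraction of the string covered by a contiguous, case-insensitive match.
function lengthRatioScore(text, query) {
  if (query.length === 0) return 0;
  return text.toLowerCase().includes(query.toLowerCase())
    ? query.length / text.length
    : 0;
}

// Hypothetical filter: ranks each candidate by its full-path score plus
// the score of its basename, so filename matches count twice.
function rankPaths(candidates, query, scoreFn) {
  return candidates
    .map((path) => {
      const basename = path.slice(path.lastIndexOf('/') + 1);
      return { path, total: scoreFn(path, query) + scoreFn(basename, query) };
    })
    .filter((entry) => entry.total > 0)
    .sort((a, b) => b.total - a.total)
    .map((entry) => entry.path);
}

const ranked = rankPaths(
  ['lib/exportable.rb', 'app/models/table.rb', 'README.md'],
  'table',
  lengthRatioScore
);
```

With this stand-in scorer, the raw full-path score favours the shorter lib/exportable.rb, while the basename-aware ranking puts app/models/table.rb first and drops README.md as a non-match.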

kevinsawicki commented 10 years ago

@hkdobrev I tweaked the scoring to weigh matches after a slash higher than matches after a space, underscore, or dash. Basename matches now score higher when the match is directly at the beginning of the basename.
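The effect of that weighting can be illustrated with a toy scorer. This is only a sketch under stated assumptions: boundaryBonus, toyScore, and the bonus values 8 and 4 are hypothetical and are not the values or functions used in fuzzaldrin. It also ignores string length and match position, which a real scorer would account for.

```javascript
// Bonus for a matched character, based on the character before it.
// The weights are illustrative assumptions, not fuzzaldrin's values.
function boundaryBonus(text, index) {
  if (index === 0) return 8;                // start of the string
  const previous = text[index - 1];
  if (previous === '/') return 8;           // start of a path segment
  if (' _-'.includes(previous)) return 4;   // start of a word
  return 0;
}

// Best total score over all ways of matching `query` as a subsequence
// of `text`. Returns 0 when the query does not match.
function toyScore(text, query) {
  const t = text.toLowerCase();
  const q = query.toLowerCase();
  // best[j] holds the best score for matching q[j..] within the part of t
  // already visited; the table is filled from the end of t backwards.
  let best = new Array(q.length + 1).fill(-Infinity);
  best[q.length] = 0;
  for (let i = t.length - 1; i >= 0; i--) {
    const next = best.slice(); // default: skip t[i]
    for (let j = 0; j < q.length; j++) {
      if (t[i] === q[j] && best[j + 1] !== -Infinity) {
        const matched = 1 + boundaryBonus(t, i) + best[j + 1];
        next[j] = Math.max(best[j], matched);
      }
    }
    best = next;
  }
  return best[0] === -Infinity ? 0 : best[0];
}
```

Under these assumed weights, toyScore('foobar/lolololol/abc_zyx', 'az') is 14 and toyScore('foobar/138_abc_zyx', 'az') is 10, so the path whose basename starts with the match scores higher, as in the original report.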

This library still has a long way to go, I'm sure, but it would be great if you could report other cases you find incorrect so we can start building up the spec suite and adjust the scoring carefully.

Thanks again for the detailed report.

hkdobrev commented 10 years ago

Thanks @kevinsawicki!