Fix scoring to favor exact substring matches

atom / fuzzaldrin

Fuzzy filtering and string scoring

MIT License

319 stars 28 forks source link

Fix scoring to favor exact substring matches #15

Closed brandonwamboldt closed 9 years ago

brandonwamboldt commented 9 years ago

So I made pretty radical changes to scoring and don't expect this to be accepted without tweaks, but I figured it would be a good way to get momentum on the problem.

We have an issue with exact substring matches being scored lower than non substring matches. My go to example is:

Find And Replace: Select All Settings View: Install Packages And Themes

That's the sorted order when searching for "Install", clearly not what you'd expect. This changes the scoring system to heavily weight substring matches, so if you type a word exactly as it appears in the string, it will be weighted highly (nearly as a high as an exact match).

Should address https://github.com/atom/atom/issues/6461 and https://github.com/atom/command-palette/issues/28.

brandonwamboldt commented 9 years ago

FYI I force pushed a better version just now. My old way didn't work consistently, now I add a value to the finalized score for substring matches (like we do for basename matches), which yields more accurate results.

brandonwamboldt commented 9 years ago

Also what's the etiquette on corrections? I just added a second commit to be safe, as if the person comments on a commit and not the PR diff, comments are lost when you force push, but you also don't want to merge in a PR full of unnecessary commits.

Just curious what you guys prefer? Or do you just squash & merge manually?

winstliu commented 9 years ago

@brandonwamboldt In most cases, the Atom team doesn't care too much about how many commits are in a PR when merging. A few good examples of this are the recent pane-resize PR (24 commits) and the gutter API PR (53 commits). Your choice in the end though :).

@mnquintana Generally it seems like weighs is to measure something while weights is to hold something down, so I'd lean towards weighs here.

brandonwamboldt commented 9 years ago

Roger that, thanks.

Also, to explain why I added 1 to the weight, which is 1 or lower at this point in the code, is that I think substring matches should always be ranked higher than non substring matches, and adding 1 is the only way to accomplish that. Now, substring matches are ranged from 1.0 to 2.0, and non substring matches are ranged from 0.0 to 1.0.

However, I'm also adding a value if:

The substring match is at the beginning of the string (highest bonus)
The substring match is right after a path separator (second highest)

mnquintana commented 9 years ago

@50Wliu: Generally it seems like weighs is to measure something while weights is to hold something down, so I'd lean towards weighs here.

From the Apple dictionary:

weight - verb [ with obj. ] 2) attach importance or value to

I've definitely seen weight used in this way to refer to search rankings before.

thomasjo commented 9 years ago

It is definitely correct to use weight (and hence weights) in this context.

mnquintana commented 9 years ago

/cc @atom/feedback

walles commented 9 years ago

Could you add a test case for your real-world example that you mention in the initial comment? With "Settings View: Install Packages And Themes" being at place ten when searching for "install". Preferably with all intermediate lines in it as well.

That's a good way to know that the patch addresses the right problem.

walles commented 9 years ago

Given that PR #22 has test cases for some nice things that this PR does not, I'd prefer #22 being merged over this one.

Or that this PR is updated with the tests for git push and installfrom #22.