Sorting search results - Githubissues

corbanbrook commented 10 years ago

Multiple schemes can be employed to achieve results which are most relevant to the user. Predicting which file a user wants out of a long list of possible matches and presenting it first can help speed up development time/maintain flow and train of thought. Here are some ideas to discuss:

[ ] Sort the least fuzzy results to the top. Fuzziness is determined by the search term's run length in the result, i.e. a search term of 'src' would score as fuzzier on a file like .jshintrc than on a file called app_src.js or files in the src/ directory: the run length in the first file is 2, while in the second it is 3.
[ ] Filename match or directory match. A match on a filename should be sorted at a higher priority than a match on a full path, but what about matches which have a higher run length on a full path vs. a lower run length on a filename? E.g. a search of 'src ap' currently displays images/destroy_discard.png higher than src/app.coffee. Although destroy_discard technically has more matched characters within the filename than src/app.coffee, src/app has more run length. (Something to think about.)
[ ] Dot files. Hidden files are less important than non-hidden files of equal score.
[ ] Ignored files (.gitignore) can be sorted at lower priority than files with equal score, i.e. I want the index.html in my templates/ dir, not the one in my build/ dir, because that one is in .gitignore. Editing a temporary file by mistake can cause confusion and lost work.
[ ] Sort files with most recent last modified date at a higher priority than those of equal score.
[ ] Filter out/sort to bottom files that are already open.
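
To make the "run length" idea in the first item concrete, here is a minimal sketch. It assumes run length means the longest piece of the search term that appears contiguously in the candidate, which is a simplification and not how fuzzaldrin scores anything; `longest_run` is a hypothetical helper name.

```python
def longest_run(query: str, candidate: str) -> int:
    """Length of the longest piece of `query` found contiguously in `candidate`."""
    q, c = query.lower(), candidate.lower()
    best = 0
    # prev[j] holds the length of the common run ending at q[i-2] and c[j-1].
    prev = [0] * (len(c) + 1)
    for i in range(1, len(q) + 1):
        cur = [0] * (len(c) + 1)
        for j in range(1, len(c) + 1):
            if q[i - 1] == c[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


files = [".jshintrc", "app_src.js", "src/app.coffee"]
# Longer runs (less fuzzy matches) sort first; sorted() is stable for ties.
ranked = sorted(files, key=lambda f: -longest_run("src", f))
```

With the search term 'src', `.jshintrc` gets a run length of 2 ("rc") and the other two files get 3, so `.jshintrc` sorts last.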

hkdobrev commented 10 years ago

@corbanbrook AFAIK this issue should be filed against the fuzzaldrin library which is used by fuzzy-finder to filter and score results.

corbanbrook commented 10 years ago

fuzzaldrin simply does scoring and sorting of arrays of strings or objects. Some of my above recommendations might be outside the scope of the project.

One solution would be for fuzzaldrin to add an option for custom filter/sorting callbacks. Another solution would be to simply use fuzzaldrin to provide the initial score, and then use that score in further sorting schemes within this project, like dot file priority, ignored file priority, and last modified priority.
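
A rough sketch of the second option, assuming the string score has already been computed elsewhere and is simply passed in. The `Candidate` fields and the `rank` helper are hypothetical names for illustration, not part of fuzzaldrin or fuzzy-finder.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    path: str
    score: float    # assumed to come from the string scorer
    ignored: bool   # assumed: True if the path is matched by .gitignore
    mtime: float    # last-modified time, seconds since the epoch


def is_hidden(path: str) -> bool:
    """True if any path segment is a dot file or dot directory."""
    return any(
        part.startswith(".")
        for part in path.split("/")
        if part not in ("", ".", "..")
    )


def rank(candidates):
    # Score decides first; the metadata only breaks ties between equal scores.
    return sorted(
        candidates,
        key=lambda c: (-c.score, is_hidden(c.path), c.ignored, -c.mtime),
    )
```

Because the metadata sits behind the score in the sort key, a better string match always wins, and an ignored `build/index.html` only drops below `templates/index.html` when both score the same.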

jamby commented 10 years ago

Would also be nice if the fuzzy finder could have the results filtered in terms of importance for type of project. For instance, a Ruby on Rails project, if I start typing a model's name, have the first result usually be '/app/models/model_name.rb', instead of having the first result be 'spec/models/model_name_spec.rb'.

Most times I want to deal with the model, not the spec.
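
One way this could look, purely as a sketch: scale each score by a per-directory weight chosen for the project type. The weights, the `RAILS_WEIGHTS` table and the helper names below are made up for illustration.

```python
# Hypothetical weights for a Rails-style layout; the values are arbitrary.
RAILS_WEIGHTS = {"app/": 1.0, "lib/": 0.9, "spec/": 0.8, "test/": 0.8}


def weight_for(path, weights=RAILS_WEIGHTS, default=0.9):
    """Weight of the first matching directory prefix, or `default`."""
    relative = path.lstrip("/")
    for prefix, weight in weights.items():
        if relative.startswith(prefix):
            return weight
    return default


def rerank(scored, weights=RAILS_WEIGHTS):
    """`scored` is a list of (path, score) pairs from the string scorer."""
    return sorted(scored, key=lambda item: -item[1] * weight_for(item[0], weights))
```

With these numbers, a spec scoring 0.90 is scaled to 0.72 and falls behind a model scoring 0.85.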

miletbaker commented 10 years ago

It would be nice as well to have more recency in the find logic. Although it will mainly suggest the last file accessed, to allow quick switching between files (it does seem intermittent, especially if you switch to another app and back again), it would be good if it always gave precedence to files based on last access, making it easy to work between several files.

A good example of where this works well is Textmate's implementation of cmd-t find file. The sorting there works well.

dmnd commented 10 years ago

Here's an example where the order isn't great. The second result is what I want, and it's a much closer match, so I don't know why it's second.

dmnd commented 10 years ago

Even worse:

I wanted the last result in this instance.

(I hope these examples are useful, apologies if they're noise)

dmnd commented 10 years ago

Another one

adammw commented 10 years ago

Coming from ST3, it really drives me crazy that the fuzzy matcher lists the specs before the actual controllers I want.

[screenshot]

Is there any config which changes how the fuzzy finder works, or do we need to improve the underlying fuzzy finding library to improve the searching?

lewispb commented 10 years ago

+1 for this

davepeck commented 10 years ago

I decided to play with Atom for the first time this weekend; I immediately found myself frustrated with the strange fuzzy ordering in Atom's select list views.

If we're going to improve fuzzy matching in Atom, there are lots of things to consider:

The "right" ordering is fundamentally subjective. It's clear from github issues and Atom forums that lots of people would like to see improvement, but it's equally clear that we won't ever fully agree on what's "best." At the very least, changes to fuzzaldrin should continue to respect the current scoring tests; these represent the only codified community judgment we have so far. We'll probably want to augment these tests with examples from real-world projects, too.
The "right" ordering is probably context dependent. There's a hint of this in fuzzaldrin's filter method, which takes the strange queryHasSlashes parameter and invokes the specialized scorer.basenameScore depending on it. I'd expect any update to filter (a) will need to be parameterizable by the caller (for example, to indicate separators, weights, etc.) and (b) will need sensible defaults so it can be invoked with nothing more than the needle and haystack. As an example, we might want path separators to have importance when invoking filter with a list of file names, but we might want the colon-space (": ") to have importance when invoking filter from the command palette.
There's great prior art to learn from. TextMate's ranking algorithm is highly regarded, although at first glance I find the implementation hard to grok. (It seems to have a dynamic programming component in matrix but lacks essentially any useful comments.) Command-T also has a well-liked algorithm. Gary Bernhardt's selecta ranking algorithm was based on some interesting discussion that considered this prior art.
Especially in fuzzy-finder's case, there's a lot of metadata we can and probably should use to improve ranking. The venerable PeepOpen ranking algorithm takes into account file modification times, last opened, git status, etc. Probably this more sophisticated ranking belongs strictly in fuzzy-finder, as a new "meta scoring" layer; fuzzaldrin should continue to just be about ranking a needle in a haystack of strings.
My smartscore branch tries to codify some basic intuitions about what makes a match "better". These include: touching the "starts of words" counts for more; some separators are worth more than others (in file contexts, '/' is probably worth more than '-' or ' '); on the whole, we should prefer fewer contiguous runs of longer length; full word matches along the way are always preferable; etc.
Performance is a consideration. Right now every call to filter starts fresh. But it seems to me that (a) it may prove desirable to pre-process each string in the haystack before ever invoking filter, and (b) if the user is simply appending characters to the query string, it might (?) be possible to iteratively re-score the results.
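
To illustrate the "starts of words" and separator-weight intuitions together with caller-supplied parameters and defaults, here is a small sketch. It is not the smartscore branch or fuzzaldrin's scorer; the weights, the greedy left-to-right matching and the name `word_start_score` are all assumptions made for the example.

```python
# Assumed separator weights; '/' is worth more than '-', '_' or ' '.
DEFAULT_SEPARATORS = {"/": 3.0, "_": 2.0, "-": 2.0, " ": 2.0, ".": 1.5}


def word_start_score(query, candidate, separators=None):
    """Greedy left-to-right match; 0.0 if `query` is not a subsequence."""
    seps = DEFAULT_SEPARATORS if separators is None else separators
    q, c = query.lower(), candidate.lower()
    score, qi, prev_matched = 0.0, 0, False
    for ci, ch in enumerate(c):
        if qi < len(q) and ch == q[qi]:
            if ci == 0:
                # The start of the string counts like the strongest separator.
                bonus = max(seps.values(), default=0.0)
            else:
                # Bonus if the previous character is a separator.
                bonus = seps.get(c[ci - 1], 0.0)
            # One point per match, plus the bonus, plus one for extending a run.
            score += 1.0 + bonus + (1.0 if prev_matched else 0.0)
            prev_matched = True
            qi += 1
        else:
            prev_matched = False
    return score if qi == len(q) else 0.0
```

Called with only a needle and a candidate it uses the defaults; a caller such as the command palette could pass its own `separators` mapping instead. With the defaults, 'ap' scores higher on src/app.coffee (match right after '/', then a contiguous run) than on images/destroy_discard.png.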

Alright — hopefully this is useful/interesting to someone. I plan to slowly work on improvements to both fuzzaldrin and fuzzy-finder in my personal branches. Suggestions and feedback are most welcome!
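
On the performance point about a user appending characters to the query: a simple version of that idea is to narrow the previous survivors instead of rescanning the whole haystack. This sketch only narrows the candidate set (it does not re-score iteratively), and `IncrementalFilter` is a hypothetical class, not something fuzzaldrin provides.

```python
def is_subsequence(query, candidate):
    """True if the characters of `query` appear in order in `candidate`."""
    remaining = iter(candidate.lower())
    return all(ch in remaining for ch in query.lower())


class IncrementalFilter:
    def __init__(self, haystack):
        self.haystack = list(haystack)
        self.last_query = ""
        self.last_matches = self.haystack

    def filter(self, query):
        # A string that failed a shorter query cannot match an extension of
        # it, so reuse the last survivors when the query was only appended to.
        extended = query.startswith(self.last_query)
        pool = self.last_matches if extended else self.haystack
        matches = [c for c in pool if is_subsequence(query, c)]
        self.last_query, self.last_matches = query, matches
        return matches
```

If the user deletes or edits characters, the query no longer extends the previous one and the filter falls back to the full haystack.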

(For fun, I started by replacing fuzzaldrin's current score method with a coffeescript re-implementation of TextMate 2's ranking algorithm; it works and, after a minor tweak, passes all fuzzaldrin tests.)

nj commented 9 years ago

:+1: as the current solution is rather useless, and it can even be faster to find the file manually

matugm commented 9 years ago

:+1: Would be great if we could get some progress on this.

walles commented 9 years ago

Improved sorting / scoring: https://github.com/atom/fuzzaldrin/pull/22

The above pull request addresses at least some of the issues raised here.

ghost commented 9 years ago

Since I'm working on a lot of Rails projects with ActiveAdmin, I'm often annoyed when I end up in an ActiveAdmin file for a particular resource instead of a model file.

I was thinking about improving this by sorting the fuzzy-finder results by usage, i.e. if files in some folder are worked on more often, they are ranked higher.

I'm happy to implement this experimentally and make a pull request if other people approve of this idea also.
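
A minimal sketch of what usage-based ranking could mean, assuming we count file opens per folder and use the count only to break ties between equal scores. `UsageRanker` and its methods are hypothetical names for the example.

```python
import posixpath
from collections import Counter


class UsageRanker:
    def __init__(self):
        self.opens = Counter()  # folder -> number of files opened from it

    def record_open(self, path):
        self.opens[posixpath.dirname(path)] += 1

    def rank(self, scored):
        """`scored` is a list of (path, score) pairs from the string scorer."""
        return sorted(
            scored,
            key=lambda item: (-item[1], -self.opens[posixpath.dirname(item[0])]),
        )
```

After a couple of `record_open("app/models/...")` calls, a model file would sort ahead of an equally scored file from a folder that is rarely opened.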

Soleone commented 9 years ago

:+1: for some improvements that make finding commonly used files easier. Sublime seemed to have done a better job putting the file I actually want to open at the top (using Rails here as well).

dmnd commented 9 years ago

Just in case further examples are helpful:

kevinsimper commented 9 years ago

@jeancroy Did #22 solve this issue?

jeancroy commented 9 years ago

There's now a "Use Alternate Scoring" option in fuzzy finder that uses it. It addresses many issues about searching by file name / path.

But it does not cover any knowledge about the files themselves, such as a preference for recent / frequent / certain files.

r-owen commented 9 years ago

I just tried the Atom Beta with "Use Alternate Scoring" enabled and it's a huge improvement, though still not as good as Sublime Text. I have a project with a huge number of files, including Doxygen generated html files that I rarely want to look at. I tried to find a file named "matchOptimisticB.h". In SublimeText I can type "mob.h" and get the right file as the first suggestion. In Atom Beta it is the ninth choice, preceded by eight html files I have no interest in.

One thing that might help Atom: if the user provides a file type suffix, prefer names that match that suffix exactly over names that use that suffix as a prefix.
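
As a sketch of that suffix rule: check whether the query ends in an extension and whether the file's extension equals it exactly. The helper name and the Doxygen-style file name in the note below are hypothetical.

```python
import posixpath


def extension_bonus(query, path):
    """1 if the query's extension equals the file's extension exactly, else 0."""
    query_ext = posixpath.splitext(query)[1].lower()
    path_ext = posixpath.splitext(path)[1].lower()
    return 1 if query_ext and query_ext == path_ext else 0


def prefer_exact_extension(query, paths):
    # Stable sort: paths with the exact extension move ahead, order otherwise kept.
    return sorted(paths, key=lambda p: -extension_bonus(query, p))
```

For the query "mob.h", a file ending in ".h" gets the bonus while one ending in ".html" (for example a made-up doc/html/matchOptimisticB_8h.html) does not, even though ".h" is a prefix of ".html".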

Another thing that might help (though I really hope it won't come to this, and it's not needed by Sublime) is to allow the user to disable directory patterns. In my case I might eliminate searches of Doxygen-generated html files and would definitely eliminate .os files (why in the world is it showing binary libraries?).
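
A sketch of user-supplied exclusion patterns, using shell-style globs from Python's standard `fnmatch` module. The patterns and paths are examples only, and this says nothing about how Atom's own settings work.

```python
from fnmatch import fnmatchcase


def exclude(paths, patterns):
    """Drop every path that matches at least one glob pattern."""
    return [
        p for p in paths
        if not any(fnmatchcase(p, pattern) for pattern in patterns)
    ]


visible = exclude(
    ["include/matchOptimisticB.h", "doc/html/index.html", "lib/match.os"],
    ["doc/html/*", "*.os"],
)
```

Note that in `fnmatch` patterns `*` also matches `/`, so `*.os` excludes `.os` files at any depth.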

jeancroy commented 9 years ago

OK, please open an issue on fuzzaldrin-plus and I can give it a look. I'd need the results that come before it to understand why they are preferred. The full path is also useful; if it's private, a mock-up with the same length and directory depth will do.

r-owen commented 9 years ago

I just submitted this issue. I hope it helps.

https://github.com/jeancroy/fuzzaldrin-plus/issues/12

Thank you very much for trying to improve Atom’s fuzzy search.

— Russell

tnrich commented 8 years ago

Does anyone know if there is an equivalent issue open discussing the Cmd-Shift-P search algorithm?

jeancroy commented 8 years ago

You're speaking of the command palette? It should already be integrated. If you have a problem, you can try opening an issue on the fuzzaldrin-plus repo.

adamreisnz commented 7 years ago

This is a pretty ancient issue, so I have little hope of improvement arriving any time soon, but here's my two cents:

Two things are wrong with the way the fuzzy finder currently works, both illustrated by the above example.

1) Sort order (as already pointed out in this issue): based on my input, I would really expect the file member/edit/edit.html to show up on top. It does for some reason when I remove the l from html:

But with the full html it suddenly drops to second place which gets rather infuriating after having opened the wrong file several times.

So scoring should somehow take into account how close the search terms are to each other in the filename, and prioritize edit.html over edit-payment.html unless I include payment in my search query.

2) It should really prioritize full word matches rather than scattered letters. If you look at the above example, it actually matches member because it's in the app/components/admin/member folder, instead of simply matching to the member part of the path, because that's a whole matching word.

These two tweaks would make the search algorithm a lot stronger.

jeancroy commented 7 years ago

Hi @adamreisnz, as a curiosity, is this happening with alternate scoring turned on? There was a strong preference for word "togetherness" in that version.

From the screenshot I'm guessing it's not, but if it is I'll add a few of those to the test benchmark. The previous algorithm would take the first occurrence of m, then the first e, then the first m, instead of waiting and trying for member.

adamreisnz commented 7 years ago

@jeancroy thanks for looking into it, but yes, it's in fact enabled:

The version I'm using is 1.18.0-dev-f4a83b238

jeancroy commented 7 years ago

Another possibility is that the alternate score is used for ranking while the classic one is used for highlighting. The whole component below fuzzy finder has been rewritten recently. If that's the case, the whole scattered-letter issue is a false trail.

One feature of the new one is a bias toward the file name (vs. the whole path) when we match the file extension exactly. I think you are battling against that when you use keywords from the path but end with an extension.

To sum up your request, you want the htm behavior to happen even in the html case? I'm not sure what the algorithm does because of how scrambled the highlight is.

adamreisnz commented 7 years ago

Well, my use case as you might deduce from my example is that in a large project, there will be many components. Each component might have an edit sub component as in the example, and each of those components will have edit.html template, and edit.js module, and perhaps edit.ctrl.js controller.

So the way I tend to quickly open the file I want, is by specifying the parent component member, then the sub component edit and then extension if I know there's going to be more than one file.

This usually works fine, but in the above case it was messing it up due to the existence of another similar file in the same path (edit-payment.html).

I think my use case is fairly common, so I wouldn't expect to be "battling" against the fuzzy finder's system with it.

edit.html should still be preferred over edit-payment.html if you search for "edit html" imo, on account of it being the shorter and closer match.

jeancroy commented 7 years ago

You're right on all accounts. In this case it seems the algorithm just likes the m of payment. I guess the m of html manages to count twice; I'll see how to fix that.

The good news is that the issue is more constrained than, say, a lack of prioritizing "full word matches". (Here - counts as a word boundary.)

[screenshot]

I'll open a different issue for the highlight regression; it should group member appropriately.

adamreisnz commented 7 years ago

Yeah that looks better in your screenshot, highlighting member properly. And interesting that it likes the m in payment and paykent is put at the bottom properly. Looks like it's just a few tweaks needed to fix those issues then 👍

adamreisnz commented 7 years ago

Looks like in the latest version (just built Atom from master yesterday) there are still some scoring issues. For example this result:

It should not prioritise cards/club-details.js over cards/details.js for the same reason as above, where it shouldn't prioritise the edit-payment.html file. cards/details.js is a closer match, because it has fewer non-matching characters between the matches.

I did not type a c character and it already matched card, so it's a bit baffling why it tries to mark the c of club and give that result a higher score than the more sensible result below that.

Note that when I type cards it does prioritise correctly (but still marks the c in the second result):

I think once a search term has been used/matched in the path, it should not try to match it again for another part of the path. In addition, results with the least amount of non-matching characters between the matches should probably score highest.
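
The "least amount of non-matching characters between the matches" rule can be sketched like this. It is only an illustration of the idea, not the fuzzaldrin-plus algorithm; `min_gaps` is a hypothetical helper, spaces in the query are simply dropped, and the 'card details' query mentioned below is a guess since the typed query is not shown here.

```python
def min_gaps(query, candidate):
    """Fewest unmatched characters between the first and last matched
    character, over all ways to match `query` as a subsequence.
    Returns None if the query does not match at all."""
    q = query.replace(" ", "")
    if not q:
        return 0
    best = None
    for start, ch in enumerate(candidate):
        if ch != q[0]:
            continue
        # From this start, taking each next character as early as possible
        # gives the shortest window that begins here.
        pos = start
        for wanted in q[1:]:
            pos = candidate.find(wanted, pos + 1)
            if pos == -1:
                break
        else:
            gaps = (pos - start + 1) - len(q)
            best = gaps if best is None else min(best, gaps)
    return best
```

For the query 'edit html', edit.html has 1 unmatched character between its matches and edit-payment.html has 9, so sorting by fewest gaps puts edit.html first. The same holds for cards/details.js against cards/club-details.js with a query like 'card details'.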

adamreisnz commented 7 years ago

Another example in Atom 1.20 dev where prioritisation is not what one would expect;

adamreisnz commented 7 years ago

Guys, any activity on this issue please? It's infuriating to keep opening the wrong files because the fuzzy finder sorting logic is off.

VSCode manages to do it correctly, why not Atom? Perhaps it would be worthwhile looking at their algorithm.

winstliu commented 7 years ago

@adamreisnz looks like this was fixed a month ago by @jeancroy but we're running an outdated version of fuzzaldrin-plus. Will create a PR.

atom / fuzzy-finder

Sorting search results #21