Closed dmfay closed 3 years ago
Thanks! I'll play with this shortly :)
Sorry for the delay :)
This seems plenty fast enough. It'd be really nice if it could prioritize longer matches somehow, though. ...I'm not sure how best to do that using the %
approach.
No worries, there's a lot happening right now 😬
I'll look at fixing up the highlighting soon but here's how you'd prioritize longer matches in general, for results having the same rank
:
diff --git a/src/history/history.rs b/src/history/history.rs
index 6c2eca1..cea1d2f 100644
--- a/src/history/history.rs
+++ b/src/history/history.rs
@@ -247,7 +247,8 @@ impl History {
selected_occurrences_factor, occurrences_factor
FROM contextual_commands
WHERE cmd LIKE (:like)
- ORDER BY rank DESC LIMIT :limit";
+ ORDER BY rank DESC, length(cmd) DESC
+ LIMIT :limit";
let mut statement = self
.connection
.prepare(query)
However, I don't think it's a good idea. Especially with fuzzy searching, shorter matches get filtered out naturally as you continue to type, so prioritizing longer matches makes the shorter all but unreachable by fuzzy searching alone.
$ cruns
cargo run -- search --fuzzy
cargo run -- search
If anything, I'd want the opposite, so cargo run -- search
would be the top result until you entered an f or a z.
Agreed, that shorter commands (or longer contiguous substring matches?) would be better. Since the rank from Mcfly is a float, though, I think the length would almost never be used in your implementation.
On Thu, Oct 15, 2020 at 6:11 AM Dian Fay notifications@github.com wrote:
No worries, there's a lot happening right now 😬
I'll look at fixing up the highlighting soon but here's how you'd prioritize longer matches in general, for results having the same rank:
diff --git a/src/history/history.rs b/src/history/history.rs
index 6c2eca1..cea1d2f 100644
--- a/src/history/history.rs
+++ b/src/history/history.rs
@@ -247,7 +247,8 @@ impl History {
selected_occurrences_factor, occurrences_factor FROM contextual_commands WHERE cmd LIKE (:like)
ORDER BY rank DESC LIMIT :limit";
ORDER BY rank DESC, length(cmd) DESC
LIMIT :limit";
let mut statement = self .connection .prepare(query)
However, I don't think it's a good idea. Especially with fuzzy searching, shorter matches get filtered out naturally as you continue to type, so prioritizing longer matches makes the shorter all but unreachable by fuzzy searching alone.
$ cruns
cargo run -- search --fuzzy
cargo run -- search
If anything, I'd want the opposite, so cargo run -- search would be the top result until you entered an f or a z.
— You are receiving this because you commented.
Reply to this email directly, view it on GitHub https://github.com/cantino/mcfly/pull/103#issuecomment-709314702, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAUO623Y5MWFJIGV64YPC3SK3YJHANCNFSM4R5DKHVA .
Serves me right for not checking; you could order by rank rounded to int, then length (ascending), then the full rank, but that's getting a bit kludgy.
Weighting contiguous characters higher would be a good next step after basic fuzzy searching. That's beyond the capabilities of SQLite so it'd have to happen after results are returned. Results near the bottom could have worse quality than those that didn't get in under the LIMIT
too, but that's not much of a problem in practice.
I'd be fine merging this as-is if we could fix highlighting, and then we can always work on improving the sorting.
This implementation could be smartened up a little bit -- right now clr
matches a[cceler]ate
instead of ac[celer]ate
-- but it works, it's correct, and if moved a bit earlier end - start
is exactly what we need to weight closer/contiguous matches.
Fuzzy matches are now weighted by length, so a lower-ranked but shorter match has a chance to come in above a higher-ranked but longer match. For example, with the search text sshmyk
and these two matches:
[ssh -i ~/.ssh/my_k]ey user@host
[Some gigantically long String wHich has already Matched 'ssh' but will not complete the match until a Y and a K] are found
The difference in the two match lengths (18 vs 111 characters) is factored into the weight calculation. Here the real ssh command gets a rank bump of +0.86 while the longer string's rank is increased only by +0.14, so the ssh command can overcome an unweighted rank disparity of up to 0.72.
If the longer match were instead, say, 30 characters, the ssh command would be ranked at +0.63 while the longer string would add +0.38. The ssh command's unweighted rank would have to be within 0.25 of the longer string to jump ahead.
I think this is a very sensible approach. Ideally, it'd be nice to empirically derive some sort of weighting for fuzzy matches that takes into account how long the character runs are, how many different sections it's matching, etc., but to do this, I think we'd need a bunch of training data that we don't have. For the time being, this seems fine if it's working well for you in practice.
I'll install this locally and start using it in fuzzy mode as well.
The code looks good. Would you mind running cargo clippy
and cargo fmt
on it, though, I think there are a couple style differences.
Updated with formatting. I've been using it locally and am pretty happy with it; there's definitely room for further improvements but it's already quite useful.
Been working for me! Thanks for the contribution :)
Released in v0.5.1 :)
A quarter-second faster than the skim version at ~100k entries; the one thing this needs is highlight detection, which still only looks for a contiguous match.