Give preference to exact matches

vspinu commented 6 years ago

I have my own application named e which starts emacs built from source, but I commonly use built in emacs 25 with quite a mouthful name "GNU emacs 25 (GUI)". If I type "e" I do get the latter as the default completion because I have used it more often so far. I believe this is unnatural. FWIW, krunner does give priority to the exact matches.

Operating system Kubuntu 18.04
Desktop environment KDE
Albert version From source master

ManuelSchneid3r commented 6 years ago

Whats the point in giving items that you use less often prio? Effort = frequenzy / prio. Why would you want to increase the effort to get what you want? The whole point of this app is self optimization.

vspinu commented 6 years ago

It depends on how you define effort. If I can get the long item with "em" and short exact match with "e" both of which are effortless, but if I always get the long term no matter what I input and I have to use arrows to navigate to the shorter item then that's "effort".

This behavior seems natural to me but I might be biased in this regard. All text completion frameworks that I use do give priority to the exact match.

idkCpp commented 6 years ago

While the "freq vs prio" debate is another one to be had (or polled or whatever) I have another idea. How about multiplying a words weight by eg 2 if the edit distance is 0.

I just had the case, that I had the entry "mpv Media Player" which I wanted to find. I don't know how often I already hit it but I guess less than 5 times. So "mp" returns it at top, while "mpv" returns some "maven" folders somewhere on my disk. While edit_distance("mpv" -> "mpv") = 0, edit_distance("mpv" -> "maven") = 3

While writing this I had another idea. How difficult would it be to make the weighing-algo programmable? If you'd open the possibility that users tinker with the algo themselves, probably there is some productive outcome.

vspinu commented 6 years ago

This is actually a very cool idea. What I was proposing is a particular case of the weight == +Inf.

I don't know the details of the current algorithm, but the weight of older hits should decay in time. If the user suddenly started doing Java and working with mvn, then a few recent hits of mvn should outweigh hundreds of 1 year old hits of Media player which are no longer relevant.

How difficult would it be to make the weighing-algo programmable?

It would probably mean moving the algorithm from c++ to python at the least. I would surely love to play with the algo myself. In any case having a few parameters of the algo exposed (multiplicative weight, logarithmic capping, time decay) would go a very long way.

stale[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

idkCpp commented 3 years ago

Is this issue really to be closed? It seems like the discussion has not yet reached a satisfying result.

ManuelSchneid3r commented 2 years ago

@vspinu you are talking about the app extension right? atm a lot of extensions do not provide relevances, thats a problem which I want to adress in 0.18 with a global index which does the scoring implicitly. but it looks like your problem is the MRU score which takes precedence over the uninformed match score. relevance decay is already implemented. see https://github.com/albertlauncher/albert/blob/edd178d2d58002d69c0e5b25b927863251c8e28c/src/app/querymanager.cpp#L236 the weights decrease linear with age. maybe it should rather decrease quadratic.

vspinu commented 2 years ago

@vspinu you are talking about the app extension right?

Not really. My request is very simple - "give priority to exact matches no matter what". If I typed "e" and there is such an application available, put it to the top regardless of what the relevance algorithm is saying.

ManuelSchneid3r commented 2 years ago

You mean e is a perfect match for emacs, and it should be in front even if you ran ewhateverappname often in the past?

Atm recently used information takes precedence over the match score of an item you never ran.

Its basically like: if it matches you get your favorites then sort by count of matched characters divided by the total matchable characters an item has. This seems perfectly natural to me. However I am open to suggestions.

How would you handle this case: you want to run Firefox with a "fi" query. However there is also a file with the name "file". Without MRU info file would always appear in front which is annoying in the end.

Another option would be to mix the score but this is definitvely harder to get it right for everybody

vspinu commented 2 years ago

you want to run Firefox with a "fi" query. However there is also a file with the name "file". Without MRU info file would always appear in front which is annoying in the end.

I am confused. Didn't you say that "fi" would have a score 2/2 and "file" 2/4, thus "fi" has precedence?

Mixing score and MRU stastistic seems natural to me. There could be a configuration option, a weight between 0 and 1. If 0, only MRU is used, if 1 only the score, everything in between is a weighted average of the two. It depends on how MRU is measured if it's a metric between 0 and 1 like the match score is then the average would make sense.

ManuelSchneid3r commented 2 years ago

fi/firefox is 2/7. I agree that mixing is indeed better, but in the end Albert is intended to pimp your personal, recurrent workflows.

The linear weight is as trivial as it can get and I'd prefer to implement "real learning" instead of building around a thought out static scoring. For heavy users the latter would have hardly any advantage.

I thought a lot about getting sophisticated machine learning into it. There should be a real mapping of queries to items (atm only items itself get the score, without taking the query into account). Such that if you used Firefox (Firefox having an additional lodash index string "FF") and ffmpeg equally often in the past, you ran Firefox with fi and ffmpeg with ff, FF should result in ffmpeg instead of Firefox although MRU scores equal and the matchescore is perfect since ff/ff=1.

There is plenty of room for improvements besides the static scoring which is only relevant if you have never used the item.

I dare to say that nobody would use the ruler to find out their personal preferred mixing score since one cannot check the impact in a comprehensible way.

vspinu commented 2 years ago

As a data scientist I very much like the learning approach. No need to get fancy about it IMO. I bet one can do 90% of a full blown ML with a simple linear regression.

The weighting scheme is a one parameter model. Estimate the only coefficient and bam, you have your ML ;) Advantage that people can tinker with the weight themselves and it's crysta clear what it does.

On the other hand I am not sure it's worth the effort. Since I opened an issue I was never botherd by the problem. The MRU does the job really well. It's just that exact match should be a special IMO. At least in all completion system that I used in the past this was the case.

vspinu commented 2 years ago

It just occurred to me that maybe the OP wasn't very clear after all. By "exact match" I mean the full and exact match of the entire program name. In the OP case I really had a program named "e". So the score would be score(e/e) == 1. In your example from above I assumed that there is a program (or a shortcut) fi thus, an exact match, and score == 1 as well. What I claim is that a completion system should pop perfect matches on the top, no matter what the other sorting algorithms do.

ManuelSchneid3r commented 2 years ago

Indeed i got this wrong (although it was pretty clear) but still the slightly changed firefox example would annoy me (How would you handle this case: you want to run Firefox with a "fi" query. However there is also a file with the name "fi". Without MRU info file would always appear in front which is annoying.)

vspinu commented 2 years ago

However there is also a file with the name "fi". Without MRU info file would always appear in front which is annoying

If MRU is used as a disambiguation of the exact match then it's not a problem.

ManuelSchneid3r commented 2 years ago

Simple order and slider like this? https://www.math3d.org/BjaLgHI1p

ManuelSchneid3r commented 2 years ago

MRU and Match scores have to be normalized to [0,1] for each query otherwise the score is biased right? Like https://www.math3d.org/TeKRjPknx

vspinu commented 2 years ago

Yes, normalization is the key, the scaling of the MRU should be 0, never used in past X months, to 1 = most frequently used in past X months.

Are you thinking going into weighting direction? To my mind implementing the "lexicographic" rule would fix the current issued and it's super simple. Just to be clear this is the algo that I meant:

1. Sort as it is currently implemented
2. Bring all the exact matches (score == 1) to the top
3. Disambiguate the matches at 2. with MRU

Of course it all hinges on how happy you are with 1. If you want to improve on 1 then weighiting scheme and auto-learning the parameter (T) might be worth considering.

ManuelSchneid3r commented 2 years ago

Maybe like this $x\cdot T\cdot \left(1-y\right)+y$ ? Matchscore never ignored, additional usage score with user dependent weight. The former approach makes it possible to omit matchscores entirely.

Further there is the problem with the normalization. The usage score atm is somehow meaningless. I mean what is a usage score of 1 at all? Used every day? used weekly? frequency? Also I hate alfred for forgetting that fast. maybe an initial boost of "used at all" makes sense. (which somehow goes in the direction of the currently used usage-before-match-score strategy.

vspinu commented 2 years ago

Matchscore never ignored,

So $y$ is a match score in [0, 1] an $x$ is the usage score. Not bad. Then within the same partial match group candidates are sorted by usage.

I would prefer to have a symmetric form $x(1-w)(1-y)+wy$ where $w\in[0,1]$ with the default of $w=\frac{1}{2}$.

Ideally $y$ should also be in $[0,1]$ for this to truly make sense. Hence the normalization problem that you talk about.

I mean what is a usage score of 1 at all? Used every day? used weekly? frequency?

I don't know how usage decays over time in albert but regardless of the algorithm one can normalize to the maximum usage so far. That is, 1 is scored by the candidate with the maximum usage statistic in the data base. Algorithmicly it amounts to tracking the maximal usage score over time and divide actual usages by this cap during the scoring. Do you see any drawbacks of this approach?

ManuelSchneid3r commented 1 year ago

I would prefer to have a symmetric form $x(1-w)(1-y)+wy$ where $w\in[0,1]$ with the default of $w=\frac{1}{2}$.

https://www.math3d.org/jIRidMWGm For w=0 full match, full usage items have a score of 0;

Do you see any drawbacks of this approach?

No, normalization is needed and makes sense… in theory. But practically users operate on the lower end of match scores. As albert learns my usage, i learn the minimal amount of chars needed to get what I want. E.g. opening Chromium unsing 'c'+enter. I'll have to think about this a bit.

ManuelSchneid3r commented 1 year ago

0.18 makes this user customizable. Still I dont think its perfect. let me know what you think

vspinu commented 1 year ago

Thanks! I will play with it.

ManuelSchneid3r commented 1 year ago

Any thoughs? I personally dont like it really. Permanently random items i dont use are preferred over my used ones.

vspinu commented 1 year ago

I had loads of issues with installation this time. Finally got some time to figure them out. (BTW libqt6sql6-sqlite is not part of docker deps which I usually use to get up to speed).

Trying it now and don't see much of a difference. But I am using only files and apps in my workflow. I set the slider in the middle to get a better feel. So far everything works as expected.

But I see now that this feature is likely to be confusing though. It's hard to reason about it even if one knows the formula, let alone understanding what it does form the short description in the popup. So maybe it's not worth having it after all. Just bring exact matches to the top and leaving everything as it was would certainly solve my OP. This is how my emacs completion works and I am pretty happy with it.

ManuelSchneid3r commented 1 year ago

then a few recent hits of mvn should outweigh hundreds of 1 year old hits of Media player which are no longer relevant.

That's the decay part. Decay: MRU. No decay at all MFU. Their influence decays the "older" they are. Weight is the algo we discussed. You can entirely ignore your past queries, use its information exclusively or anything between.

vspinu commented 1 year ago

I tried it out and the original problem still persists. Exact matches are not reffered.

Plus there is a weird preference for prefix match. I can no longer select google-chrome with chro. It does not even show in the list. Same problem for files, only prefix seems to be matching.

At the same time e would match in the middle of Telegramin the above screenshot. Very confusing,

ManuelSchneid3r commented 1 year ago

Plus there is a weird preference for prefix match.

there have never been substring matches. they make no sense anyway. probably "-" is not in your separators.

At the same time e would match in the middle of Telegramin the above screenshot. Very confusing,

indeed. telegram does not match on my system what happens if you disable all the optons in the app settings?

vspinu commented 1 year ago

there have never been substring matches. they make no sense anyway. probably "-" is not in your separators.

I see, finally I understood the meaning of "separators" :)

I indeed messed the separators accidentally. My current ones are [\s\\\/\-\[\](){}#!?<>"'=+*.:,;_]+. google-chrome appears again.

Would be great to have a "Reset to Default" button to revert all changes to the original.

if you disable all the optons in the app settings?

I disabled everything in the app settings and in the application plugging settings. Same behavior. I think it has to do with the Exec name of the telegram. It starts with env. And I can confirm, env does match Telegram as well.

[Desktop Entry]
X-SnapInstanceName=telegram-desktop
Name=Telegram Desktop
Comment=Official desktop version of Telegram messaging app
Exec=env BAMF_DESKTOP_FILE_HINT=/var/lib/snapd/desktop/applications/telegram-desktop_telegram-desktop.desktop /snap/bin/telegram-desktop -- %u
Icon=/snap/telegram-desktop/4486/meta/gui/icon.png
Terminal=false
StartupWMClass=TelegramDesktop
Type=Application
Categories=Chat;Network;InstantMessaging;Qt;
MimeType=x-scheme-handler/tg;
Keywords=tg;chat;im;messaging;messenger;sms;tdesktop;
Actions=Quit;

[Desktop Action Quit]
Exec=env BAMF_DESKTOP_FILE_HINT=/var/lib/snapd/desktop/applications/telegram-desktop_telegram-desktop.desktop /snap/bin/telegram-desktop -quit
Name=Quit Telegram
Icon=/snap/telegram-desktop/4486/meta/gui/icon.png

One more thing, the settings wheel is not shown in the right corner of the display. It appears only on mouse hovering. See my screenshot in previous post. Is this intentional? It's fine with me as I already know it's there, but new users might not be able to discover it. This is on ubuntu 22.04.

ManuelSchneid3r commented 1 year ago

I think it has to do with the Exec name of the telegram. It starts with env. And I can confirm, env does match Telegram as well.

ty, fix and new option incoming.

Is this intentional

yes. it serves as busy indicator from 0.18 on. well theres still the preferences/settings item and tray icon

ManuelSchneid3r commented 1 year ago

@vspinu ty. generally please open new issues to keep things organized. I could not test this. I'd appreciate if you test is quickly as soon as the new release is out.

vspinu commented 1 year ago

I have tried it and it works as desired. Also if I set "Use "Exec" value for lookup" I no longer see Telegram by invoking env. This is as expected by the virtue of those exclusions that you have added. Thanks!

ManuelSchneid3r commented 1 year ago

Released in 0.19.3

ManuelSchneid3r commented 1 year ago

Better in the recent release? Basically it's a ifttt approach. The sort order is like:

if perfect
  Prefer but still sort by usage
Elif have usage 
  Use usage
Else
  Use marchscore

This way usage and match scores are not interleaved anymore besides the priority for perfect matches. Its hard to make any sense out of mixed scores anyway.

caph1993 commented 5 months ago

Hello, sadly, this functionality is very counterproductive for me. I use albert all day long, and very often I search for the folders "Documents" and "Downloads". But there are numerous folders in my system called "doc", "d", "down" etc., so they get precedence as I type "d...o...c..." even though I never use them. There's also a folder called "Fire", so firefox gets second place. In general, many useless folders with short names like "A", "C", "as", etc. appear very often ranked first.

Is it possible to revert this behavior or implement a different solution? Maybe:

if perfect and recently used: <-- don't put it first if it's not recently used
  Prefer but still sort by usage
Elif have usage 
  Use usage
Else
  Use marchscore

ManuelSchneid3r commented 5 months ago

You can disable the perfect match thing in the settings then it should behave as you expect

caph1993 commented 5 months ago

You can disable the perfect match thing in the settings then it should behave as you expect

Thank you very much. I missed that this setting was already configurable in the query tab.

albertlauncher / albert

Give preference to exact matches #695