Closed by MartinKolarik 11 months ago
It seems like in your index the results are usually better, but it's indeed hard to test. We probably should list some of the popular queries that aren't verbatim a popular package and test those?
Note that even after this change, proximity might have more importance than it deserves.
Consider the query "themes bootstrap", for which the most relevant result would probably be bootswatch. Its description is "Bootswatch is a collection of themes for Bootstrap.", so because of the extra "for" it has proximity 2 and gets pushed to 4th place, below fairly unpopular packages.
Even worse, if you happen to search for "bootstrap themes", the computed proximity is 3 (reverse order) and bootswatch ends up in position 22, even below some deprecated packages.
Changing minProximity might make sense here - proximity certainly has some value, but not that much, since we have a lot of custom ranking attributes as well. I'd say 3 is an absolute minimum for simple cases like this, but it might very well be even 5 ("can be together anywhere in a short sentence").
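To make the numbers above concrete, here is a minimal sketch of the effect of a proximity floor. It is a toy model, not the engine's actual algorithm: the function names (`tokenize`, `pair_proximity`) and the "+1 for reverse order" rule are assumptions chosen only so that the output matches the proximities quoted in this thread.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def pair_proximity(query, text, min_proximity=1):
    """Toy proximity score for a two-word query (lower is better).

    Assumed model: the score is the distance in words between the two
    matches, plus 1 if they appear in reverse order, and any score below
    min_proximity is raised to min_proximity.
    Returns None if either query word is missing from the text.
    """
    first, second = tokenize(query)
    words = tokenize(text)
    best = None
    for i, word in enumerate(words):
        if word != first:
            continue
        for j, other in enumerate(words):
            if other != second:
                continue
            distance = abs(j - i)
            if j < i:
                distance += 1  # assumed penalty for reverse order
            if best is None or distance < best:
                best = distance
    if best is None:
        return None
    return max(best, min_proximity)


description = "Bootswatch is a collection of themes for Bootstrap."

print(pair_proximity("themes bootstrap", description))      # 2
print(pair_proximity("bootstrap themes", description))      # 3
# With a floor of 3, an adjacent in-order match no longer beats bootswatch
# on proximity, so the later ranking criteria decide the order instead.
print(pair_proximity("bootstrap themes", "Bootstrap themes pack", 3))  # 3
print(pair_proximity("bootstrap themes", description, 3))              # 3
```

With a floor of 3, both phrasings of the query tie with an exact adjacent match in this model, which is the behavior argued for above.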
After testing this a bit more on random multi-word package names, I don't see any obvious downside here, and it makes a big difference when you query multi-word package names with non-exact names, e.g.:
Considering the settings can be changed in a matter of seconds as needed, I'd say let's change this and readjust later if we find any issues.
Let's do it!
I just came across some very unintuitive behavior when searching for the package bootstrap-vue. I didn't remember the name exactly and instead searched for vue bootstrap: https://www.jsdelivr.com/?query=vue%20bootstrap
The query didn't match the package name because of the order of the words, but the package also has "vue" and "bootstrap" as its keywords and is very popular, so I'd expect it to be the top result. Unfortunately, it seems the engine treats the array of keywords similarly to text, so their order and proximity still play a role. Instead of the correct result, I got pages and pages of garbage that happened to have the keywords in the "right" order.
Looking at the config options, I see this could be fixed by swapping the priority of "attribute" and "proximity" in ranking, and at first sight, this makes sense to me. Our searchable attributes are:
Package names are very short, and hitting more of the correct words should be more important than hitting them in the right order. For keywords, this is absolutely the case. For description, I'm not sure which one makes more sense, but even if it was proximity, I think good matching on names and keywords is more important.
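A rough sketch of what the swap would mean, assuming the index starts from Algolia's default ranking order (the actual index may already use a different one). The two records and their scores are hypothetical and only illustrate how the tie-breaking order changes; lower is better for every criterion in this toy model.

```python
# Algolia's default order of ranking criteria (assumed starting point).
DEFAULT_RANKING = [
    "typo", "geo", "words", "filters",
    "proximity", "attribute", "exact", "custom",
]


def swap_criteria(ranking, a, b):
    """Return a copy of the ranking list with criteria a and b swapped."""
    swapped = list(ranking)
    i, j = swapped.index(a), swapped.index(b)
    swapped[i], swapped[j] = swapped[j], swapped[i]
    return swapped


def sort_key(record, ranking):
    """Tie-breaking key: compare criteria one by one in ranking order."""
    return tuple(record[c] for c in ranking if c in record)


PROPOSED_RANKING = swap_criteria(DEFAULT_RANKING, "proximity", "attribute")

# Hypothetical records: "attribute" is the position of the best matching
# searchable attribute, "proximity" is the word distance within it.
records = [
    {"name": "other-package", "attribute": 2, "proximity": 1},
    {"name": "bootstrap-vue", "attribute": 0, "proximity": 2},
]

before = sorted(records, key=lambda r: sort_key(r, DEFAULT_RANKING))
after = sorted(records, key=lambda r: sort_key(r, PROPOSED_RANKING))

print([r["name"] for r in before])  # ['other-package', 'bootstrap-vue']
print([r["name"] for r in after])   # ['bootstrap-vue', 'other-package']
```

The resulting `PROPOSED_RANKING` list is what would go into the index's `ranking` setting; everything else in the sketch is just there to show the effect on ordering.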
We should definitely test this more before making any changes, but I'm putting it here as an idea. @pixelastic @Haroenv what do you think? Are there any cases you can think of that this would make worse? I already made this change on the index npm-search-dev-martin for testing.