tfmorris opened this issue 8 years ago (status: Open)
@tfmorris, hands down the best issue so far ;-)
This unfortunate page seems to be the only one currently in the index containing all 3 words: https://explain.commonsearch.org/?g=en&q=new+engla+volleyball
When you type that exact same query into Google, it searches for "england" instead. Not sure what we should do here! Do we want an "exact" mode like theirs? I'm never fond of adding parameters to the search.
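One way to get the Google-like behaviour without an "exact" parameter is to treat a rare token as a partially typed word and expand it to a much more frequent indexed term that starts with it. The sketch below assumes a hypothetical `doc_freq` mapping (term to document frequency) and a hypothetical `ratio` threshold. It is not how Common Search or Google actually rewrite queries.

```python
from bisect import bisect_left
from typing import Dict


def expand_partial_tokens(query: str, doc_freq: Dict[str, int], ratio: int = 100) -> str:
    """Expand each token to a far more frequent term sharing its prefix.

    doc_freq: hypothetical term -> document-frequency table from the index.
    ratio: a completion must be at least `ratio` times more frequent than the
    token itself (unknown tokens count as frequency 1) to replace it.
    """
    terms = sorted(doc_freq)  # sorted so all terms with a given prefix are contiguous
    out = []
    for tok in query.lower().split():
        own = doc_freq.get(tok, 0)
        i = bisect_left(terms, tok)
        best = None
        # Walk the contiguous run of terms that start with this token.
        while i < len(terms) and terms[i].startswith(tok):
            cand = terms[i]
            if cand != tok and (best is None or doc_freq[cand] > doc_freq[best]):
                best = cand
            i += 1
        if best is not None and doc_freq[best] >= ratio * max(own, 1):
            out.append(best)
        else:
            out.append(tok)
    return " ".join(out)
```

With a toy table such as `{"new": 50000, "engla": 1, "england": 8000, "englacial": 2, "volleyball": 3000}`, the query `new engla volleyball` becomes `new england volleyball`, while the complete query is left untouched. In an as-you-type UI you would probably only apply this to the token under the cursor, so that complete words like "new" are never rewritten.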
As you can see in the explain output for your second query, both have all the words: https://explain.commonsearch.org/?g=en&q=new+england+volleyball
So currently, for this case, (2 words in URL + 2 words in title + 1 in body) > (1 word in URL + 3 words in title). Should it be the other way around?
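The comparison above can be made concrete with a simple per-field weighted count of matched query words. The weights below are hypothetical and only illustrate how the ordering flips once URL matches are down-weighted. They are not Common Search's actual ranking function.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class FieldMatches:
    """Number of query words matched in each field of a page."""
    url: int
    title: int
    body: int


def field_score(m: FieldMatches, weights: Dict[str, float]) -> float:
    """Weighted sum of per-field match counts (hypothetical scoring model)."""
    return weights["url"] * m.url + weights["title"] * m.title + weights["body"] * m.body


# The two pages from the explain output, as described above.
page_a = FieldMatches(url=2, title=2, body=1)
page_b = FieldMatches(url=1, title=3, body=0)

equal = {"url": 1.0, "title": 1.0, "body": 1.0}    # A = 5, B = 4: A wins
low_url = {"url": 0.2, "title": 1.0, "body": 0.5}  # A = 2.9, B = 3.2: B wins
```

With equal weights page A wins 5 to 4. Cutting the URL weight to 0.2 (and keeping body below title) puts page B, which has every query word in its title, ahead.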
The best way of fixing this would be to recognize "New England" as an entity, but that's not on the short-term roadmap.
Hmm, do we want a safe-search option?
I'm not sure that you necessarily need entity recognition to be able to handle New England as a phrase. I suspect that it could be done with n-gram frequency or something else "dumber" than full entity recognition (search isn't my area of expertise). Often "new" is a relatively insignificant adjective, but in this context it has important significance.
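A common "dumber" technique for this is collocation scoring: a bigram like "new england" occurs together far more often than the individual frequencies of "new" and "england" would predict. The sketch below uses pointwise mutual information (PMI), which is one standard choice, not necessarily what Common Search would use. The `min_count` cutoff is an assumed parameter, needed because PMI over-rewards very rare pairs. In practice the counts would come from the whole index or from query logs rather than a single token list.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def bigram_pmi(tokens: List[str], min_count: int = 2) -> Dict[Tuple[str, str], float]:
    """Score adjacent word pairs by PMI: log2(P(a,b) / (P(a) * P(b)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (a, b), count in bigrams.items():
        if count < min_count:  # skip rare pairs, whose PMI is unreliable
            continue
        p_ab = count / n_bi
        p_a = unigrams[a] / n_uni
        p_b = unigrams[b] / n_uni
        scores[(a, b)] = math.log2(p_ab / (p_a * p_b))
    return scores
```

Pairs whose PMI exceeds some tuned threshold could then be treated as phrases at query time, so "new england" is matched as a unit rather than as two independent words.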
I'm not sure words in the URL should count much at all. If you look at the Google search results (https://www.google.com/search?q=new+england+volleyball), a whole bunch of the top hits use abbreviations like NERVA, NECVL, etc.
And yes, Safe Search would definitely need to be part of a production service. I'd leave it turned off, but it should probably default to on. It's more important to fix relevancy, though.
P.S. In addition to the other English volleyball pages near the top of the list, further down there is:
YMCA www.newdelhiymca.in
While fixing bigram identification of "New Delhi" would probably solve this, I'd also argue that when it's not part of a phrase, "new" should be a stop word that is either ignored or given very, very low weight.
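Both suggestions can be combined at query time: first merge adjacent tokens that form a known phrase, then give any leftover low-value word a tiny weight. In the sketch below, the `phrases` set, the `low_weight_terms` set and the 0.05 weight are all hypothetical placeholders. The phrase set could, for example, come from a collocation score like PMI.

```python
from typing import List, Set, Tuple


def weight_query_terms(
    tokens: List[str],
    phrases: Set[Tuple[str, str]],
    low_weight_terms: Set[str],
    low_weight: float = 0.05,
) -> List[Tuple[str, float]]:
    """Group tokens into known bigram phrases, then weight each resulting unit.

    Phrases get full weight; standalone words in low_weight_terms (e.g. "new")
    get `low_weight`; all other words get full weight.
    """
    units: List[Tuple[str, float]] = []
    i = 0
    while i < len(tokens):
        pair = (tokens[i], tokens[i + 1]) if i + 1 < len(tokens) else None
        if pair in phrases:
            units.append((" ".join(pair), 1.0))  # keep the phrase as one unit
            i += 2
        else:
            tok = tokens[i]
            units.append((tok, low_weight if tok in low_weight_terms else 1.0))
            i += 1
    return units
```

For `["new", "england", "volleyball"]`, this yields the phrase "new england" plus "volleyball", so a page that only matches "new" (like the New Delhi YMCA URL) no longer gets credit for it. For `["new", "volleyball", "club"]`, "new" survives with weight 0.05.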
URL of the results:
https://uidemo.commonsearch.org/?g=en&q=new+engla+volleyball
Describe the issue precisely:
I was editing an existing search query, and as I was typing I saw this variant flash the result "FREE MILF MOM" from livesexbook.com as it went by. Not sure how that relates to even the mangled query.
Also, the search https://uidemo.commonsearch.org/?g=en&q=new+england+volleyball returns:
which seems like a backwards ordering to me, since #2 has all the words but #1 doesn't.