flaxsearch / flaxcode

Automatically exported from code.google.com/p/flaxcode

Should stopwords be automatically provided for new collections? #66

Closed by GoogleCodeExporter 9 years ago

GoogleCodeExporter commented 9 years ago
If a collection is in English, or no language is specified, should we
provide a list of stopwords initially? It's unlikely people will take the
time to add them otherwise, and it does seriously affect search quality
(e.g. a search for 'eggs and beans' gives lots of results containing
'and'). We could fill the box when the collection is first created and
then empty it if the language is changed.

Original issue reported on code.google.com by charliej...@gmail.com on 30 Oct 2007 at 2:47

GoogleCodeExporter commented 9 years ago
I'm not a fan of stopwords in general, actually - I've rarely seen them
improve search result quality.  For the example you mention, the "and" term
should have a tiny term weight, so it shouldn't overwhelm the other terms.
Plus, we're using a default AND operator at the moment, so both "eggs" and
"beans" would be required.
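
To illustrate both points, here is a rough sketch using a toy in-memory
collection. The `idf_weight` formula is a simplified IDF-style weight chosen
for illustration, not the weighting formula the search backend actually uses,
and the documents and helper names are made up.

```python
import math

def idf_weight(doc_freq, num_docs):
    """Simplified IDF-style weight (an assumption for illustration):
    terms found in most documents get a weight close to zero."""
    return math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)

# Hypothetical toy collection: each document is a set of terms.
docs = [
    {"eggs", "and", "bacon"},
    {"beans", "and", "toast"},
    {"eggs", "and", "beans"},
    {"tea", "and", "milk"},
    {"coffee", "and", "cake"},
]

query = ["eggs", "and", "beans"]

# Weight of each query term, based on how many documents contain it.
weights = {
    term: idf_weight(sum(term in doc for doc in docs), len(docs))
    for term in query
}

# With a default AND operator, a document must contain every query term.
matches = [doc for doc in docs if all(term in doc for term in query)]

for term in query:
    print(f"{term}: {weights[term]:.3f}")
print(f"{len(matches)} matching document(s)")
```

In this toy data "and" appears in every document, so its weight is roughly a
tenth of the weight of "eggs" or "beans", and only the one document containing
all three terms matches.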

Are there example searches on any of the sample collections we've got which
would be improved by stopwords?

For reference, though, Snowball provides lists of common stopwords for
various languages, so we could automatically suggest stopwords if we want
to.  Alternatively, we could generate a list of potential stopwords for a
given database by looking at the common words - a human could then mark the
ones which "don't really mean anything".
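
The second option could look something like the sketch below. It assumes the
documents are available as plain strings and uses naive whitespace
tokenisation; `candidate_stopwords` and the `min_doc_fraction` threshold are
hypothetical names, not anything that exists in the codebase.

```python
from collections import Counter

def candidate_stopwords(docs, min_doc_fraction=0.8):
    """Return terms that occur in at least min_doc_fraction of the documents.

    These are only candidates: a human still decides which ones
    "don't really mean anything".
    """
    doc_freq = Counter()
    for text in docs:
        # Count each term once per document (document frequency).
        doc_freq.update(set(text.lower().split()))
    threshold = min_doc_fraction * len(docs)
    return [term for term, df in doc_freq.most_common() if df >= threshold]

# Hypothetical sample documents.
sample_docs = [
    "eggs and beans on the toast",
    "the tea and the milk",
    "coffee and cake by the sea",
    "the beans and rice",
]

candidates = candidate_stopwords(sample_docs)
print(sorted(candidates))
```

On a real database the same counts would presumably come from the index's
term statistics rather than from re-reading the documents.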

Original comment by boulton.rj@gmail.com on 30 Oct 2007 at 3:13

GoogleCodeExporter commented 9 years ago

Original comment by charliej...@gmail.com on 1 Nov 2007 at 4:42

GoogleCodeExporter commented 9 years ago
We all hate stopwords.

Original comment by boulton.rj@gmail.com on 1 Nov 2007 at 4:42