Closed GoogleCodeExporter closed 9 years ago
I'm not generally a fan of stopwords, actually: I've rarely seen them improve
search result quality. For the example you mention, the "and" term should have
a tiny termweight, so it shouldn't overwhelm the other terms. Plus, we're using
a default AND operator at the moment, so both "eggs" and "beans" would be
required.
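To illustrate the termweight point, here is a minimal sketch using a simplified inverse-document-frequency formula. The formula, the function name `idf_weight`, and the collection sizes are assumptions made for illustration; the weighting scheme the search backend actually uses may differ in detail, but the general shape (very common terms get very small weights) is the same.

```python
import math


def idf_weight(doc_count, term_doc_freq):
    """Simplified IDF-style weight (illustrative assumption, not the backend's exact formula).

    doc_count:     total number of documents in the collection
    term_doc_freq: number of documents that contain the term
    """
    # The +1.0 inside the log keeps the weight positive even for terms
    # that occur in more than half of the documents.
    ratio = (doc_count - term_doc_freq + 0.5) / (term_doc_freq + 0.5)
    return math.log(ratio + 1.0)


# Hypothetical collection: 10,000 documents, "and" in 9,500 of them,
# "eggs" in 120 and "beans" in 80.
weights = {
    "and": idf_weight(10_000, 9_500),
    "eggs": idf_weight(10_000, 120),
    "beans": idf_weight(10_000, 80),
}
```

With these made-up counts, "and" ends up with a weight far smaller than "eggs" or "beans", so it contributes very little to the ranking even when it is left in the query.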
Are there example searches on any of the sample collections we've got which
would be improved by stopwords?
For reference, though, Snowball provides lists of common stopwords for various
languages, so we could automatically suggest stopwords if we want to.
Alternatively, we could generate a list of potential stopwords for a given
database by looking at the common words - a human could then mark the ones
which "don't really mean anything".
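The second idea could look roughly like the sketch below, which ranks words by how many documents they appear in and returns the most widespread ones for a human to review. The function name `candidate_stopwords`, the `min_doc_fraction` and `limit` parameters, and the simple regex tokenizer are all assumptions for illustration; a real version would read term frequencies from the database rather than re-tokenizing text.

```python
import re
from collections import Counter


def candidate_stopwords(documents, min_doc_fraction=0.5, limit=20):
    """Return (word, document_count) pairs for words found in many documents.

    This only produces candidates; a human still decides which ones
    "don't really mean anything".
    """
    total = len(documents)
    if total == 0:
        return []

    doc_freq = Counter()
    for text in documents:
        # Use a set so each word is counted at most once per document.
        doc_freq.update(set(re.findall(r"[a-z']+", text.lower())))

    common = [
        (word, count)
        for word, count in doc_freq.items()
        if count / total >= min_doc_fraction
    ]
    # Most widespread first; break ties alphabetically for stable output.
    common.sort(key=lambda pair: (-pair[1], pair[0]))
    return common[:limit]


# Hypothetical toy collection.
docs = ["eggs and beans", "beans and toast", "tea and the toast", "the eggs"]
candidates = candidate_stopwords(docs)
```

On this toy collection "and" comes out on top, but "beans" and "eggs" also pass the threshold, which is why the human review step matters: frequency alone can't tell a meaningless word from a merely popular one.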
Original comment by boulton.rj@gmail.com on 30 Oct 2007 at 3:13
Original comment by charliej...@gmail.com on 1 Nov 2007 at 4:42
We all hate stopwords.
Original comment by boulton.rj@gmail.com on 1 Nov 2007 at 4:42
Original issue reported on code.google.com by charliej...@gmail.com on 30 Oct 2007 at 2:47