MichaelAquilina / Reddit-Recommender-Bot

Identifying Interesting Documents for Reddit using Recommender Techniques

Consider more extensive stopword list #72

Closed. MichaelAquilina closed this issue 10 years ago.

MichaelAquilina commented 10 years ago

The stopword list used in nltk is rather short. It does not filter out words like "also", which increase the size of the index without contributing any useful information. See 'stopwords.txt' for an example of a more extensive stopword list we could use.
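One possible approach is to merge nltk's built-in list with the extra words from a file before tokens are indexed. The sketch below assumes 'stopwords.txt' holds one word per line and treats blank lines and lines starting with `#` as ignorable. The function names `load_stopwords` and `index_terms` are hypothetical, and the tokenizer is a simple regex, not the project's actual one. In the project, `base` could be `set(nltk.corpus.stopwords.words("english"))`. It is passed in as a parameter here so the sketch runs without downloading the nltk corpus.

```python
import re

def load_stopwords(path, base=frozenset()):
    """Return `base` merged with the words listed in `path`.

    Assumed file format: one word per line; blank lines and lines
    starting with '#' are skipped. Words are lowercased.
    """
    words = set(base)
    with open(path, encoding="utf-8") as f:
        for line in f:
            word = line.strip().lower()
            if word and not word.startswith("#"):
                words.add(word)
    return frozenset(words)

# Hypothetical tokenizer: runs of lowercase letters, digits and apostrophes.
TOKEN_RE = re.compile(r"[a-z0-9']+")

def index_terms(text, stopwords):
    """Lowercase and tokenize `text`, dropping stopwords before indexing."""
    return [t for t in TOKEN_RE.findall(text.lower()) if t not in stopwords]
```

Because stopwords are removed before terms reach the index, every word added to the file (such as "also") is one fewer entry stored per document. A `frozenset` keeps each membership check O(1) on average, even as the list grows.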