MichaelAquilina / Reddit-Recommender-Bot

Identifying Interesting Documents for Reddit using Recommender Techniques

Reduce number of terms by filtering tfidf values of query vector #78

Closed MichaelAquilina closed 10 years ago

MichaelAquilina commented 10 years ago

Query vectors will become substantially larger at later stages, as they will be entire documents. To improve performance, it might be a good idea to filter out the less useful terms, i.e. those with a low tfidf score in the query. This can be done by removing all terms whose tfidf value falls below the 90th percentile of the values in the query vector.
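
A minimal sketch of the filtering step, not the project's actual implementation. It assumes the query vector is a plain dict mapping each term to its tfidf value. The names `percentile` and `filter_query_terms` are hypothetical, and both the linear-interpolation percentile and the choice to keep terms exactly at the threshold are assumptions.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) of values, using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    # Fractional position of the percentile within the sorted list.
    rank = (len(ordered) - 1) * q / 100.0
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return ordered[lower]
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def filter_query_terms(query_tfidf, q=90.0):
    """Keep only the terms whose tfidf value is at or above the q-th percentile.

    query_tfidf: dict of term -> tfidf value (assumed representation).
    """
    if not query_tfidf:
        return {}
    threshold = percentile(list(query_tfidf.values()), q)
    # Assumption: terms tied with the threshold are kept rather than dropped.
    return {term: value for term, value in query_tfidf.items() if value >= threshold}
```

With `q=90`, roughly the top 10% of terms by tfidf value survive, so the cost of comparing the query against documents shrinks with the vector. Whether 90 is the right cut-off would need to be checked against recommendation quality.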