danielobrien3 / HBC-Text-Similarity-Analysis

Multi-tiered Text Similarity Analysis Project for Honors by Contract Project

Filtered corpus not used for analysis #1

Open danielobrien3 opened 4 years ago

danielobrien3 commented 4 years ago

Stop words are correctly identified by how frequently they appear in the corpus, and they are filtered out of the articles. However, the filtered articles are never used in the analysis; it still runs on the unfiltered text.

TODO: Run the analysis on the filtered articles.
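
For reference, a minimal sketch of frequency-based stop word detection and of passing the filtered text on to later steps. This is not the project's actual code: the names `find_stopwords`, `remove_stopwords`, and the `min_doc_fraction` threshold are hypothetical, and it assumes "frequency" means the share of articles a word appears in.

```python
from collections import Counter


def find_stopwords(texts, min_doc_fraction=0.8):
    """Return words that appear in at least min_doc_fraction of the texts.

    Hypothetical helper: the 0.8 threshold is an illustrative assumption.
    """
    doc_counts = Counter()
    for text in texts:
        # Count each word once per text (document frequency).
        doc_counts.update(set(text.lower().split()))
    threshold = min_doc_fraction * len(texts)
    return {word for word, count in doc_counts.items() if count >= threshold}


def remove_stopwords(text, stopwords):
    """Return text with every stop word dropped (case-insensitive match)."""
    return " ".join(w for w in text.split() if w.lower() not in stopwords)


corpus = [
    "the cat sat on the mat",
    "the dog ate the bone",
    "a bird saw the cat",
]
stopwords = find_stopwords(corpus)
# The filtered texts, not the originals, are what the analysis should receive.
filtered = [remove_stopwords(text, stopwords) for text in corpus]
```

The point of the last line is that whatever runs the similarity analysis has to be handed `filtered` rather than `corpus`, which is the step this issue reports as missing.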

danielobrien3 commented 4 years ago

It would be best if the filter_stopwords method removed the stop words from the article's own text directly, instead of creating a separate string object. That way the analysis picks up the filtered text and no other code needs to change. Simple.
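
A rough sketch of that idea, assuming the article is an object with a `text` attribute. The `Article` class and the `stopwords` parameter here are hypothetical stand-ins, not the project's actual definitions. Since Python strings are immutable, "directly" in practice means reassigning the attribute to the filtered string.

```python
class Article:
    """Hypothetical stand-in for the project's article class."""

    def __init__(self, text):
        self.text = text

    def filter_stopwords(self, stopwords):
        # Strings are immutable in Python, so the filtered result is
        # assigned back to self.text instead of being stored separately.
        kept = [w for w in self.text.split() if w.lower() not in stopwords]
        self.text = " ".join(kept)


article = Article("The cat sat on the mat")
article.filter_stopwords({"the", "on"})
print(article.text)  # prints "cat sat mat"
```

Any code that already reads `article.text` would then see the filtered version without changes, at the cost of losing the original text unless it is kept elsewhere.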