carpentries-incubator / twitter-with-twarc

Introduction to Harvesting Twitter Data with Twarc
https://carpentries-incubator.github.io/twitter-with-twarc/

ep 5 stopwords #38

Open jonjab opened 2 years ago

jonjab commented 2 years ago

I think we download the stopword list, but then never use it.

Something like this needs to happen:

filtered_words = [word for word in word_list if word not in stopwords.words('english')]

It's already in the notebook:

library_str_stopped = [word for word in library_string.split() if word.lower() not in sw_nltk]
library_words_stopped = " ".join(library_str_stopped)
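For anyone who wants to try the filtering step outside the notebook, here is a minimal sketch. It assumes a tiny hand-picked stand-in for `sw_nltk` (the real list would come from `stopwords.words('english')` after downloading the NLTK corpus), and the sample sentence is made up for illustration.

```python
# Stand-in for sw_nltk = stopwords.words('english').
# ASSUMPTION: a tiny hand-picked subset so the sketch runs without
# downloading the NLTK stopwords corpus.
sw_nltk = {"the", "is", "a", "not", "in", "and", "to", "of"}

# Hypothetical sample text, not taken from the lesson data.
library_string = "The library is not open in the evening"

# Keep only the words whose lowercase form is not a stopword.
library_str_stopped = [
    word for word in library_string.split() if word.lower() not in sw_nltk
]

# Join the surviving words back into one string.
library_words_stopped = " ".join(library_str_stopped)
print(library_words_stopped)  # library open evening
```

Using a set for the stopwords keeps each membership check cheap, which may matter once the input is a full tweet harvest rather than one sentence.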

We should actually run the sentiment analysis in ep 8 on both a stopped and an unstopped file so we can compare the results.
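A rough sketch of what that comparison could look like. Everything here is an assumption for illustration: `toy_sentiment` is a hypothetical word-counting scorer with naive negation handling, not the tool ep 8 uses, and `STOPWORDS` is a small stand-in for the NLTK English list.

```python
# ASSUMPTION: hypothetical toy lexicon, not the lesson's sentiment tool.
POSITIVE = {"good", "great", "helpful"}
NEGATIVE = {"bad", "awful", "slow"}
NEGATORS = {"not", "no", "never"}

# ASSUMPTION: small stand-in for the NLTK English stopword list.
STOPWORDS = {"the", "is", "a", "not", "was", "this"}


def toy_sentiment(text):
    """Sum +1/-1 per lexicon word; a negator flips the next lexicon word."""
    score = 0
    negate = False
    for word in text.lower().split():
        if word in NEGATORS:
            negate = True
            continue
        polarity = (word in POSITIVE) - (word in NEGATIVE)
        score += -polarity if negate else polarity
        if polarity:
            # The negation is used up by the first sentiment word it meets.
            negate = False
    return score


def remove_stopwords(text):
    """Drop every word whose lowercase form is in STOPWORDS."""
    return " ".join(w for w in text.split() if w.lower() not in STOPWORDS)


tweet = "the library is not good"  # hypothetical example text
unstopped_score = toy_sentiment(tweet)
stopped_score = toy_sentiment(remove_stopwords(tweet))
print(unstopped_score, stopped_score)  # -1 1
```

With this toy scorer the same text flips sign once stopwords are removed, which is the kind of difference the side-by-side run would surface.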

"Pay attention that a word like 'not' is also considered a stopword in nltk." That could mess up your sentiment analysis: with "not" removed, "not good" becomes "good".
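One possible workaround, sketched under assumptions: take the negation words out of the stopword list before filtering. The `negations` set is a hypothetical choice, and `sw_nltk` below is a small stand-in for `stopwords.words('english')`.

```python
# ASSUMPTION: small stand-in for the NLTK English stopword list.
sw_nltk = {"the", "is", "a", "not", "no", "in", "and"}

# ASSUMPTION: hypothetical set of words to protect from removal.
negations = {"not", "no"}

# Set difference: every stopword except the protected negations.
sw_keep_negation = sw_nltk - negations

text = "The service is not helpful"  # hypothetical example text
kept_words = [w for w in text.split() if w.lower() not in sw_keep_negation]
kept_text = " ".join(kept_words)
print(kept_text)  # service not helpful
```

Whether this is the right trade-off depends on the sentiment tool; some tools handle negation themselves and only need "not" to still be present in the text.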