Please describe the module you would like to add to bricks
A brick module that removes stopwords from a given text.
Do you already have an implementation?
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Fetch the stopword list and tokenizer models on first use
# (newer NLTK releases need "punkt_tab" for word_tokenize; older ones use "punkt")
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')

example_sent = """This is a sample sentence,
showing off the stop words filtration."""

stop_words = set(stopwords.words('english'))
word_tokens = word_tokenize(example_sent)

# Convert each token to lower case, then check whether it is in stop_words
filtered_sentence = [w for w in word_tokens if w.lower() not in stop_words]

# Without lower-case conversion (case-sensitive, so "This" is kept)
filtered_sentence_case_sensitive = []
for w in word_tokens:
    if w not in stop_words:
        filtered_sentence_case_sensitive.append(w)
Additional context
Uses NLTK; it would be a generator module.
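As a rough sketch of the core logic such a module could wrap, the snippet below packages both variants from the implementation above into one function. It is dependency-free so it runs anywhere: the small `STOP_WORDS` set is a hypothetical stand-in for NLTK's English stopword list, the regex tokenizer is a simplified stand-in for `word_tokenize`, and `remove_stopwords` and its `lowercase` parameter are illustrative names, not part of bricks or NLTK.

```python
import re
from typing import AbstractSet, List

# Hypothetical stand-in for nltk.corpus.stopwords.words("english");
# only a handful of entries, for illustration.
STOP_WORDS = frozenset({"a", "an", "the", "is", "this", "of", "off", "and", "to", "in"})

# Simplified stand-in for nltk.tokenize.word_tokenize: matches words
# (optionally with one apostrophe part) or single punctuation characters.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def remove_stopwords(
    text: str,
    stop_words: AbstractSet[str] = STOP_WORDS,
    lowercase: bool = True,
) -> List[str]:
    """Tokenize `text` and drop every token found in `stop_words`.

    With lowercase=True, tokens are lower-cased before the lookup
    (case-insensitive); with lowercase=False the lookup is case-sensitive.
    """
    tokens = TOKEN_PATTERN.findall(text)
    if lowercase:
        return [t for t in tokens if t.lower() not in stop_words]
    return [t for t in tokens if t not in stop_words]


if __name__ == "__main__":
    sent = "This is a sample sentence,\nshowing off the stop words filtration."
    print(remove_stopwords(sent))
    # ['sample', 'sentence', ',', 'showing', 'stop', 'words', 'filtration', '.']
```

In the actual brick, the stand-ins would presumably be replaced by the NLTK calls shown earlier, and the remaining tokens joined back into a string if the generated attribute is meant to be text rather than a token list.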