code-kern-ai / bricks

Open-source natural language enrichments at your fingertips.
Apache License 2.0

[MODULE] - Stop word remover #281

Open LeonardPuettmannKern opened 1 year ago

LeonardPuettmannKern commented 1 year ago

Please describe the module you would like to add to bricks

A brick module that removes stop words from a text.

Do you already have an implementation?

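import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# One-time downloads of the stop word list and the tokenizer models.
# Newer NLTK releases may ask for "punkt_tab" instead of "punkt".
nltk.download("stopwords")
nltk.download("punkt")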

example_sent = """This is a sample sentence,
                  showing off the stop words filtration."""

stop_words = set(stopwords.words('english'))

word_tokens = word_tokenize(example_sent)
# Case-insensitive: lowercase each token before checking it against stop_words.
filtered_sentence = [w for w in word_tokens if w.lower() not in stop_words]

# Case-sensitive variant: no lowercasing, so capitalized stop words
# such as "This" are kept. Stored under its own name so that it does
# not overwrite the result above.
filtered_sentence_cased = [w for w in word_tokens if w not in stop_words]

Additional context

Uses NLTK, would be a generator module.
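For anyone picking this up, here is a rough sketch of how the snippet above could be wrapped into a single reusable function. It is not the bricks module layout, and everything in it is an assumption: the function name `remove_stop_words`, the `case_sensitive` flag, the regex tokenizer, and the tiny hard-coded `DEFAULT_STOP_WORDS` set are all hypothetical. The set and the regex stand in for NLTK's `stopwords.words('english')` and `word_tokenize` only so that the sketch runs without downloading any corpora.

```python
import re

# Hypothetical, deliberately tiny stop word set for illustration only;
# the issue proposes NLTK's stopwords.words("english") instead.
DEFAULT_STOP_WORDS = frozenset(
    {"a", "an", "the", "is", "this", "off", "of", "and", "to", "in"}
)

# Assumed stand-in for word_tokenize: matches words (optionally with an
# apostrophe part) or single punctuation marks.
_TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)?|[^\w\s]")


def remove_stop_words(text, stop_words=DEFAULT_STOP_WORDS, case_sensitive=False):
    """Return the tokens of `text` that are not stop words, joined by spaces."""
    tokens = _TOKEN_PATTERN.findall(text)
    if case_sensitive:
        # Compare tokens as written, so "This" does not match "this".
        kept = [t for t in tokens if t not in stop_words]
    else:
        # Lowercase each token before the lookup.
        kept = [t for t in tokens if t.lower() not in stop_words]
    return " ".join(kept)


result = remove_stop_words(
    "This is a sample sentence, showing off the stop words filtration."
)
print(result)
```

Passing the stop word set as a parameter would make it possible to swap in the NLTK list (or a list for another language) without touching the filtering logic. Punctuation is kept as separate tokens here, so the joined output has spaces around commas and periods.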

HongweiRuan commented 3 days ago

Hi, this is my first time contributing, and I would like to take on this issue. Could you assign it to me?