alteryx / nlp_primitives

Natural Language Processing primitives for Featuretools
https://blog.featurelabs.com/natural-language-processing-featuretools/
BSD 3-Clause "New" or "Revised" License

`clean_tokens` performs case sensitive comparison to `nltk.corpus.stopwords` #210

Closed sbadithe closed 2 years ago

sbadithe commented 2 years ago

For example, "yourself" is a stopword, but "Yourself" is not matched as one. I think this is a little strange, because our `StopwordCount` primitive is case insensitive. One possible fix to investigate is calling `.lower()` on each word before checking whether it is a stopword.
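A minimal sketch of the difference, not the actual `clean_tokens` implementation. To keep it runnable without downloading NLTK data, `STOPWORDS` is a small hand-written stand-in for `nltk.corpus.stopwords.words("english")`, and both function names are hypothetical:

```python
# Hypothetical stand-in for nltk.corpus.stopwords.words("english");
# the entries are lowercase, as in the NLTK English list.
STOPWORDS = frozenset({"yourself", "the", "is", "a"})


def drop_stopwords_case_sensitive(tokens, stopwords=STOPWORDS):
    """Behavior described in this issue: compare each token as-is."""
    return [t for t in tokens if t not in stopwords]


def drop_stopwords_case_insensitive(tokens, stopwords=STOPWORDS):
    """Proposed behavior: lowercase each token before the lookup."""
    return [t for t in tokens if t.lower() not in stopwords]


tokens = ["Yourself", "is", "The", "reader"]

# Capitalized stopwords slip through the case-sensitive check.
print(drop_stopwords_case_sensitive(tokens))    # ['Yourself', 'The', 'reader']

# Lowercasing first removes them, while kept tokens retain their original case.
print(drop_stopwords_case_insensitive(tokens))  # ['reader']
```

Lowercasing only for the membership test, rather than lowercasing the tokens themselves, leaves the surviving tokens unchanged.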