alteryx / nlp_primitives

Natural Language Processing primitives for Featuretools
https://blog.featurelabs.com/natural-language-processing-featuretools/
BSD 3-Clause "New" or "Revised" License

Allow users to specify which language to use when removing stop words in the LSA primitive #190

Open thehomebrewnerd opened 2 years ago

thehomebrewnerd commented 2 years ago

The LSA primitive applies a cleaning step that removes stop words. Currently this is hard-coded to remove English stop words:

swords = set(nltk.corpus.stopwords.words("english"))

The primitive should be updated to let users specify any other language supported by nltk, so that it works properly on natural language columns that are not in English.
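One way this could look is sketched below. It is only an illustration, not the primitive's actual code: the function names `load_nltk_stop_words` and `remove_stop_words`, the `language` and `load_stop_words` parameters, and the regex tokenization are all assumptions made for the example. The only nltk calls used are `nltk.corpus.stopwords.fileids()` and `nltk.corpus.stopwords.words(...)`, and both require the `stopwords` corpus to be downloaded first.

```python
import re
from typing import Callable, List, Set


def load_nltk_stop_words(language: str) -> Set[str]:
    """Return nltk's stop words for `language` (hypothetical helper).

    Requires the nltk "stopwords" corpus, e.g. via nltk.download("stopwords").
    """
    import nltk  # imported lazily so the rest of the sketch runs without nltk

    supported = nltk.corpus.stopwords.fileids()  # e.g. "english", "spanish", ...
    if language not in supported:
        raise ValueError(
            f"Unsupported language {language!r}; choose one of {sorted(supported)}"
        )
    return set(nltk.corpus.stopwords.words(language))


def remove_stop_words(
    text: str,
    language: str = "english",  # default keeps the current English behavior
    load_stop_words: Callable[[str], Set[str]] = load_nltk_stop_words,
) -> List[str]:
    """Lowercase and tokenize `text`, then drop stop words for `language`.

    The tokenization here is a simplification and is assumed for the example;
    the real cleaning step in the LSA primitive may differ.
    """
    swords = load_stop_words(language)
    tokens = re.findall(r"\w+", text.lower())
    return [tok for tok in tokens if tok not in swords]
```

With something like this in place, the primitive could accept a `language` argument at construction time and pass it through to the cleaning step, for example `remove_stop_words("El gato duerme en la casa", language="spanish")`. Defaulting to `"english"` would keep existing behavior unchanged for current users.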