Also includes changing mca_threshold to ca_threshold in various places
Add text preprocessing for structured data
Applies NLP techniques to specified columns within structured data and creates a scalar value for the sum of the tf-idf vector
This pull request closes #93
- What I did
Used NLTK library for tokenize, filter stop words, and convert each word to it's respective lemma
Used autocorrect's speller to fix mispellings
Used sklearn's TF-IDF vectorizer to give weight to the rarer words
- How I did it
I coded
- How to verify it
Include text based columns that you do not want one hot encoded in the text param for the different models
[ ] I updated the docs.
This pull request adds a new feature to Libra. @Palashio, could you please take a look at it?
Also includes changing mca_threshold to ca_threshold in various places
Add text preprocessing for structured data Applies NLP techniques to specified columns within structured data and creates a scalar value for the sum of the tf-idf vector
This pull request closes #93
- What I did Used NLTK library for tokenize, filter stop words, and convert each word to it's respective lemma Used autocorrect's speller to fix mispellings Used sklearn's TF-IDF vectorizer to give weight to the rarer words - How I did it I coded - How to verify it
Include text based columns that you do not want one hot encoded in the text param for the different models
This pull request adds a new feature to Libra. @Palashio, could you please take a look at it?