alteryx / nlp_primitives

Natural Language Processing primitives for Featuretools
https://blog.featurelabs.com/natural-language-processing-featuretools/
BSD 3-Clause "New" or "Revised" License

Add `NumberOfUniqueWords` primitive #187

Closed: sbadithe closed this 2 years ago

sbadithe commented 2 years ago
codecov[bot] commented 2 years ago

Codecov Report

Merging #187 (49d16fa) into main (7d720ff) will increase coverage by 0.05%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##             main     #187      +/-   ##
==========================================
+ Coverage   99.16%   99.21%   +0.05%     
==========================================
  Files          47       49       +2     
  Lines        1199     1279      +80     
==========================================
+ Hits         1189     1269      +80     
  Misses         10       10              
| Impacted Files | Coverage Δ |
|---|---|
| nlp_primitives/__init__.py | 100.00% <100.00%> (ø) |
| nlp_primitives/number_of_unique_words.py | 100.00% <100.00%> (ø) |
| ...lp_primitives/tests/test_number_of_unique_words.py | 100.00% <100.00%> (ø) |


sbadithe commented 2 years ago

Right now this doesn't handle contractions:

`can't` is tokenized by nltk into `can` and `t`, so the two pieces are counted as separate words. Trying a regex instead.

EDIT: I think the regex solves the issue.
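
For illustration, here is a minimal sketch of how a regex-based tokenizer could keep contractions intact when counting unique words. It is not the code merged in this PR: the pattern `WORD_PATTERN`, the function `count_unique_words`, and its `case_insensitive` parameter are hypothetical names chosen for the example, and it only handles the straight ASCII apostrophe.

```python
import re

# Hypothetical pattern (an assumption, not taken from the PR): a run of word
# characters, optionally followed by apostrophe-joined runs, so "can't"
# stays a single token instead of being split at the apostrophe.
WORD_PATTERN = re.compile(r"\w+(?:'\w+)*")


def count_unique_words(text, case_insensitive=False):
    """Count the distinct words in text, treating contractions as one word."""
    if case_insensitive:
        text = text.lower()
    # findall returns every non-overlapping match; the set drops duplicates.
    return len(set(WORD_PATTERN.findall(text)))


print(WORD_PATTERN.findall("I can't go"))
print(count_unique_words("I can't, I can't go"))
```

With this pattern, punctuation such as commas is skipped rather than tokenized, and words that differ only in case count separately unless `case_insensitive=True` is passed.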