TutteInstitute / vectorizers

Vectorizers for a range of different data types
BSD 3-Clause "New" or "Revised" License

Added max_unique_tokens #89

Closed: cjweir closed this 2 years ago

cjweir commented 2 years ago

Tidied up a few things and added a max_unique_tokens parameter.
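
For context, a minimal usage sketch (not taken from the PR itself; the exact pruning semantics are defined in the diff), assuming `max_unique_tokens` caps the learned vocabulary at the most frequent tokens:

```python
# Hypothetical sketch: assumes max_unique_tokens keeps only the N most
# frequent tokens in the learned vocabulary before counting cooccurrences.
from vectorizers import TokenCooccurrenceVectorizer

docs = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

# Cap the vocabulary at the 4 most frequent tokens; rarer tokens
# would be pruned (assumed behavior).
vectorizer = TokenCooccurrenceVectorizer(max_unique_tokens=4)
matrix = vectorizer.fit_transform(docs)
print(matrix.shape)  # one row per retained token, so at most 4 rows
```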

codecov-commenter commented 2 years ago

Codecov Report

Merging #89 (f7960c3) into master (2f44a18) will decrease coverage by 0.36%. The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master      #89      +/-   ##
==========================================
- Coverage   90.86%   90.50%   -0.36%     
==========================================
  Files          28       28              
  Lines        4499     4487      -12     
==========================================
- Hits         4088     4061      -27     
- Misses        411      426      +15     
Impacted Files                                  Coverage Δ
vectorizers/ngram_vectorizer.py                 86.55% <100.00%> (+0.11%) ↑
vectorizers/preprocessing.py                    89.54% <100.00%> (+0.42%) ↑
vectorizers/skip_gram_vectorizer.py             91.07% <100.00%> (+0.08%) ↑
vectorizers/tests/test_common.py                99.84% <100.00%> (-0.01%) ↓
vectorizers/token_cooccurrence_vectorizer.py    90.55% <100.00%> (-0.15%) ↓
vectorizers/utils.py                            65.98% <0.00%>   (-7.62%) ↓

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data.