TutteInstitute / vectorizers

Vectorizers for a range of different data types
BSD 3-Clause "New" or "Revised" License

BayesEM and Dynamic COO memory #59

Closed: cjweir closed this 3 years ago

cjweir commented 3 years ago

I added the Bayesian EM. It doesn't produce good results for words, but it might be useful in other settings, so it's there. I also changed the COO construction to allocate larger arrays when space is running low. This works well: it costs very little runtime and saves memory, and examples no longer crash when the initial array size is set too small even though more memory is available.
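A minimal sketch of the growable-buffer idea, assuming a simple "double the capacity when full" policy. The class name `GrowableCOO`, its methods, and the doubling factor are hypothetical illustrations, not the code this PR adds to vectorizers/coo_utils.py:

```python
import numpy as np


class GrowableCOO:
    """Hypothetical COO triplet buffer that doubles its arrays when they fill up."""

    def __init__(self, initial_capacity=4):
        cap = max(1, initial_capacity)  # capacity 0 could never grow by doubling
        self.rows = np.empty(cap, dtype=np.int32)
        self.cols = np.empty(cap, dtype=np.int32)
        self.vals = np.empty(cap, dtype=np.float32)
        self.n = 0  # number of triplets actually stored

    @property
    def capacity(self):
        return self.rows.shape[0]

    def _grow(self):
        # Allocate arrays twice as large and copy over the filled prefix.
        new_cap = 2 * self.capacity
        for name in ("rows", "cols", "vals"):
            old = getattr(self, name)
            new = np.empty(new_cap, dtype=old.dtype)
            new[: self.n] = old[: self.n]
            setattr(self, name, new)

    def append(self, row, col, val):
        # Grow only when the buffer is full, so the start size can be small.
        if self.n == self.capacity:
            self._grow()
        self.rows[self.n] = row
        self.cols[self.n] = col
        self.vals[self.n] = val
        self.n += 1

    def to_arrays(self):
        # Trim the unused tail so callers only see real entries.
        return (
            self.rows[: self.n].copy(),
            self.cols[: self.n].copy(),
            self.vals[: self.n].copy(),
        )


if __name__ == "__main__":
    buf = GrowableCOO(initial_capacity=2)
    for i in range(5):
        buf.append(i, i + 1, float(i))
    rows, cols, vals = buf.to_arrays()
    print(rows.tolist(), cols.tolist(), vals.tolist(), buf.capacity)
```

Doubling keeps the total copying cost amortized O(1) per appended entry, which fits the "very little hit to runtime" observation. A too-small starting size then costs a few reallocations instead of a crash.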

codecov-commenter commented 3 years ago

Codecov Report

Merging #59 (75bd319) into master (9b2a8e7) will decrease coverage by 0.89%. The diff coverage is 32.97%.


@@            Coverage Diff             @@
##           master      #59      +/-   ##
==========================================
- Coverage   66.21%   65.31%   -0.90%     
==========================================
  Files          19       19              
  Lines        2824     2895      +71     
==========================================
+ Hits         1870     1891      +21     
- Misses        954     1004      +50     
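The headline says coverage decreases by 0.89%, while the table shows -0.90%. A small worked check reconciles the two. It assumes coverage = hits / lines and that Codecov truncates to two decimals instead of rounding. That truncation rule is inferred from these numbers, not taken from Codecov documentation. The helper names `pct` and `floor2` are illustrative:

```python
import math


def pct(hits, lines):
    # Coverage as a percentage of tracked lines that were hit.
    return 100.0 * hits / lines


def floor2(x):
    # Truncate (round down) to two decimal places.
    return math.floor(x * 100) / 100


base = pct(1870, 2824)  # master: about 66.218%
head = pct(1891, 2895)  # PR #59: about 65.320%

print(floor2(base), floor2(head))        # 66.21 65.31, the table values
print(floor2(base - head))               # 0.89, the headline decrease
print(round(floor2(base) - floor2(head), 2))  # 0.9, the table's -0.90%
```

The headline figure comes from the unrounded difference (about 0.899%), and the table's -0.90% is the difference of the already-truncated totals.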
Impacted Files                                 Coverage Δ
vectorizers/coo_utils.py                       7.77% <8.57%> (+0.29%) ↑
vectorizers/utils.py                           49.67% <31.25%> (-4.80%) ↓
vectorizers/token_cooccurrence_vectorizer.py   63.93% <43.75%> (+0.56%) ↑
vectorizers/tests/test_common.py               99.78% <100.00%> (+0.22%) ↑
vectorizers/preprocessing.py                   89.11% <0.00%> (-1.37%) ↓


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Powered by Codecov. Last update 9b2a8e7...75bd319.