TutteInstitute / vectorizers

Vectorizers for a range of different data types
BSD 3-Clause "New" or "Revised" License

Add dask for simple parallel computation in the word vectorizer #71

Closed by lmcinnes 3 years ago

lmcinnes commented 3 years ago

Add dask to the requirements and test infrastructure, then use it for simple parallelism when constructing the cooccurrence matrix. This is not a fully distributed implementation, but it is a start, and it speeds things up with a reasonable amount of parallel computation.
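For readers unfamiliar with the pattern, here is a minimal sketch of chunked parallelism with `dask.delayed`. It is not the code from this PR: the helper names `count_chunk` and `parallel_cooccurrence`, the right-hand window, the `Counter`-based counts, and the choice of the threaded scheduler are all assumptions made for illustration.

```python
from collections import Counter

import dask


def count_chunk(docs, window=2):
    """Count (token, context) pairs for one chunk of tokenized documents.

    Hypothetical helper: the context is the next `window` tokens to the right.
    """
    counts = Counter()
    for doc in docs:
        for i, token in enumerate(doc):
            for context in doc[i + 1 : i + 1 + window]:
                counts[(token, context)] += 1
    return counts


def parallel_cooccurrence(docs, n_chunks=2, window=2):
    """Split docs into chunks, count each chunk as a delayed task, then merge."""
    chunk_size = max(1, -(-len(docs) // n_chunks))  # ceiling division
    chunks = [docs[i : i + chunk_size] for i in range(0, len(docs), chunk_size)]
    # Each call is recorded lazily; nothing runs until dask.compute.
    tasks = [dask.delayed(count_chunk)(chunk, window) for chunk in chunks]
    # Scheduler choice is an assumption; pure-Python work like this is
    # GIL-bound, so real speedups need compiled kernels or processes.
    partials = dask.compute(*tasks, scheduler="threads")
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


docs = [["a", "b", "c"], ["a", "b"], ["b", "c", "a"]]
counts = parallel_cooccurrence(docs, n_chunks=2, window=1)
```

Because the per-chunk counts are simply summed, the merged result should match a single serial pass over all documents, which makes the parallel path easy to test against the serial one.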

codecov-commenter commented 3 years ago

Codecov Report

Merging #71 (4689549) into master (d4810ee) will decrease coverage by 0.04%. The diff coverage is 87.50%.


@@            Coverage Diff             @@
##           master      #71      +/-   ##
==========================================
- Coverage   89.90%   89.86%   -0.05%     
==========================================
  Files          19       19              
  Lines        3498     3513      +15     
  Branches      658      661       +3     
==========================================
+ Hits         3145     3157      +12     
- Misses        298      301       +3     
  Partials       55       55              
| Impacted Files | Coverage Δ |
|---|---|
| vectorizers/token_cooccurrence_vectorizer.py | 88.70% <85.18%> (-0.30%) |
| vectorizers/linear_optimal_transport.py | 90.68% <100.00%> (+0.03%) |
