TutteInstitute / vectorizers

Vectorizers for a range of different data types
BSD 3-Clause "New" or "Revised" License
93 stars 23 forks source link

Refactored WassersteinVectorizer #126

Closed jc-healy closed 11 months ago

jc-healy commented 1 year ago

I folded SinkhornVectorizer and ApproximateWassersteinVectorizer together into the WassersteinVectorizer class as method options (LOT_exact, LOT_sinkhorn, HeuristicLinearAlgebra).

Additionally, I added an input_mode parameter into the constructor to allow checking of valid parameters on construction (spmatrix, lil, generator). This acts as a contract regarding what kind of data will be passed to fit() and transform() in the future.

I'm submitting an early PR to make use of our code coverage features. This currently includes my refactoring of the basic unit tests which we previously tested against. I'll spend a few days adding stronger unit tests and ensuring all the code paths are covered.

lmcinnes commented 1 year ago

This looks like it came together surprisingly quickly. I looks forward to any extra tests.

codecov-commenter commented 1 year ago

Codecov Report

Merging #126 (75fdf35) into master (8260c0f) will decrease coverage by 5.06%. Report is 2 commits behind head on master. The diff coverage is 54.21%.

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

@@            Coverage Diff             @@
##           master     #126      +/-   ##
==========================================
- Coverage   91.25%   86.20%   -5.06%     
==========================================
  Files          34       34              
  Lines        4869     5174     +305     
==========================================
+ Hits         4443     4460      +17     
- Misses        426      714     +288     
Files Coverage Δ
vectorizers/edge_list_vectorizer.py 80.00% <100.00%> (+2.72%) :arrow_up:
vectorizers/tests/test_edge_list_vectorizer.py 100.00% <100.00%> (ø)
vectorizers/transformers/sliding_windows.py 93.00% <100.00%> (ø)
vectorizers/base_cooccurrence_vectorizer.py 88.19% <0.00%> (ø)
vectorizers/tests/test_common.py 99.49% <97.70%> (-0.51%) :arrow_down:
vectorizers/linear_optimal_transport.py 66.06% <40.36%> (-26.95%) :arrow_down:

:mega: We’re building smart automated test selection to slash your CI/CD build times. Learn more

lmcinnes commented 11 months ago

Is this ready to be merged?

jc-healy commented 11 months ago

Yep, Benoit gave me a hand hammering some deprecation bugs that had crept in while the PR was waiting. I merged his changes into mine and the unit tests now all pass. Merged