benedekrozemberczki / karateclub

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)

https://karateclub.readthedocs.io

GNU General Public License v3.0

Fit transform #145

Open tomlincr opened 1 year ago

tomlincr commented 1 year ago

Added a .fit_transform method to all node embedding algorithms, primarily motivated by a desire to use karateclub algorithms in a scikit-learn pipeline.

Adds:

y=None argument, for scikit-learn compatibility
Passthrough of y when it is not None, so that e.g. node attributes can be passed through to a downstream task in the pipeline
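
A minimal sketch of what such a method could look like, for illustration only. `ToyEmbedder` is a hypothetical stand-in, not a karateclub class, and its random-projection "embedding" is a placeholder. The thread does not spell out how y is passed through; the sketch assumes it is returned unchanged next to the embedding.

```python
import numpy as np


class ToyEmbedder:
    """Hypothetical stand-in for an estimator exposing fit() and get_embedding()."""

    def __init__(self, dimensions=2, seed=42):
        self.dimensions = dimensions
        self.seed = seed

    def fit(self, adjacency):
        # Placeholder embedding: a seeded random projection of the adjacency matrix.
        rng = np.random.default_rng(self.seed)
        projection = rng.normal(size=(adjacency.shape[1], self.dimensions))
        self._embedding = adjacency @ projection

    def get_embedding(self):
        return self._embedding

    def fit_transform(self, adjacency, y=None):
        # y is accepted so the signature matches the scikit-learn convention.
        self.fit(adjacency)
        embedding = self.get_embedding()
        if y is not None:
            # Assumed passthrough: hand y back untouched alongside the embedding.
            return embedding, y
        return embedding


# A three-node path graph as a dense adjacency matrix.
adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

model = ToyEmbedder(dimensions=2)
embedding = model.fit_transform(adjacency)
embedding_with_y, labels = ToyEmbedder(dimensions=2).fit_transform(adjacency, y=[0, 1, 0])
```

Returning a tuple is only one reading of "passthrough"; a scikit-learn Pipeline step is normally expected to return the transformed X alone, so the actual PR may handle y differently.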

Tests:

Method tested for each algorithm
Generally testing that the output matches that of .get_embedding()
For stochastic methods, testing only that the shapes match
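
The testing strategy above could be sketched roughly as follows. `ToyModel` and `check_fit_transform` are hypothetical names introduced for illustration, not part of the PR's test suite; the sketch assumes a deterministic method can be compared value by value while a stochastic one can only be compared by shape.

```python
import numpy as np


class ToyModel:
    """Hypothetical estimator; seed=None makes repeated fits differ."""

    def __init__(self, dimensions=2, seed=None):
        self.dimensions = dimensions
        self.seed = seed

    def fit(self, adjacency):
        rng = np.random.default_rng(self.seed)
        projection = rng.normal(size=(adjacency.shape[1], self.dimensions))
        self._embedding = adjacency @ projection

    def get_embedding(self):
        return self._embedding

    def fit_transform(self, adjacency, y=None):
        self.fit(adjacency)
        return self.get_embedding()


def check_fit_transform(make_model, adjacency, deterministic):
    """Compare fit_transform() against a separate fit() + get_embedding()."""
    via_fit_transform = make_model().fit_transform(adjacency)

    reference_model = make_model()
    reference_model.fit(adjacency)
    via_get_embedding = reference_model.get_embedding()

    # Shapes must always agree, stochastic or not.
    assert via_fit_transform.shape == via_get_embedding.shape
    if deterministic:
        # Values are only compared when repeated fits are reproducible.
        np.testing.assert_allclose(via_fit_transform, via_get_embedding)
    return True


adjacency = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

deterministic_ok = check_fit_transform(lambda: ToyModel(seed=0), adjacency, deterministic=True)
stochastic_ok = check_fit_transform(lambda: ToyModel(seed=None), adjacency, deterministic=False)
```

Using a tolerance-based comparison rather than exact equality leaves some room for floating-point differences between platforms.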

tomlincr commented 1 year ago

Apologies, long day and thought I'd opened this PR on my fork to test coverage, CI etc.

tomlincr commented 1 year ago

Interesting, everything passes locally.
There seems to be some variation in the embeddings generated by multiple fits when run by Actions.
I will test that the shapes match instead for these offenders.

codecov-commenter commented 1 year ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 97.54%. Comparing base (d750b33) to head (716a796). Report is 31 commits behind head on master.

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #145      +/-   ##
==========================================
+ Coverage   97.41%   97.54%   +0.12%
==========================================
  Files          63       63
  Lines        2707     2849     +142
==========================================
+ Hits         2637     2779     +142
  Misses         70       70
```

LucaCappelletti94 commented 7 months ago

I have tried to run the test suite of this pull request, but it is currently failing at the HOPE model test. I see that you are comparing the two embeddings - maybe there are numerical instabilities that lead to different results over different runs? I am not that familiar with the internals of NumPy and SciPy.
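
One common source of run-to-run differences in factorization-based embeddings is that singular vectors are only defined up to sign. Whether this is what causes the HOPE failure is not established in this thread; the sketch below assumes a model built on an SVD and uses plain NumPy to show why an element-wise comparison can fail even when both results are valid. `align_signs` is a hypothetical helper.

```python
import numpy as np

rng = np.random.default_rng(0)
matrix = rng.normal(size=(5, 5))

u, s, vt = np.linalg.svd(matrix)

# Flipping the sign of one left/right singular vector pair gives another valid SVD.
u_flipped = u.copy()
vt_flipped = vt.copy()
u_flipped[:, 0] *= -1
vt_flipped[0, :] *= -1

# Both factorizations reconstruct the same matrix ...
same_reconstruction = np.allclose(u @ np.diag(s) @ vt, u_flipped @ np.diag(s) @ vt_flipped)
# ... but the factors themselves differ element-wise.
same_factors = np.allclose(u, u_flipped)


def align_signs(reference, candidate):
    """Flip candidate columns so each points the same way as the reference column."""
    signs = np.sign(np.sum(reference * candidate, axis=0))
    signs[signs == 0] = 1  # leave orthogonal columns untouched
    return candidate * signs


aligned = align_signs(u, u_flipped)
factors_match_after_alignment = np.allclose(aligned, u)
```

If sign ambiguity is the culprit, a test could align signs before comparing, compare a sign-invariant quantity such as the reconstruction, or fall back to comparing shapes as was done for the stochastic methods.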