benedekrozemberczki / karateclub

Karate Club: An API Oriented Open-source Python Framework for Unsupervised Learning on Graphs (CIKM 2020)
https://karateclub.readthedocs.io
GNU General Public License v3.0
2.17k stars 247 forks source link

SINr implementation to compute node embeddings from communities #156

Closed nicolasdugue closed 8 months ago

nicolasdugue commented 8 months ago

As fans of the karateclub package, we propose to add the implementation of SINr, a graph embedding approach based on community detection to the package. SINr was first published in : Thibault Prouteau, Victor Connes, Nicolas Dugué, Anthony Perez, Jean-Charles Lamirel, et al.. SINr: Fast Computing of Sparse Interpretable Node Representations is not a Sin!. Advances in Intelligent Data Analysis XIX, 19th International Symposium on Intelligent Data Analysis, IDA 2021. pp.325-337, ⟨10.1007/978-3-030-74251-5_26⟩. ⟨hal-03197434⟩

It allows to produce interpretable node representations and can also compute word embeddings on textual graph data : Béranger, A., Dugué, N., Guillot, S., & Prouteau, T. (2023, November). Filtering communities in word co-occurrence networks to foster the emergence of meaning. In International Conference on Complex Networks and Their Applications (pp. 377-388).

A complete SINr implementation can be found here : https://github.com/SINr-Embeddings/sinr But the code we provide in this pull request is enough to train efficiently node embeddings using networkx and scipy. We would really appreciate our method to be included in the karateclub package that we use a lot !

LucaCappelletti94 commented 8 months ago

Hi @nicolasdugue! Thank you for your pull request.

First of all, if I am not mistaken, erase_base_features is not defined and is never used across the model, what is up with that? Same thing for workers.

Since this implementation as I understand it is primarily meant for didactical purposes, being it in NetworkX and all, could you kindly extend the comments and explain more extensively things such as:

Another significant bit is testing. Please extend the test suite and test your implementation. Things such as the erase_base_features error would not have survived testing.

nicolasdugue commented 8 months ago

Hi @LucaCappelletti94, Thanks for your comment, I am glad you answered so fast !

I am sorry for the workers and erase_base_features mistakes !

Regarding your remarks/questions :

I was unclear sorry, it is not included in networkx. But the implementation I provided here is based on networkx louvain community detection. In our package, we mostly rely on networkit.

Do you need some more information/tests ?

codecov-commenter commented 8 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 97.44%. Comparing base (5e1fe9c) to head (e6064c1).

:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #156 +/- ## ========================================== + Coverage 97.41% 97.44% +0.03% ========================================== Files 63 64 +1 Lines 2712 2744 +32 ========================================== + Hits 2642 2674 +32 Misses 70 70 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

LucaCappelletti94 commented 8 months ago

Everything seems okay, I see there are a few typos in the comments. If you have the time to fix them I would appreciate it, otherwise I will do it when I get the time.

nicolasdugue commented 8 months ago

Hi @LucaCappelletti94, I tried to fix the typos in the comments, but my english is not great. I did my best but it may not be enough, I am sorry :( Thank you for your help, I am glad that SINr will be included in the package.

nicolasdugue commented 8 months ago

Hi, I asked someone to proofread me. Comments should now be okay :)

LucaCappelletti94 commented 8 months ago

Thanks, I appreciate it. Later today (or at the latest tomorrow) I will go through the whole thing and I look forward to merge the whole thing.