Closed nicolasdugue closed 8 months ago
Hi @nicolasdugue! Thank you for your pull request.
First of all, if I am not mistaken, erase_base_features
is not defined and is never used across the model, what is up with that?
Same thing for workers
.
Since this implementation as I understand it is primarily meant for didactical purposes, being it in NetworkX and all, could you kindly extend the comments and explain more extensively things such as:
erase_base_features
and gamma
and what do they do in your model? Are there optimal values? Is there any correlation between gamma and the number of nodes or the density of the graph (especially for gamma)?Another significant bit is testing. Please extend the test suite and test your implementation. Things such as the erase_base_features
error would not have survived testing.
Hi @LucaCappelletti94, Thanks for your comment, I am glad you answered so fast !
I am sorry for the workers and erase_base_features mistakes !
Regarding your remarks/questions :
I was unclear sorry, it is not included in networkx. But the implementation I provided here is based on networkx louvain community detection. In our package, we mostly rely on networkit.
Do you need some more information/tests ?
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 97.44%. Comparing base (
5e1fe9c
) to head (e6064c1
).
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
Everything seems okay, I see there are a few typos in the comments. If you have the time to fix them I would appreciate it, otherwise I will do it when I get the time.
Hi @LucaCappelletti94, I tried to fix the typos in the comments, but my english is not great. I did my best but it may not be enough, I am sorry :( Thank you for your help, I am glad that SINr will be included in the package.
Hi, I asked someone to proofread me. Comments should now be okay :)
Thanks, I appreciate it. Later today (or at the latest tomorrow) I will go through the whole thing and I look forward to merge the whole thing.
As fans of the karateclub package, we propose to add the implementation of SINr, a graph embedding approach based on community detection to the package. SINr was first published in : Thibault Prouteau, Victor Connes, Nicolas Dugué, Anthony Perez, Jean-Charles Lamirel, et al.. SINr: Fast Computing of Sparse Interpretable Node Representations is not a Sin!. Advances in Intelligent Data Analysis XIX, 19th International Symposium on Intelligent Data Analysis, IDA 2021. pp.325-337, ⟨10.1007/978-3-030-74251-5_26⟩. ⟨hal-03197434⟩
It allows to produce interpretable node representations and can also compute word embeddings on textual graph data : Béranger, A., Dugué, N., Guillot, S., & Prouteau, T. (2023, November). Filtering communities in word co-occurrence networks to foster the emergence of meaning. In International Conference on Complex Networks and Their Applications (pp. 377-388).
A complete SINr implementation can be found here : https://github.com/SINr-Embeddings/sinr But the code we provide in this pull request is enough to train efficiently node embeddings using networkx and scipy. We would really appreciate our method to be included in the karateclub package that we use a lot !