Hoosier-Clusters / clusim

An extended package for clustering similarity
MIT License
63 stars 15 forks source link

implement faster PPR function #36

Closed jisungyoon closed 4 years ago

jisungyoon commented 4 years ago

Hi, I am Jisung Yoon I like Clusim method a lot, and I am trying to Clusim as the overlapping community detection's performance measure.

Usually, in an overlapping community detection task, I deal with the very huge network with labels and found that the current version of clusim is extremely slow. While I investigate about Clusim code, I found that igraph's implementation of PPR is a bottle-neck of the current implementation.

So, I implement the faster version of PPR with the power-iteration method and sparse matrix and It is almost 20~30 faster than the previous code.

I implement ppr_implementation optional flag on clusim. If you set ppr_implementation as 'power_iteraion", then it uses my implementation of PPR. Otherwise, it uses the previous code.

Here is example on my network.

Screen Shot 2020-04-15 at 9 19 15 PM

Please take a look, let me know what you think:) @ajgates42 @yy

Thanks very much!

yy commented 4 years ago

Thanks! I think it is a really nice improvement. Maybe adding a test to check the consistency will be great? What do you think @ajgates42

ajgates42 commented 4 years ago

Thanks @jisungyoon! Overall this looks great. But I agree with YY, can you add a test or small example to demonstrate both prpack and power_iteration are returning approximately the same answer?

jisungyoon commented 4 years ago

Thanks @jisungyoon! Overall this looks great. But I agree with YY, can you add a test or small example to demonstrate both prpack and power_iteration are returning approximately the same answer?

OK, I will! Thanks!

jisungyoon commented 4 years ago

I just add the test code for the new implementation! @yy @ajgates42

ajgates42 commented 4 years ago

Thanks @jisungyoon, this looks good. I appreciate your improvements to the package!