CWTSLeiden / networkanalysis

Java package that provides data structures and algorithms for network analysis.
MIT License
145 stars 33 forks source link

Is the code from LeidenParallel branch sequential? #26

Closed wolfram77 closed 11 months ago

wolfram77 commented 11 months ago

Hello, I am trying to test the parallel version of Leiden algorithm. I have checked out the LeidenParallel branch, compiled it by running release.sh, and testing it on web-Stanford graph from the SuiteSparse matrix collection (after converting it to a TSV file). I get the following output.

Reading edge list from '/home/subhajit.sahu/Data/web-Stanford.mtx.tsv'.
Reading edge list took 1s.
Network consists of 281904 nodes and 2312497 edges.
Using 0 worker threads. 0 workers means original sequential algorithm.
Using singleton initial clustering.
Running Leiden algorithm.
...

It says 0 worker threads. Is this just an old output message?

Note that this is not a pull request.

vtraag commented 11 months ago

First, it is a bit confusing to open a Pull Request, while you actually want to open an issue. Instead of opening the Pull Request, you could just point to the branch https://github.com/Geertex/networkanalysis/tree/LeidenParallel in an issue.

Secondly, this is not a branch from this networkanalysis package, but from a fork by @Geertex. He researched a parellel implementation of the Leiden algorithm for his master thesis, which is available from https://theses.liacs.nl/1693. The code that was used for the thesis is available from https://github.com/Geertex/networkanalysis/tree/LeidenParallelV1.0

I believe you can specify the number of workers using --parallel-workers. Hopefully that provides enough to go on!

I'll close this Pull Request for now. If you have other questions, specifically about the parallel implementation, perhaps it's better to reach out to @Geertex. For other questions, feel free to open an issue.

wolfram77 commented 11 months ago

Thank you, the --parallel-workers options works. I did come across Geerten's implementation after going through his master's thesis.

Geertex commented 11 months ago

Hi Subhajit and Vincent,

The implementation in my forked branch is partially parallel, only the part that does the fast local moving algorithm. Depending on the input data this part may or may not be the most compute intensive part. Be aware that the implementation is totally non-blocking and thus the result might be different from a sequential implementation. This is due to different threads reading and writing to the same array concurrently.

Goodluck and kind regards, Geerten

Op do 14 dec 2023 om 14:16 schreef Subhajit Sahu @.***>:

Thank you, the --parallel-workers options works. I did come across Geerten's implementation after going through his master's thesis.

— Reply to this email directly, view it on GitHub https://github.com/CWTSLeiden/networkanalysis/pull/26#issuecomment-1855834110, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTYFQP4JU7UYCDVI3HFLBTYJL32JAVCNFSM6AAAAABAUJED32VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTQNJVHAZTIMJRGA . You are receiving this because you were mentioned.Message ID: @.***>

wolfram77 commented 11 months ago

Geerten, with the Louvain algorithm, on the graphs I tried, I observe that after the local-moving phase has been optimized, the aggregation phase becomes the bottleneck. To optimize it, I generate the aggregated/super-vertex graph as a CSR (with gaps) in parallel. I am expecting this might also be the case for Leiden algorithm. Thanks for the details.