This is a library plus a test harness for collecting algorithms that use the GraphBLAS. For test coverage reports, see https://graphblas.org/LAGraph/ . Documentation: https://lagraph.readthedocs.org
Other
229
stars
61
forks
source link
Improvements for Community Detection using Label Propagation #192
I modified the existing algorithm LAGraph_cdlp to take an LAGraph_Graph as an input parameter. It now also unconditionally sanitizes the input matrix (based on the experience with LCC that this doesn't have a negative impact on performance).
The existing version also had the same oddity as LCC, in that sanitize was set to true when the number of self edges was > 0, but the algorithm actually didn't remove them in the sanitize step. In this PR, the self edges are also not removed in the algorithm, but actually before it is called, in the test and demo programs. Based on my understanding of the algorithm, it should be ok not to remove the self edges: If a node is connected to itself, its label just also contributes to the label counts during the algorithm. That should be fine AFAICT.
I made some additional improvements to the original algorithm. (Using pack for creating the initial L, and using GxB_Vector_diag for extracting the final result.)
The existing test case for jagmesh7.mtx was apparently wrong: The hard-coded labels were based on the self edges not being removed. When you remove the self edges, the actual result differs from the expected result.
I also contributed a new algorithm called LAGraph_cdlp_nosort. It avoids the sorting step, and instead counts label occurrences per row in a simple property map. The original goal was to use this algorithm for checking the results against a ground truth, but this new version actually seems significantly faster than the previous one, at least for the input sets I tried them with.
test_cdlp.c now compares LAGraph_cdlp and LAGrap_cdlp_nosort to each other.
cdlp_demo.c runs LAGraph_cdlp_nosort once, compares it to LAGraph_cdlp, and then measures LAGraph_cdlp three times to create an average runtime.
cdlp_nosort_demo.c does the inverse: it runs LAGraph_cdlp once, compares it to LAGraph_cdlp_nosort, and then measure LAgraph_cdlp_nosort three times.
I made a couple of improvements for CDLP:
I modified the existing algorithm LAGraph_cdlp to take an LAGraph_Graph as an input parameter. It now also unconditionally sanitizes the input matrix (based on the experience with LCC that this doesn't have a negative impact on performance).
The existing version also had the same oddity as LCC, in that sanitize was set to true when the number of self edges was > 0, but the algorithm actually didn't remove them in the sanitize step. In this PR, the self edges are also not removed in the algorithm, but actually before it is called, in the test and demo programs. Based on my understanding of the algorithm, it should be ok not to remove the self edges: If a node is connected to itself, its label just also contributes to the label counts during the algorithm. That should be fine AFAICT.
I made some additional improvements to the original algorithm. (Using pack for creating the initial L, and using GxB_Vector_diag for extracting the final result.)
The existing test case for jagmesh7.mtx was apparently wrong: The hard-coded labels were based on the self edges not being removed. When you remove the self edges, the actual result differs from the expected result.
I also contributed a new algorithm called LAGraph_cdlp_nosort. It avoids the sorting step, and instead counts label occurrences per row in a simple property map. The original goal was to use this algorithm for checking the results against a ground truth, but this new version actually seems significantly faster than the previous one, at least for the input sets I tried them with.
test_cdlp.c now compares LAGraph_cdlp and LAGrap_cdlp_nosort to each other.
cdlp_demo.c runs LAGraph_cdlp_nosort once, compares it to LAGraph_cdlp, and then measures LAGraph_cdlp three times to create an average runtime.
cdlp_nosort_demo.c does the inverse: it runs LAGraph_cdlp once, compares it to LAGraph_cdlp_nosort, and then measure LAgraph_cdlp_nosort three times.