GraphBLAS / LAGraph

This is a library plus a test harness for collecting algorithms that use the GraphBLAS. For test coverage reports, see https://graphblas.org/LAGraph/ . Documentation: https://lagraph.readthedocs.org
Other
229 stars 61 forks source link

Improvements for local clustering coefficients #185

Closed pcostanza closed 1 year ago

pcostanza commented 1 year ago

I have made some improvements to the algorithm for computing local clustering coefficients in LAGraph. Specifically:

In some simple tests, I don’t see a performance improvement, nor a degradation. But hopefully this helps us to perform further experiments with this algorithm.

Pascal

szarnyasg commented 1 year ago

Hi Pascal, thanks for opening this PR! One thing that pops out is that your code is that it is mixing bool/int/fp types. I initially made the same change. Then I went through the LCC codebase with Tim Davis in a meeting, where he suggested to use FP64 values everywhere in this algorithm – otherwise, we may get generic SuiteSparse:GraphBLAS operations which use lots of typecasting. His recommendation was to set burble mode on (GxB_set (GxB_BURBLE, true) ;), read the output and ensure that there are no generic operations.

pcostanza commented 1 year ago

Hi Gabor. Thanks a lot for the feedback. I changed the code to use FP64 values everywhere, as you suggested, and indeed, this leads to be significant performance improvement. Thanks a lot. (It's added to the PR.)

There are still generic calls because I'm testing against SuiteSparse:GraphBLAS 7. LAGraph seems to have issues with SuiteSparse:GraphBLAS 8 due to the removal of GrB_SelectOp. Is somebody already adapting this?

DrTimothyAldenDavis commented 1 year ago

The dev branch has a fix for LAGraph that removes the GxB_SelectOp and uses a GrB_IndexUnaryOp instead.

pcostanza commented 1 year ago

I have tried the algorithm in different configurations. I noticed the following variations in the different versions of the algorithm that I have seen:

I have run the benchmark for the road and the web data sets from the GAP benchmark suite, with all combinations (2^5), three times each. I don’t see significant variations in performance between the different combinations.

The strongest impact seems to be the sanitize variation: Unconditionally performing the transformation into binary form generally seems to yield better performance than trying to avoid it. (I don’t know if the performance win comes from this particular step, or if the impact is from later steps in the algorithm.) Since this also produces a version of the algorithm that is correct in all cases, it seems strongly advisable to use that one.

The other variations don’t seem to matter much. For the bin method, using the apply rather than the assign operation seems to be slightly better. For the remaining variations (add, reduce, mul), the simpler forms have a very slight tendency to perform better, but this might be because of other random effects. Nevertheless, it might then be better to prefer the simpler code (avoiding the creation of the true scalar; avoiding the zeros vector; and avoiding the distinction between different semirings for the matrix-matrix multiplication).

I have modified my pull request accordingly.

DrTimothyAldenDavis commented 1 year ago

For the # of entries per row of a matrix, see LAGraph_Cached_OutDegree. That's the fastest method. It uses matrix-vector-multiply with a plus_one semiring and a dense vector. The time taken by SSGrB is O (# of nonempty rows of A) because this case triggers a special case. A dense vector x means that for v=A*x, the value of v(i) is just the # of entries in A(i,:). But I know that already so I don't have to traverse A at all.

DrTimothyAldenDavis commented 1 year ago

The construction of x vector in LAGraph_Cached_OutDegree takes O(1) time and space, as well.

DrTimothyAldenDavis commented 1 year ago

I revised the computation of the vector W to use this faster method. I also fixed some compiler errors in the JIT (there were syntax errors in the strings). The result is pushed to the dev branch. It works with my latest v8.0.0.draft8.