Add bidirectional and multi-source BFS algorithms

szarnyasg commented 4 years ago

I'd like to add bidirectional and MSBFS algorithms. Some design considerations for these:

For MSBFS, we can either use boolean matrices or bitwise operations. The latter as described in a VLDB paper, which states that compared to a textbook BFS (T-BFS) and a direction-optimizing BFS (DO-BFS), it provides an ~80x speedup

As an example, T-BFS and DO-BFS take 135 minutes and 22 minutes, respectively, to process the LDBC graph with 1 million vertices, while MS-BFS takes only 1.75 minutes. MS-BFS shows excellent scalability for all presented graphs
For both bidirectional BSF and MSBFS, it makes sense to experiment with push/pull.
Note however that for MSBFS, the experiments in the VLDB paper put the performance gains of push/pull to max. 30%:

Our experiments show that with both this hybrid approach and ANP, MS-BFS can significantly reduce the number of random accesses to visit and visitNext, improving the performance by up to 30%
Typically, the MSBFS algorithm needs to perform some operation when it finds a node. To incorporate this in our algorithm, it probably makes sense to add a function pointer argument to its method so that users can pass the desired operations.

DrTimothyAldenDavis commented 4 years ago

Using bitwise operators will probably be the fastest method. From the paper, it looks like there is a useful bitwise binary operator that could be added as a built-in: bitdiff(x,y) which computes x & ~y. For boolean, this operator already exists with the name GrB_GT_BOOL, computing x > y, which is the same as (x && !y). It also looks like this method could use a popcount unary function as well.

Regarding a function pointer for doing some work when a node is found: Is this a function pointer you would add to LAGraph, so that after each level of the BFS, LAGraph could call the function? Or would this be implemented as a unary operator passed to GrB_apply? (or a binary operator and a scalar)? Or perhaps a callback to a higher-level function, where LAGraph would call a user function once, with a list (say in a GrB_Vector, or boolean/bitwise/etc) of nodes seen at a given level, and then the user function would do whatever it wants?

szarnyasg commented 4 years ago

Thanks, I wasn't aware of GrB_GT_BOOL. For the boolean case, it's probably best to use a complemented mask by computing Next<!Seen> = Frontier ANY.PAIR A. For the bitwise case, we currently use apply with BNOT (suggested by @marci543):

LAGr_apply(Next, Next, GrB_BAND_UINT64, GrB_BNOT_UINT64, Seen, NULL)

Regarding a function pointer for doing some work when a node is found: Is this a function pointer you would add to LAGraph, so that after each level of the BFS, LAGraph could call the function? [...] Or perhaps a callback to a higher-level function, where LAGraph would call a user function once, with a list (say in a GrB_Vector, or boolean/bitwise/etc) of nodes seen at a given level, and then the user function would do whatever it wants?

In our current problem, where we implement exact closeness centrality value, whenever a row is reached by any of the BFS traversals (i.e. sum popcount(A[i, :]) > 0 is satisfied), its s(i) value should be increased by the level * sum popcount(A[i, :]). To achieve maximum efficiency in this case, it would be best to call a user function once, pass it the current state of the matrix and vector s, and let it compute s(i) += level * sum popcount(A[i, :]) on vector s.

GraphBLAS / LAGraph

Add bidirectional and multi-source BFS algorithms #97