Closed szarnyasg closed 2 years ago
The Fleischer/Hendrickson/Pinar strongly connected components algorithm is pretty simple in GraphBLAS, though not linear work worst-case. (Pick a random vertex; BFS for the reachable vertices both forward and backward from it; partition all the vertices into 4 sets according to whether they're reachable forward, backward, both, or neither; the "both" set is a strong component; do a recursive call for each of the other 3 sets.) Following the excellent GraphBLAS tutorial by Scott McMillan, Tim Mattson, Michel Pelletier, et al. at SIAM CSE last week, I wrote it in a couple dozen lines of reasonably idiomatic pygraphblas code.
Cheers,
On Mon, Mar 8, 2021 at 11:45 AM Gabor Szarnyas notifications@github.com wrote:
FastSV uses a number of non-GraphBLAS constructs. As discussed today, we should add a naive implementation (e.g. multiple BFS traversals).
It would also be worth looking for a CC algorithm that can be translated to linear algebra efficiently.
It's a long shot but this recent CC algorithm was formulated in SQL and it may be portable to GraphBLAS: In-database connected component analysis https://arxiv.org/pdf/1802.09478.pdf, ICDE'19
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/GraphBLAS/LAGraph/issues/109, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGEGDCC3Y3QJ4ID75P63HGLTCUSPDANCNFSM4Y2AR3DQ .
Before we can release v1.0, we need to demote FastSV to experimental code and replace it with something idiomatic. There is at least one replacement option in experimental (LACC?). Would like to see John G's pygraphblas code.
Here attached is a Jupyter notebook with my pygraphblas code for the Fleischer/Hendrickson/Pinar strongly connected components algorithm. It was just sort of dashed off during the pygraphblas tutorial at SIAM CSE earlier this year. scc() is the main routine; label() actually does the work. components() is a separate undirected connected components routine.
Cheers,
John R. Gilbert Professor of Computer Science University of California Santa Barbara, CA 93106-5110
On Tue, Nov 16, 2021 at 12:09 PM Doc McMillan @.***> wrote:
Before we can release v1.0, we need to demote FastSV to experimental code and replace it with something idiomatic. There is at least one replacement option in experimental (LACC?). Would like to see John G's pygraphblas code.
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/GraphBLAS/LAGraph/issues/109#issuecomment-970638503, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGEGDCELZ6ZXEEOBEQJ4HVDUMK26BANCNFSM4Y2AR3DQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
import pygraphblas as grb
from pygraphblas.gviz import draw_graph as draw
import random
row_ind = [0, 0, 0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 8]
col_ind = [1, 4, 8, 0, 8, 6, 5, 7, 0, 8, 3, 7, 2, 3, 5, 0, 1, 4]
values = [1, 2, 1, 1, 3, 4, 1, 2, 2, 5, 1, 2, 4, 2, 2, 1, 3, 5]
G = grb.Matrix.from_lists(row_ind,col_ind,values)
def reachable(vtx, graph):
"""Return grb_BOOL vector of reachable vertices from vertex"""
with grb.BOOL.LOR_LAND:
n = graph.ncols
reached = grb.Vector.sparse(grb.types.BOOL, n)
front = reached.dup()
front[vtx] = True
step = 1
while front.nvals > 0:
step += 1
front = graph.mxv(front, desc=grb.descriptor.CT0, mask=reached)
reached += front
return reached
def components(graph):
"""grb_INT64 vector of component labels for undirected graph"""
n = graph.ncols
component = grb.Vector.sparse(grb.types.INT64, n)
ncomps = 0
for v in range(n):
if component.get(v) is not None:
continue
r = reachable(v, graph)
component[r] = ncomps
ncomps += 1
print('ncomps:', ncomps)
return component
def label(V, G, Gt):
"""label the components of the subgraph of G induced by vertices V"""
components = grb.Vector.sparse(grb.types.INT64, G.ncols)
if V.nvals > 0:
# search from a random vertex for the 4 reachability classes:
# A can reach v and vice versa (A is v's strong component)
# B can reach v but not vice versa
# C can be reached from v but not vice versa
# D cannot be reached from v nor vice versa
v = random.choice(list(V.indexes))
B = search(v, Gt, V)
C = search(v, G, V)
A = B * C
A = A.select('==',True)
B -= A
B = B.select('==',True)
C -= A
C = C.select('==',True)
D = V - A - B - C
D = D.select('==',True)
# label vertices in the new component A
components[A] = 1
# recursively label vertices in the other reachability classes
newcomps = label(B, G, Gt)
components[B] = newcomps[B] + components.max()
newcomps = label(C, G, Gt)
components[C] = newcomps[C] + components.max()
newcomps = label(D, G, Gt)
components[D] = newcomps[D] + components.max()
return components
def search(vtx, graph, subgraph):
"""Return grb_BOOL vector of reachable vertices in induced subgraph"""
assert subgraph.get(vtx)
reached = grb.Vector.sparse(grb.types.BOOL, graph.ncols)
front = reached.dup()
front[vtx] = True
while front.nvals > 0:
reached += front
front = graph.mxv(front, desc=grb.descriptor.CT0, mask=reached)
front = front * subgraph
return reached
def scc(G):
"""grb_INT64 vector of strong component labels for directed graph"""
with grb.BOOL.LOR_LAND:
Gt = G.T
V = grb.Vector.sparse(grb.types.BOOL, G.ncols)
V[:] = True
components = label(V, G, Gt)
return components
rlist = [0,0,1,1,2,3,3,4,5,6,6]
clist = [1,3,4,6,5,2,0,5,2,4,3]
vlist = [True]*len(rlist)
A = grb.Matrix.from_lists(rlist,clist,vlist)
print(scc(A.T))
I have a pure GrB revision of the cc_boruvka algoritm, now called LG_CC_Boruvka. It's not as fast as the FastSV method. It can spend lots of time in the call to GrB_select. But it's working now, and it's pure GrB with the v2.0 C API.
FastSV6 is updated. I replaced the non-GraphBLAS Reduce_assign with a pure GraphBLAS one. It does use GxB extensions, but only my GxB pack/unpack move constructors. It uses GxB pack/unpack in a clever way to convert an integer array into a matrix C, and back again, in O(1) time and no extract space. There are no parallel for loops in the LAGraph code for this part any longer. The new method is faster, but it is slowed down by the simultaneous change from 32 bit to 64 bit integers.
There are 2 other parts left that do work outside of GraphBLAS:
(1) a matrix T is computed where T(i,:) contains the first 4 (at most 4) entries in A(i,:), selected from the entries with smallest column index. Neither GrB nor GxB _select can do this. I can add this to GxB_select.
(2) another matrix T is computed (unrelated to the first but using the same name), where T is equal to A but with edges within (or to) the largest component deleted, and any row A(i,:) with one or more entries A(i,j) with parent(j) = key are replaced with a single entry A(i,key). The "key" is the representative node of the largest (as yet known)connected component. This can be done with GrB_select and GrB_assign.
Once these changes are made, LG_CC_FastSV6 will be suitable to keep in LAGraph, but only when SuiteSparse is in use.
For pure GrB, we now have an updated LG_CC_Boruvka method. It's slower that FastSV6, sometimes significantly so.
All of these methods are computing connected components of an undirected graph (or a directed graph with a symmetric adjacency matrix structure). They don't compute the strongly connected components, which is what John's code does above.
I benchmarked Boruvka and LACC and found that Boruvka is currently faster, at least with SuiteSparse:GraphBLAS v6.0.1. Note that v6.0.0 has a bug (gasp!) that causes Boruvka to fail for the roadNet-CA problem. This is fixed in v6.0.1, not yet posted but on my master branch: https://github.com/DrTimothyAldenDavis/GraphBLAS/releases/tag/v6.0.1.alpha1
The Boruvka method fills the role of the pure-GrB Connected components method.
FastSV uses a number of non-GraphBLAS constructs. As discussed today, we should add a naive implementation (e.g. multiple BFS traversals).
It would also be worth looking for a CC algorithm that can be translated to linear algebra efficiently.
It's a long shot but this recent CC algorithm was formulated in SQL and it may be portable to GraphBLAS: In-database connected component analysis, ICDE'19