Add naive/more idiomatic CC algorithm

szarnyasg commented 3 years ago

FastSV uses a number of non-GraphBLAS constructs. As discussed today, we should add a naive implementation (e.g. multiple BFS traversals).

It would also be worth looking for a CC algorithm that can be translated to linear algebra efficiently.

It's a long shot but this recent CC algorithm was formulated in SQL and it may be portable to GraphBLAS: In-database connected component analysis, ICDE'19

johnrgilbert commented 3 years ago

The Fleischer/Hendrickson/Pinar strongly connected components algorithm is pretty simple in GraphBLAS, though not linear work worst-case. (Pick a random vertex; BFS for the reachable vertices both forward and backward from it; partition all the vertices into 4 sets according to whether they're reachable forward, backward, both, or neither; the "both" set is a strong component; do a recursive call for each of the other 3 sets.) Following the excellent GraphBLAS tutorial by Scott McMillan, Tim Mattson, Michel Pelletier, et al. at SIAM CSE last week, I wrote it in a couple dozen lines of reasonably idiomatic pygraphblas code.

Cheers,

John

On Mon, Mar 8, 2021 at 11:45 AM Gabor Szarnyas notifications@github.com wrote:

FastSV uses a number of non-GraphBLAS constructs. As discussed today, we should add a naive implementation (e.g. multiple BFS traversals).

It would also be worth looking for a CC algorithm that can be translated to linear algebra efficiently.

It's a long shot but this recent CC algorithm was formulated in SQL and it may be portable to GraphBLAS: In-database connected component analysis https://arxiv.org/pdf/1802.09478.pdf, ICDE'19

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/GraphBLAS/LAGraph/issues/109, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGEGDCC3Y3QJ4ID75P63HGLTCUSPDANCNFSM4Y2AR3DQ .

mcmillan03 commented 2 years ago

Before we can release v1.0, we need to demote FastSV to experimental code and replace it with something idiomatic. There is at least one replacement option in experimental (LACC?). Would like to see John G's pygraphblas code.

johnrgilbert commented 2 years ago

Here attached is a Jupyter notebook with my pygraphblas code for the Fleischer/Hendrickson/Pinar strongly connected components algorithm. It was just sort of dashed off during the pygraphblas tutorial at SIAM CSE earlier this year. scc() is the main routine; label() actually does the work. components() is a separate undirected connected components routine.

Cheers,

John

John R. Gilbert Professor of Computer Science University of California Santa Barbara, CA 93106-5110

On Tue, Nov 16, 2021 at 12:09 PM Doc McMillan @.***> wrote:

Before we can release v1.0, we need to demote FastSV to experimental code and replace it with something idiomatic. There is at least one replacement option in experimental (LACC?). Would like to see John G's pygraphblas code.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/GraphBLAS/LAGraph/issues/109#issuecomment-970638503, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGEGDCELZ6ZXEEOBEQJ4HVDUMK26BANCNFSM4Y2AR3DQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

mcmillan03 commented 2 years ago

import pygraphblas as grb
from pygraphblas.gviz import draw_graph as draw
import random

row_ind = [0, 0, 0, 1, 1, 2, 3, 3, 4, 4, 5, 5, 6, 7, 7, 8, 8, 8]
col_ind = [1, 4, 8, 0, 8, 6, 5, 7, 0, 8, 3, 7, 2, 3, 5, 0, 1, 4]
values  = [1, 2, 1, 1, 3, 4, 1, 2, 2, 5, 1, 2, 4, 2, 2, 1, 3, 5]
G = grb.Matrix.from_lists(row_ind,col_ind,values)

def reachable(vtx, graph):
    """Return grb_BOOL vector of reachable vertices from vertex"""
    with grb.BOOL.LOR_LAND:
        n = graph.ncols
        reached = grb.Vector.sparse(grb.types.BOOL, n)
        front = reached.dup()
        front[vtx] = True
        step = 1
        while front.nvals > 0:
            step += 1
            front = graph.mxv(front, desc=grb.descriptor.CT0, mask=reached)
            reached += front
    return reached

def components(graph):
    """grb_INT64 vector of component labels for undirected graph"""
    n = graph.ncols
    component = grb.Vector.sparse(grb.types.INT64, n)
    ncomps = 0
    for v in range(n):
        if component.get(v) is not None:
            continue
        r = reachable(v, graph)
        component[r] = ncomps
        ncomps += 1
    print('ncomps:', ncomps)
    return component

def label(V, G, Gt):
    """label the components of the subgraph of G induced by vertices V"""
    components = grb.Vector.sparse(grb.types.INT64, G.ncols)
    if V.nvals > 0: 

        # search from a random vertex for the 4 reachability classes:
        # A can reach v and vice versa (A is v's strong component)
        # B can reach v but not vice versa
        # C can be reached from v but not vice versa
        # D cannot be reached from v nor vice versa
        v = random.choice(list(V.indexes))
        B = search(v, Gt, V)
        C = search(v, G, V)
        A = B * C
        A = A.select('==',True)
        B -= A
        B = B.select('==',True)
        C -= A
        C = C.select('==',True)
        D = V - A - B - C
        D = D.select('==',True)

        # label vertices in the new component A
        components[A] = 1

        # recursively label vertices in the other reachability classes
        newcomps = label(B, G, Gt) 
        components[B] = newcomps[B] + components.max()
        newcomps = label(C, G, Gt) 
        components[C] = newcomps[C] + components.max()
        newcomps = label(D, G, Gt) 
        components[D] = newcomps[D] + components.max()    
    return components

def search(vtx, graph, subgraph):
    """Return grb_BOOL vector of reachable vertices in induced subgraph"""
    assert subgraph.get(vtx)
    reached = grb.Vector.sparse(grb.types.BOOL, graph.ncols)
    front = reached.dup()
    front[vtx] = True
    while front.nvals > 0:
        reached += front
        front = graph.mxv(front, desc=grb.descriptor.CT0, mask=reached)
        front = front * subgraph
    return reached

def scc(G):
    """grb_INT64 vector of strong component labels for directed graph"""
    with grb.BOOL.LOR_LAND:
        Gt = G.T
        V = grb.Vector.sparse(grb.types.BOOL, G.ncols)
        V[:] = True
        components = label(V, G, Gt)
    return components

rlist = [0,0,1,1,2,3,3,4,5,6,6]
clist = [1,3,4,6,5,2,0,5,2,4,3]
vlist = [True]*len(rlist)
A = grb.Matrix.from_lists(rlist,clist,vlist)
print(scc(A.T))

DrTimothyAldenDavis commented 2 years ago

I have a pure GrB revision of the cc_boruvka algoritm, now called LG_CC_Boruvka. It's not as fast as the FastSV method. It can spend lots of time in the call to GrB_select. But it's working now, and it's pure GrB with the v2.0 C API.

DrTimothyAldenDavis commented 2 years ago

FastSV6 is updated. I replaced the non-GraphBLAS Reduce_assign with a pure GraphBLAS one. It does use GxB extensions, but only my GxB pack/unpack move constructors. It uses GxB pack/unpack in a clever way to convert an integer array into a matrix C, and back again, in O(1) time and no extract space. There are no parallel for loops in the LAGraph code for this part any longer. The new method is faster, but it is slowed down by the simultaneous change from 32 bit to 64 bit integers.

There are 2 other parts left that do work outside of GraphBLAS:

(1) a matrix T is computed where T(i,:) contains the first 4 (at most 4) entries in A(i,:), selected from the entries with smallest column index. Neither GrB nor GxB _select can do this. I can add this to GxB_select.

(2) another matrix T is computed (unrelated to the first but using the same name), where T is equal to A but with edges within (or to) the largest component deleted, and any row A(i,:) with one or more entries A(i,j) with parent(j) = key are replaced with a single entry A(i,key). The "key" is the representative node of the largest (as yet known)connected component. This can be done with GrB_select and GrB_assign.

Once these changes are made, LG_CC_FastSV6 will be suitable to keep in LAGraph, but only when SuiteSparse is in use.

For pure GrB, we now have an updated LG_CC_Boruvka method. It's slower that FastSV6, sometimes significantly so.

All of these methods are computing connected components of an undirected graph (or a directed graph with a symmetric adjacency matrix structure). They don't compute the strongly connected components, which is what John's code does above.

DrTimothyAldenDavis commented 2 years ago

I benchmarked Boruvka and LACC and found that Boruvka is currently faster, at least with SuiteSparse:GraphBLAS v6.0.1. Note that v6.0.0 has a bug (gasp!) that causes Boruvka to fail for the roadNet-CA problem. This is fixed in v6.0.1, not yet posted but on my master branch: https://github.com/DrTimothyAldenDavis/GraphBLAS/releases/tag/v6.0.1.alpha1

DrTimothyAldenDavis commented 2 years ago

The Boruvka method fills the role of the pure-GrB Connected components method.

GraphBLAS / LAGraph

Add naive/more idiomatic CC algorithm #109