This PR adds the table function weakly_connected_component.
from weakly_connected_component(snb, person, knows);
The table function returns two columns, the primary key of the source table and the componentId. The componentId is equal to the smallest rowid of the connected component.
The function works on an undirected graph, traversed using a CSR data structure and MS-BFS until exhaustion. Then, all the nodes that are reachable from a source are assigned the lowest ID (rowid) in the connected component.
Timings
Reported numbers are the mean runtime over 5 runs only measuring the weakly connected component function. Excluding data loading. Measured on a 2021 Macbook Pro M1.
Scale Factor
DuckPGQ Timing
NetworkX Timing
Speedup (NetworkX / DuckPGQ)
Neo4j Timing
Speedup (Neo4j / DuckPGQ)
SF1
0.093s
0.022s
0.24x
0.200s
2.15x
SF3
0.227s
0.074s
0.33x
0.349s
1.54x
SF10
0.768s
0.370s
0.48x
0.970s
1.26x
DuckPGQ
-CREATE PROPERTY GRAPH snb
VERTEX TABLES (
Person
)
EDGE TABLES (
Person_knows_person SOURCE KEY (Person1Id) REFERENCES Person (id)
DESTINATION KEY (Person2Id) REFERENCES Person (id)
LABEL Knows);
from weakly_connected_component(snb, person, knows);
NetworkX
def calculate_cc(G: nx.Graph) -> pd.DataFrame:
# works on undirected graph
connected_components = [c for c in sorted(nx.connected_components(G), key=len, reverse=True)]
# Prepare lists to hold the IDs and their corresponding componentId
ids = []
component_ids = []
for component in connected_components:
# Determine the componentId as the minimum node ID in the component
component_id = min(component)
# Add each node in the component to the lists
for node in component:
ids.append(int(node))
component_ids.append(int(component_id))
# Create a DataFrame from the lists
wcc_df = pd.DataFrame({
'id': ids,
'componentId': component_ids
})
return wcc_df
Neo4j
CALL gds.graph.project(
'myGraph',
'Person',
'KNOWS'
)
CALL gds.wcc.stream('myGraph')
YIELD nodeId, componentId
RETURN gds.util.asNode(nodeId).id AS id, componentId
ORDER BY componentId, id
Fixes #138
This PR adds the table function
weakly_connected_component
.The table function returns two columns, the
primary key
of the source table and thecomponentId
. ThecomponentId
is equal to the smallest rowid of the connected component.The function works on an undirected graph, traversed using a CSR data structure and MS-BFS until exhaustion. Then, all the nodes that are reachable from a source are assigned the lowest ID (rowid) in the connected component.
Timings
Reported numbers are the mean runtime over 5 runs only measuring the weakly connected component function. Excluding data loading. Measured on a 2021 Macbook Pro M1.
DuckPGQ
NetworkX
Neo4j