ImperialCollegeLondon / SWMManywhere

SWMManywhere is used to derive and simulate a sewer network anywhere in the world
https://imperialcollegelondon.github.io/SWMManywhere/
BSD 3-Clause "New" or "Revised" License
4 stars 1 forks source link

Using Leiden instead of Louvain for community detection #319

Open cheginit opened 2 days ago

cheginit commented 2 days ago

While working on a project, I found this package called leidenalg. Based on my experience, it produces higher quality communities than Networkx's Louvain and tends to be faster. Here's an example of how it can be used to generate communities from a networkx graph object:

import networkx as nx
import igraph as ig
import leidenalg as la
import cytoolz.curried as tlz

def _create_subgraph(
    graph: nx.DiGraph,
    nodes: Iterable[int],
    name: str | int | None = None,
    largest: bool = False,
) -> nx.DiGraph:
    """Create a subgraph from a networkx.DiGraph and list of nodes."""
    sub_graph = graph.__class__()
    sub_graph.add_nodes_from((n, graph.nodes[n]) for n in nodes)
    if sub_graph.is_multigraph():
        sub_graph.add_edges_from(
            (n, nbr, key, d)
            for n, nbrs in graph.adj.items()
            if n in nodes
            for nbr, keydict in nbrs.items()
            if nbr in nodes
            for key, d in keydict.items()
        )
    else:
        sub_graph.add_edges_from(
            (n, nbr, d)
            for n, nbrs in graph.adj.items()
            if n in nodes
            for nbr, d in nbrs.items()
            if nbr in nodes
        )
    sub_graph.graph.update(graph.graph)  # pyright: ignore[reportGeneralTypeIssues]
    if name is not None:
        nx.function.set_edge_attributes(sub_graph, name, "community")
    if largest:
        if graph.is_directed():
            conn = max(nx.weakly_connected_components(sub_graph), key=len)
        else:
            conn = max(nx.connected_components(sub_graph), key=len)
        sub_graph = _create_subgraph(sub_graph, conn, name, largest=False)
    return nx.algorithms.tree.minimum_spanning_tree(sub_graph.to_undirected()).to_directed()

def communities(graph: nx.DiGraph, resolution: float = 1e-5) -> list[nx.DiGraph]:
    """Get communities from a street network.

    Parameters
    ----------
    graph : nx.DiGraph
        The street network.
    resolution : float, optional
        A higher resolution favors detecting smaller, more granular
        communities, while a lower resolution tends to merge smaller
        communities into larger ones. Defaults to ``1e-5``.

    Returns
    -------
    list of networkx.DiGraph
        The communities as a list of ``networkx.DiGraph``.
    """
    graph_ig = ig.Graph.from_networkx(graph.to_undirected())
    graph_ig.vs["community"] = la.find_partition(
        graph_ig,
        la.CPMVertexPartition,
        seed=42,
        resolution_parameter=resolution,
    ).membership
    gnx = graph_ig.to_networkx()
    comm = nx.get_node_attributes(gnx, "community")
    nodes_nx = tlz.merge_with(list, ({p: i} for i, p in comm.items()))
    return [
        _create_subgraph(graph, c, i, largest=True)
        for i, c in nodes_nx.items()  # pyright: ignore[reportArgumentType]
    ]
barneydobson commented 2 days ago

Awesome - I've noted it in #227 which is the overview of subbasin delineation/community detection issue.

szhorvat commented 2 days ago

This algorithm is also included in igraph directly: https://python.igraph.org/en/stable/api/igraph.Graph.html#community_leiden

The igraph version is faster but less flexible (fewer objective functions, no support for directed graphs) than the leidenalg version.

barneydobson commented 2 days ago

Thanks @szhorvat - igraph looks super helpful, as ultimately, I would envisage offering a range of options for a user to further customise the communities. I think it is fine to have just modularity objective and use of undirected graphs in this case - indeed that is how this community detection step currently works. The only point at which swmmanywhere requires a directed graph is for shortest path optimization of the pipe network topology, since slope is a directed variable. But community detection in this step is more focussed on eliminating or retaining potential pipe-carrying links based on the hydrological subbasins and the road network communities (more info in #227 ).