aplbrain / grandiso-networkx

Performant, pure-Python subgraph isomorphism and monomorphism search (aka "motif search")
Apache License 2.0
52 stars 10 forks source link

Errors on motif with disconnected nodes #43

Open bdpedigo opened 1 month ago

bdpedigo commented 1 month ago

Encountered an error when feeding in some motifs that had isolated nodes.

Code:

import networkx as nx
from grandiso import find_motifs

host = nx.fast_gnp_random_graph(3, 0.5, directed=True)

motif = nx.DiGraph()
motif.add_node(0)
motif.add_node(1)
motif.add_node(2)
motif.add_edge(0, 1)
motif.add_edge(1, 0)

grandiso_motifs = find_motifs(motif, host, count_only=False)

Error:

File ~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:204, in get_next_backbone_candidates(backbone, motif, host, interestingness, next_node, directed, is_node_structural_match, is_node_attr_match, is_edge_attr_match, isomorphisms_only)
    [201](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:201)             _greatest_backbone_count = motif_backbone_connections_count
    [202](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:202)     # Now we have _node_with_greatest_backbone_count as the best candidate
    [203](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:203)     # for `next_node`.
--> [204](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:204)     next_node = max(
    [205](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:205)         _nodes_with_greatest_backbone_count,
    [206](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:206)         key=lambda node: interestingness.get(node, 0.0),
    [207](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:207)     )
    [209](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:209) # Now we have a node `next_node` which we know is connected to the current
    [210](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:210) # backbone. Get all edges between `next_node` and nodes in the backbone,
    [211](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:211) # and verify that they exist in the host graph:
    [212](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:212) # `required_edges` has the form (prev, self, next), with non-values filled
    [213](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:213) # with None. That way we can easily remember and store the roles of the
    [214](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:214) # node IDs in the next step.
    [215](https://file+.vscode-resource.vscode-cdn.net/Users/ben.pedigo/code/skedits/skedits-app/skedits/~/code/skedits/skedits-app/skedits/.venv/lib/python3.11/site-packages/grandiso/__init__.py:215) required_edges = []

ValueError: max() arg is an empty sequence

Totally understand if you want to punt on how whether/how to execute the search with these disconnected motifs, but some documentation and/or early value erroring around this would be helpful. Took me a while to figure out what I was doing wrong (since the NetworkX will execute for these cases).

j6k4m8 commented 1 month ago

hey @bdpedigo! that's a great point, we should definitely fail early on these.

The obvious reason we don't support multi-component motifs is that there is an exponential cost to enumerating motif matches for each connected component; what would be the sort of behavior you prefer? docs that make this explicit? a check at call-time that raises an exception? it doesn't raise right now because I've been trying to minimize the number of branches in the find_motifs call. but of course, uninterpretable errors = bad :)

j6k4m8 commented 1 month ago

PS: excited that grandiso is useful to you :)

I guess a third option is computing matches for each component and returning them either as a product map (A1+B1, A1+B2, ... An+Bn) or as a list of length n with match lists for each...

bdpedigo commented 1 month ago

PS: excited that grandiso is useful to you :)

Same! 🎉

I feel the simplest answer would be to just throw an error if the input motif is not fully connected. And given that this would be dictating the allowed inputs, a brief line in the documentation that says "input motif must be one connected component" also seems warranted.

In terms of behavior if you were to allow this kind of thing, I would default to whatever networkx is doing currently (since it does at least run):

nx_motifs = list(
    nx.isomorphism.DiGraphMatcher(host, motif).subgraph_monomorphisms_iter()
)

I suspect it in practice it is something like your product map (if I understand you correctly) but I doubt they are doing anything explicit to recognize that this subgraph is disconnected when they compute it. That being said, I have what I need for my application, so no pressure at all here to change the implementation.

j6k4m8 commented 1 month ago

I like that solution — will add shortly :)