[BUG] `StackOverflowError` in `Graphs.Experimental.has_isomorph`

tlarock commented 8 months ago

Description of bug May be more of a limitation than a bug, especially given the experimental status of this piece of code.

I am running into StackOverflowErrors when comparing graphs using Graphs.Experimental.has_isomorph. The last call on the stack is vf2check_feasibility.

The error seems to be related to the number of nodes in the graphs (see next section). So I am wondering if there is an expected limitation to the number of nodes (or some other quantity) that this implementation can handle?

How to reproduce The code leading up to the errors in my case is a bit complex, but I have been able to reproduce by running the function on Erdos-Renyi graphs with increasing number of nodes:

using Graphs
using Graphs.Experimental

for num_nodes in [10000, 14000, 14900, 15000, 15100, 20000, 25000, 30000]
    G = erdos_renyi(num_nodes, 3.0 / num_nodes)
    print("N: $num_nodes, edges: $(ne(G)) isomorphic: ")
    println(has_isomorph(G, G))
end

Note that I chose the values around 15k because that is where it is failing in my use case (see additional context). However in the ER case it fails at 30k nodes:

N: 10000, edges: 14916 isomorphic: true
N: 14000, edges: 20935 isomorphic: true
N: 14900, edges: 22292 isomorphic: true
N: 15000, edges: 22581 isomorphic: true
N: 15100, edges: 22497 isomorphic: true
N: 20000, edges: 30024 isomorphic: true
N: 25000, edges: 37280 isomorphic: true
N: 30000, edges: 45185 isomorphic: 

StackOverflowError:

Stacktrace:
 [1] vf2check_feasibility(u::Int64, v::Int64, state::Graphs.Experimental.VF2State{SimpleGraph{Int64}, Int64}, problemtype::IsomorphismProblem, vertex_relation::Nothing, edge_relation::Nothing)
   @ Graphs.Experimental ~/.julia/packages/Graphs/FXxqo/src/Experimental/vf2.jl:85
 [2] vf2match!(state::Graphs.Experimental.VF2State{SimpleGraph{Int64}, Int64}, depth::Int64, callback::Graphs.Experimental.var"#callback#22", problemtype::IsomorphismProblem, vertex_relation::Nothing, edge_relation::Nothing)
   @ Graphs.Experimental ~/.julia/packages/Graphs/FXxqo/src/Experimental/vf2.jl:463

I checked with constant number of edges as well and see the same pattern, so the issue appears to be related to the number of nodes specifically.

Expected behavior I expect the function to return a Bool as normal.

Actual behavior The function does not finish and a StackOverflowError is thrown.

Code demonstrating bug See "How to reproduce"

Version information

output from versioninfo()

Julia Version 1.9.4
Commit 8e5136fa297 (2023-11-14 08:46 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: macOS (arm64-apple-darwin22.4.0)
CPU: 8 × Apple M1 Pro
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, apple-m1)
Threads: 5 on 6 virtual cores
Environment:
JULIA_NUM_THREADS = 4

output from ] status Graphs

Status `~/.julia/environments/v1.9/Project.toml`
[86223c79] Graphs v1.9.0

Additional context As a bit of background, the graphs I am comparing represent pairwise projections of two hypergraphs, basically node co-occurence graphs. It is quite likely that the graphs are indeed isomorphic. I am also using an edge_relation function to match edge weights.

Here are node/edge counts for some passing and failing cases in my code. Note that in the two passing cases (first and last), the number of nodes is smaller than 15k, and in all of the failing cases the number of nodes is larger than 15k. This is different than the 30k found for erdos-renyi, but qualitatively the issue appears to be the same. And since it seems to be exactly half, perhaps it has to do with symmetry, although in both cases I expect the graphs to be undirected.

G1 nodes: 14916; G1 edges: 16219
G2 nodes: 14916; G2 edges: 16219
Passed

G1 nodes: 15471; G1 edges: 16880
G2 nodes: 15471; G2 edges: 16880
Failed

G1 nodes: 16083; G1 edges: 17616
G2 nodes: 16083; G2 edges: 17616
Failed

*fails for 17 in a row, each with nodes > 14916*

G1 nodes: 11324; G1 edges: 11970
G2 nodes: 11324; G2 edges: 11970
Passed

gdalle commented 8 months ago

I checked https://github.com/JuliaGraphs/Graphs.jl/blob/d1081b02c2a2d50ac0e4e95b1f84f17259b719d5/src/Experimental/vf2.jl#L85 but I didn't see a recursive call, so I'm a bit surprised. Can you spot one?

Graph isomorphism is a difficult problem with no known polynomial algorithm. I don't know the details of this implementation but I wouldn't be surprised if it crashed on large graphs. However, for lack of recursion, I would rather expect an OutOfMemoryError or something similar, so this puzzles me

tlarock commented 8 months ago

Thanks for taking a look. Checking again I think actually the stack showing vf2check_feasibility last might be a red herring. The algorithm from the cited paper is recursive and it seems the recursion is implemented in the function vf2match (here for example). vf2check_feasibility is called before each recursive call so it could be a coincidence that it tends to show up last on the stack, or a quirk of StackOverflowError reporting in Julia (total speculation).

simonschoelly commented 8 months ago

The helper function vf2match! is recursive.

vf2 does nothing more than trying out all permutations of vertices with some pruning rules to speed up things. So the general complexity is roughly in the order of O(num_vertices!). And the recursion depth is indeed O(num_vertices) Usually that algorithm would time out much earlier, so a StackOverflow is something one would rarely see. But because you are using the same graph twice, it will find matches v1 ~ w1, v2 ~ w2~, ... right from the start and will indeed exhaust the stack.

I am not sure if we need to do anything here, as the case is indeed very rare - we could convert it to an iterative implementation, but I would suggest, we would only do if we need that for something else like a parallel version of the algorithm, as otherwise it will mostly serve to make the code harder to understand.

What we could do, is to add a quick note to the docstring, saying that it is recursive.

If you indeed wanna try using this algorithm for such large graphs. you could see if there is something in the Julia documentation on increasing the stack size. You could also try to use a smaller eltype than Int for the graphs - I think UInt16 in your case, but I am not sure how much stack memory you will save by this, as most of the allocations are on the heap anyways.

simonschoelly commented 8 months ago

Here they describe a way to set the stack size for a Task, and run the computation in said Task. Perhaps you could also try that? If it works, I wonder, if we could simply wrap this and all other recursive algorithms in tasks, where we estimate the stack size from the input.

tlarock commented 8 months ago

Thanks for this @simonschoelly. After a quick test I can confirm that setting the stack explicitly as in the stackoverflow solution you sent can work, I am able to get output for ER graphs up to 100k nodes.

It would definitely be helpful to add a note to the docstring that highlights the issue and points to this potential solution. And perhaps the docstrings of any other recursive functions that could run into this situation.

The task wrapper solution would be great from the user perspective since it would widen the range of cases for which the function just works. My worry would be that doing this in the background could lead to some rather opaque memory explosion issues based on the "as long as there is enough memory available" parenthetical from the stackoverflow answer. But that is a bit beyond my expertise.

JuliaGraphs / Graphs.jl

[BUG] `StackOverflowError` in `Graphs.Experimental.has_isomorph` #351