Possible issue with topological_sort_by_dfs

Looks like the function topological_sort_by_dfs can have a run-time $O(n\cdot n)$ even for sparse graphs. The simplest example to get this behaviour is

function gen_graph_topsort(k)
    g = SimpleDiGraph(k+1)
    for i = 1:k
        add_edge!(g, 1, i+1)
    end
    return g
end

g = gen_graph_topsort(10^5)
@time topological_sort_by_dfs(g)

For graphs with about 10^5 vertices this already takes a few seconds. For larger graphs, this quickly becomes infeasible (due to the quadratic run-time). Expected behaviour would be that the implementation has worst-case O(n+m) run-time and takes fractions of a second for graphs of this size.

The problem appears to be that this loop https://github.com/JuliaGraphs/Graphs.jl/blob/e5405d105e034ccb9396d30f86c26641ce94c504/src/traversals/dfs.jl#L97 goes through the neighbors of u over and over everytime starting from the beginning of the list.

I.e., it will find n with vcolor[n] == 0 and break, then (with some further steps) vcolor[n] will be set to 2. Afterwards, again the search over the neighbors of u starts at the first neighbor and goes through them until it finds one with vcolor == 0 and breaks etc.

Thus, for a vertex with $k$ neighbor it can take $O(k^2)$ steps, leading to potentially O(n^2), resp. O(m^2), run-time for sparse graphs.

JuliaGraphs / Graphs.jl

Possible issue with topological_sort_by_dfs #263