`(scc ...)` dies with a StackOverflow on large directed graphs

aysylu / loom

Graph library for Clojure. Mailing list https://groups.google.com/forum/#!forum/loom-clj

http://aysy.lu/loom/

887 stars 108 forks source link

`(scc ...)` dies with a StackOverflow on large directed graphs #5

Closed jszakmeister closed 10 years ago

jszakmeister commented 11 years ago

I haven't tracked down exactly where the issue is yet. It appears there is a recursive call somewhere that is not in a tail recursive form. This was the backtrace (with Clojure 1.5.1 and Loom 0.3.1), but it's not very helpful:

Exception in thread "main" java.lang.StackOverflowError
    at clojure.lang.ASeq.more(ASeq.java:130)
    at clojure.lang.RT.more(RT.java:607)
    at clojure.core$rest.invoke(core.clj:73)
    at clojure.core$filter$fn__4226.invoke(core.clj:2532)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$filter$fn__4226.invoke(core.clj:2523)
    at clojure.lang.LazySeq.sval(LazySeq.java:42)
    at clojure.lang.LazySeq.seq(LazySeq.java:60)
    at clojure.lang.RT.seq(RT.java:484)
    at clojure.core$seq.invoke(core.clj:133)
    at clojure.core$filter$fn__4226.invoke(core.clj:2523)
        [elided... this repeats on and on]

jszakmeister commented 11 years ago

So I tracked this down further. The call to remove in scc seems to be the culprit. It keeps building lazy sequence on top of lazy sequence to filter out the nodes, and eventually reaches a point that overflows the stack. Perhaps using an ordered set would be better here?

aysylu commented 11 years ago

Could you please share the graph you're trying this on? Alternatively, could you try to see if the following fixes your problem:

(recur (doall (remove seen stack))
                 seen
                 (conj cc c))

on the following lines https://github.com/aysylu/loom/blob/master/src/loom/alg.clj#L312...L314?

Note the invocation of doall on the remove function.

jszakmeister commented 11 years ago

I did try that after reporting the issue, but it turned out to be only part of the problem (sorry for not coming back and reporting that!).

I needed to make the change you recommended, and also change a line in traverse-all:

          [(into seen ctrav) (doall (concat trav ctrav))])))

(notice the doall around the concat).

Between the two changes, it ran to completion but extremely slow. It took 4 hrs and 45 minutes to complete, while the same algorithm with a rudimentary implementation in Python ran in 1 minute and 20 seconds on the graph of 200,000 vertices and 200,000 edges.

I put the edge list for the graph here: https://gist.github.com/jszakmeister/7068034

It shouldn't be hard to read it in and build the digraph from it.

aysylu commented 11 years ago

Thanks for the follow-up and the test graph! It sounds like it's time for refactoring the algorithm for more optimized data structures. If you end up optimizing the function, your pull request will be very welcome.

jszakmeister commented 11 years ago

Thanks @aysylu. I hope to get some time to spend looking at this in the next couple of weeks. If I figure something out, I'll be sure to send you a pull request.

BTW, how do you feel about introducing another dependency? I'm looking at possibly using https://github.com/flatland/ordered, which has an implementation of ordered sets. I think that'll be useful for solving the stack overflow issue in a more performant manner, but I want to make sure you're okay with that first.

Thanks!

aysylu commented 11 years ago

Given the choice of having no other dependencies but Clojure and slow implementation of SCC or reasonably fast implementation of SCC and an extra dependency, I think it's fine to introduce another dependency. With the eventual goal of Loom becoming a Clojure contrib library, this will affect what we can include in the contrib version, but we can figure it out later.

Thank you, @jszakmeister!

gdevanla commented 10 years ago

@aysylu @jszakmeister I tried to reproduce this error. I see that the computation never completes for post-traverse and pre-traverse as well. I hope to work on this issue this week, if all goes well.