The previous implementation of iterate-dir was not lazy: it eagerly traversed the entire directory hierarchy and loaded the whole tree structure into memory before returning. On very large directory trees, this could be very slow and could also produce a java.lang.OutOfMemoryError. (See issue #38)
By using tree-seq to traverse the directory tree lazily, iterate-dir now returns a lazy sequence immediately, even on very large directory trees, because it no longer needs to traverse and load the tree first. This also means that the call to iterate-dir will not itself produce an OutOfMemoryError on large directory trees. Unfortunately, it seems that Clojure's lazy sequences are still susceptible to OutOfMemoryErrors. Even with the lazy tree-seq approach, an OutOfMemoryError can be produced when processing very large directory trees (e.g. with dorun or doseq, or even count):
OutOfMemoryError GC overhead limit exceeded
Nevertheless, this is still an improvement over the previous implementation: the OutOfMemoryError is no longer produced immediately upon calling iterate-dir, but only after a very large portion of the results has been processed. In any case, this seems to be a limitation of Clojure itself.
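
For reviewers unfamiliar with tree-seq, the lazy traversal described above can be sketched roughly as follows. This is only an illustration of the technique, not the code in this change: lazy-file-seq is a hypothetical name, and it yields plain java.io.File nodes rather than whatever shape iterate-dir returns. (clojure.core/file-seq is built on tree-seq in much the same way.)

```clojure
(import '(java.io File))

(defn lazy-file-seq
  "Hypothetical helper, for illustration only: lazily walks the directory
  tree rooted at path in depth-first order, yielding java.io.File objects."
  [path]
  (tree-seq
    ;; branch?: only directories can have children
    (fn [^File f] (.isDirectory f))
    ;; children: .listFiles returns nil on an I/O error; seq turns both nil
    ;; and an empty array into nil, which tree-seq treats as "no children"
    (fn [^File d] (seq (.listFiles d)))
    (File. (str path))))

;; The call returns a lazy sequence right away; directories are only
;; listed as elements are consumed.
(first (lazy-file-seq "."))
```

With this shape, something like (take 5 (lazy-file-seq "/some/large/tree")) should only need to list the first few directories it reaches, rather than the whole tree (the path here is a placeholder).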