Raynes / fs

File system utilities for Clojure.

Use tree-seq to make iterate-dir lazy #105

Open jvoegele opened 7 years ago

jvoegele commented 7 years ago

The previous implementation of iterate-dir was eager: it traversed the entire directory hierarchy and loaded it all into memory before returning. When used on very large directory trees, this could be very slow and could also produce a java.lang.OutOfMemoryError. (See issue #38)

By using tree-seq to lazily traverse the directory tree, the iterate-dir function now returns a lazy sequence immediately (even on very large directory trees) because it does not need to traverse and load the tree first. This also means that iterate-dir will not itself produce an OutOfMemoryError when used on large directory trees. Unfortunately, however, it seems that Clojure's lazy sequences are still susceptible to OutOfMemoryErrors. Even with the lazy tree-seq approach, an OutOfMemoryError can be produced when processing very large directory trees (e.g. with dorun or doseq, or even count):

OutOfMemoryError GC overhead limit exceeded

Nevertheless, this is still an improvement over the previous implementation since the OutOfMemoryError is not produced immediately upon calling iterate-dir, but rather only after processing a very large portion of the results. This seems to be a limitation in Clojure itself, in any case.
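For readers unfamiliar with tree-seq, here is a minimal sketch of the general technique. It is not the code in this pull request: lazy-file-seq is a hypothetical name, and it yields plain java.io.File objects rather than whatever iterate-dir returns. It is shaped like clojure.core/file-seq, which is also built on tree-seq.

```clojure
(import '(java.io File))

(defn lazy-file-seq
  "Hypothetical helper (not part of fs): returns a lazy, pre-order seq of
  root and every File beneath it. Directories are only listed when the
  consumer actually reaches them."
  [^File root]
  (tree-seq
    (fn [^File f] (.isDirectory f))        ; branch?: only directories have children
    (fn [^File d] (seq (.listFiles d)))    ; children: nil if the listing fails or is empty
    root))

;; Usage sketch: the call returns immediately; traversal happens as the
;; seq is consumed. The path below is a placeholder.
(comment
  (take 5 (lazy-file-seq (File. "/some/large/dir")))
  (reduce (fn [n _] (inc n)) 0 (lazy-file-seq (File. "/some/large/dir"))))
```

Passing the sequence straight to reduce or doseq, rather than binding it to a var first, lets already-visited elements be garbage collected. Whether that avoids the error reported above has not been verified here.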

jvoegele commented 7 years ago

@Raynes Any thoughts on this?