brandonbloom / fipp

Fast Idiomatic Pretty Printer for Clojure
525 stars 44 forks source link

Fix some seq retention issues #32

Closed cichli closed 4 years ago

cichli commented 8 years ago

I first discovered that

(binding [*print-length* 10]
  (fipp.edn/pprint (range)))

would never return, because we were comparing print-length to (count xs) - which is no good on an infinite sequence. Switched to using (seq? (drop print-length xs)), and added extra tests to ensure a) this works for infinite sequences and b) a potential off-by-one error has not been made.

Further poking around showed that the following would cause heap exhaustion:

(binding [*out* (proxy [java.io.Writer] []
                  (write [_])
                  (flush [])
                  (close []))]
  (fipp.edn/pprint (range 100000000)))

There were two places causing memory to leak:

brandonbloom commented 8 years ago

Solid detective work. A few questions/comments coming inline...

brandonbloom commented 8 years ago

Maybe I'm missing something, but I'm seeing heap exhaustion for that even with both of these commits applied. Are you sure that applying these make that (range 100000000) test case work?

Thinking about this a bit more: Is there even a use case for pretty printing extremely large unretained lazy seqs? It seems like most pretty-printing needs would proceed from some root value that will retain the entire tree of values being printed in the calling code anyway. Do you have a genuine need for these changes?

cichli commented 8 years ago

I'm definitely not seeing heap exhaustion for that example, even if I change the object being printed to something that you'd expect to root the sequence like {:a {:b {:c (range 100000000)}}} or (atom (range 100000000)). The same applies if I lift the object being printed into a local binding with let - isn't this just normal Clojure locals clearing ensuring that nothing retains onto the object if it doesn't need to?

The way I'm testing this is attaching to the process with YourKit and watching the heap usage graph. Forcing a manual GC will always take the heap usage back to the same baseline.

I asked a colleague (@bronsa) to check and he's seeing the same results as me. We're both using Java 8 FWIW - are you using Java 7? There were some escape analysis improvements in HotSpot 8 that may be responsible for this.

Thinking about this a bit more: Is there even a use case for pretty printing extremely large unretained lazy seqs?

I'd say that being able to print potentially larger-than-memory lazy data structures is definitely useful, for example:

(require '[clojure.java.io :as io])
(require '[fipp.edn :as edn])

(binding [*out* (io/writer (io/file "/tmp/foo.edn"))]
  (edn/pprint (... some DB query that returns a larger-than-memory lazy seq ...)))

Granted, in this case it would be possible to (run! edn/pprint xs), but that only solves the case where the root thing I'm printing is an unrealised lazy sequence - but not if it's e.g. a map that contains an unrealised lazy sequence as one of its values (which by my testing above, will not be retained in memory).

brandonbloom commented 8 years ago

Thanks for the additional details. Given the testing complexity and unproven impact here, I probably won't really dig in to this for a while, just in the interest of my own time availability. If memory exhaustion becomes a regular problem in practice, please let me know and I'll find some time to tool around with YourKit.

brandonbloom commented 4 years ago

Sorry I forgot about this PR. The main fix for the (count infinite-seq) issue got incorporated by f0c6ab05 a while ago.

I couldn't reproduce the heap exhaustion issues with up-to-date Clojure and JVM. Please shout if this is still a problem for you in a production context.