clj-commons / byte-streams

A Rosetta stone for JVM byte representations

Large file to byte arrays cause insufficient memory (error=12) #70

Closed zenfey closed 8 months ago

zenfey commented 8 months ago

To reproduce this error: run fallocate -l 2048M /tmp/tmp.bin, then in the REPL evaluate (def v (bs/to-byte-arrays (clojure.java.io/file "/tmp/tmp.bin") {:chunk-size 64})). The insufficient memory error is reported after running any code that consumes v, for example (count v).

KingMob commented 8 months ago

It's because to-byte-arrays returns a seq, and count will try to realize the entire seq in memory. Seqs can be lazy, so you can't universally know their length without walking them, and since you're holding on to the head with v, none of it can be garbage-collected.
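
To illustrate the point about head retention, here is a minimal sketch that walks a lazy seq of chunks with reduce instead of binding it to a var. It does not depend on byte-streams: lazy-chunks and total-bytes are hypothetical helpers written for this example, with lazy-chunks standing in for bs/to-byte-arrays.

```clojure
(require '[clojure.java.io :as io])

(defn lazy-chunks
  "Hypothetical helper: lazily reads `in` as a seq of byte arrays of at most
  `chunk-size` bytes each. Stands in for bs/to-byte-arrays in this sketch."
  [^java.io.InputStream in chunk-size]
  (lazy-seq
    (let [buf (byte-array chunk-size)
          n   (.read in buf)]
      ;; .read returns -1 at end of stream, which ends the seq
      (when (pos? n)
        (cons (java.util.Arrays/copyOf ^bytes buf (int n))
              (lazy-chunks in chunk-size))))))

(defn total-bytes
  "Sums chunk lengths with reduce. No var or local keeps the head of the seq,
  so chunks that were already visited can be garbage-collected."
  [f chunk-size]
  (with-open [in (io/input-stream f)]
    (reduce (fn [acc ^bytes chunk] (+ acc (alength chunk)))
            0
            (lazy-chunks in chunk-size))))
```

Calling (total-bytes "/tmp/tmp.bin" 64) should walk the whole file while keeping only the current chunk reachable. By contrast, (def v (lazy-chunks in 64)) followed by (count v) keeps every realized chunk reachable through v.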

What are you trying to do?

KingMob commented 8 months ago

If you want a file as a Manifold stream, maybe something like this will help:


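(require '[babashka.fs :as fs]              ; assumption: `fs` refers to babashka.fs
         '[clj-commons.byte-streams :as bs] ; older releases use the `byte-streams` namespace
         '[manifold.stream :as s])
(import '(java.io File)
        '(java.nio ByteBuffer)
        '(java.nio.channels FileChannel)
        '(java.nio.file StandardOpenOption))

(def chunk-size "Measured in bytes" 4096)

;; Not defined in the original snippet; the value here is an assumption.
(def default-buf-size "Max number of buffered chunks" 16)

(defn file-channel
  "Opens f as a read-only FileChannel"
  ^FileChannel
  [f]
  (-> f
      (fs/path)
      (FileChannel/open (into-array StandardOpenOption [StandardOpenOption/READ]))))

(defn file-stream
  "Given a file, returns a buffered Manifold stream of ByteBuffers of the contents"
  [^File f]
  (let [fc (file-channel f)
        f-stream (bs/convert fc (bs/stream-of ByteBuffer) {:chunk-size chunk-size})]
    #_(s/on-drained f-stream #(log/debug :msg "file-stream drained"
                                         :filename (.getName f)))
    (->> f-stream
         (s/buffer default-buf-size)
         s/source-only)))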