clj-commons / byte-streams

A Rosetta stone for JVM byte representations
417 stars 33 forks source link

`closeable-seq` may end prematurely after GC? #37

Closed jerrypnz closed 2 years ago

jerrypnz commented 5 years ago

We have some code that reads an InputStream into chunks of fixed size ByteBuffers using (convert s (seq-of ByteBuffer)). When testing the code, it's found that the returned seq doesn't always have all the data when the input stream is relatively big.

We were able to reproduce it using the following code snippet:

user> (count (convert (io/input-stream (io/file "bigfile")) (seq-of ByteBuffer)))
;; => 26955
user> (count (convert (io/input-stream (io/file "bigfile")) (seq-of ByteBuffer)))
;; => 79920
user> (count (convert (io/input-stream (io/file "bigfile")) (seq-of ByteBuffer)))
;; => 86093
user> (count (convert (io/input-stream (io/file "bigfile")) (seq-of ByteBuffer)))
;; => 27063

As the above snippet shows, the count of the seq returned by convert keeps changing. We suspect it might be cause by the finalize in closeable-seq. We think what's happening is that the head of the seq got GCed when it's halfway through consuming it, which triggered a finalize call that caused the source stream to be closed prematurely.

I'm wondering if closeable-seq should stop overriding finalize and leave it to the user to close the source, like line-seq does.

ztellman commented 5 years ago

That's a serious oversight on my part, thanks for catching this. I think the only two potential solutions are to create a reference counting scheme such that it's only closed once all instances have been finalized, or to just not do any of the finalization logic. The latter is kind of a breaking change, but I'm guessing the "close on exhaustion" logic is what most (all?) people are relying on.

jerrypnz commented 5 years ago

Not sure how other people are using it but I'd always have something like with-open at a higher level that closes the source. Maybe introduce a :close? option to convert which decides if it automatically closes the source?

KingMob commented 2 years ago

Can confirm closeable-seq's finalize is definitely the problem, because disabling it fixes the error.

While I like the idea of leaving it up to users to close resources, that still leaves conversions like [File (seq-of ByteBuffer)], which will fail to close their own resources.