JuliaIO / TranscodingStreams.jl

Simple, consistent interfaces for any codec.
https://juliaio.github.io/TranscodingStreams.jl/

Should `write` with a `TOKEN_END` call `finalize` on the stream to prevent memory leaks? #117

Open KristofferC opened 2 years ago

KristofferC commented 2 years ago

The goal here is to close the TranscodingStream without closing the underlying buffer. The documentation at https://juliaio.github.io/TranscodingStreams.jl/latest/examples/#Explicitly-finish-transcoding-by-writing-TOKEN_END-1 says to do this by writing a TOKEN_END token to the stream. However, that only flushes the stream; it doesn't finalize it, which leads to memory leaks in code like:


using CodecZlib
using TranscodingStreams

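function leak()
    buf = IOBuffer()
    data = rand(10^6)
    while true
        # A new stream is created on every iteration
        zWriter = ZlibCompressorStream(buf)
        write(zWriter, data)
        # End the compressed stream without closing buf
        write(zWriter, TranscodingStreams.TOKEN_END)
        flush(zWriter)
        # zWriter is never closed or finalized here
    end
end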

leak()

which leaks memory indefinitely. Manually calling finalize on the zWriter fixes the issue, but the documentation does not make clear that this is required. There are a few possible solutions:

Alternatively, the code that shows the leak above may itself be "faulty", but normal Julia code generally shouldn't leak like this, so at least a finalizer might be a good idea.

nhz2 commented 6 months ago

The stream is expected to still be writable after writing TOKEN_END. For example, https://github.com/BioJulia/FASTX.jl/blob/v2.1.4/src/fastq/writer.jl#L53 uses TOKEN_END in a flush function.

With #178 you can do:

using CodecZlib
using TranscodingStreams

function no_leak()
    buf = IOBuffer()
    data = rand(10^6)
    while true
        zWriter = ZlibCompressorStream(seekstart(buf); stop_on_end=true)
        write(zWriter, data)
        close(zWriter)
    end
end

no_leak()

Adding a finalizer is still a good idea.