BioJulia / Libz.jl

Fast, flexible zlib bindings.

Buffered data in deflated output stream #7

Closed juliangehring closed 8 years ago

juliangehring commented 8 years ago

Perhaps I am misunderstanding how to use Libz correctly, but there may be an issue with writing all buffered data to an output stream. When I follow the example from the readme

r = rand(UInt8, 1000)

using Libz
stream1 = open("data1.txt.gz", "w") |> ZlibDeflateOutputStream
for c in r
    write(stream1, c)
end
close(stream1)

this results in an empty file on disk. The compressed data is only written when the Julia session exits. With larger inputs that exceed the buffer size, everything except the last chunk is written to disk; that last chunk is again only written once the session is closed. I would have expected any data in the buffer to be written once close is called explicitly. This seems to be specific to Libz, as GZip writes out the compressed data directly:

using GZip
stream2 = gzopen("data2.txt.gz", "w")
for c in r
    write(stream2, c)
end
close(stream2)

I can reproduce this with Julia 0.4.0, Libz master/0.0.2, and zlib versions 1.2.3 (CentOS 6.5) and 1.2.8 (Ubuntu 14.04). If I can help with more details, please let me know.

dcjones commented 8 years ago

This is not unexpected, but may be worth revisiting or at least clarifying. The issue is that close(stream1) closes the zlib stream, but not the file stream opened by open("data1.txt.gz", "w").

You can do this manually:

file_stream = open("data1.txt.gz", "w")
zlib_stream = ZlibDeflateOutputStream(file_stream)
...
close(zlib_stream) # flushes zlib_stream's buffer to file_stream
close(file_stream) # flushes file_stream's buffer to disk

Often you want to close both, but not always (some file formats have multiple zlib blocks or data written before and after the zlib data), so I want to retain the ability to close only the zlib stream.
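As a rough sketch of the "data written before and after the zlib data" case, something like the following should work, assuming close on the zlib stream behaves as described above (it flushes into file_stream and leaves it open). The file name "container.bin" and the "HEAD"/"TAIL" markers are made up for illustration and do not correspond to any real file format.

```julia
using Libz

# Hypothetical container layout: plain header, one zlib block, plain trailer.
payload = rand(UInt8, 1000)

file_stream = open("container.bin", "w")
write(file_stream, "HEAD")                 # uncompressed bytes before the zlib data

zlib_stream = ZlibDeflateOutputStream(file_stream)
write(zlib_stream, payload)
close(zlib_stream)                         # assumed: flushes compressed bytes into file_stream only

write(file_stream, "TAIL")                 # uncompressed bytes after the zlib data
close(file_stream)                         # flushes file_stream's buffer to disk
```

If close(zlib_stream) also closed the file, the write of the trailer would fail, which is why the two closes are kept separate here.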

Tetralux commented 8 years ago

@julian-gehring +1 I agree that calling close on the enclosing stream should write out all the data buffered in the zlib stream and close the zlib stream.

More generally though, if you call close on the enclosing stream, I would expect that to close the stream it contains.

juliangehring commented 8 years ago

Thanks @dcjones for the detailed explanation, and @samoconnor for the updated example in the readme (#12).