Closed: quinnj closed this issue 8 years ago.
Hmm, interesting use case. I think something like this should do the trick:
```julia
using Libz, BufferedStreams

input = ZlibInflateInputStream(open(filename))
output_buffer = BufferedOutputStream()
output_stream = ZlibDeflateOutputStream(output_buffer)

block_size = 100_000_000  # ~100 MB uncompressed per block
bytes_read = 0

for line in eachline(input)
    write(output_stream, line)
    bytes_read += sizeof(line)  # count bytes, not characters
    if bytes_read > block_size
        # finish the current compressed block
        close(output_stream)
        block = takebuf_array(output_buffer)
        # TODO: do something with block
        # open a new stream for the next block
        output_stream = ZlibDeflateOutputStream(output_buffer)
        bytes_read = 0
    end
end

# flush remaining data
close(output_stream)
block = takebuf_array(output_buffer)
# TODO: do something with block
```
There's currently no built-in way to track the number of bytes written to an output stream, hence the manual bookkeeping with `bytes_read`. I think that could be solved by implementing `position()` on `BufferedOutputStream`.
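As a rough illustration of what a byte-count-based `position()` could look like, here is a minimal sketch of a counting wrapper around any `IO`. The `CountingStream` name and design are hypothetical, not part of BufferedStreams, and the sketch targets modern Julia (where `take!` replaces `takebuf_array`):

```julia
# Hypothetical sketch: wrap any IO and count the bytes written through it,
# so position(stream) reports the number of (uncompressed) bytes written.
mutable struct CountingStream{T<:IO} <: IO
    io::T
    nwritten::Int
end
CountingStream(io::IO) = CountingStream(io, 0)

# Single-byte writes (the fallback method for IO).
function Base.write(s::CountingStream, b::UInt8)
    n = write(s.io, b)
    s.nwritten += n
    return n
end

# Bulk writes (strings and arrays route through unsafe_write).
function Base.unsafe_write(s::CountingStream, p::Ptr{UInt8}, n::UInt)
    m = unsafe_write(s.io, p, n)
    s.nwritten += m
    return m
end

Base.position(s::CountingStream) = s.nwritten
Base.close(s::CountingStream) = close(s.io)

# Usage: count uncompressed bytes going into an in-memory buffer.
buf = IOBuffer()
cs = CountingStream(buf)
write(cs, "hello, world\n")
position(cs)  # 13
```

Wrapping the deflate stream in something like this would let the block loop test `position(output_stream)` instead of carrying `bytes_read` by hand.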
Thanks BTW; this package is working great for me.
Hey @dcjones, quick question on how to do what I want to do. I think I want to do something like: call `readline(io)` (which inflates a single line from the CSV file), open a new gzip file for writing, and write to it line-by-line until I have ~90MB uncompressed, which I could detect using `position(buf)`, with `buf` being my new gzip file. I'd then do `readall(tmp_file)` and put that in the body of my web request.

Obviously my process above has some inefficiencies, particularly because nothing is in-memory or buffered. I don't think I can get around having to inflate and then deflate, though, since I need to make sure to send the file in line-based chunks.

Any tips?