JuliaIO / GZip.jl

A Julia interface for gzip functions in zlib
https://juliaio.github.io/GZip.jl/dev
MIT License

GZip.IOBuffer type? #23

Open quinnj opened 9 years ago

quinnj commented 9 years ago

So I'm not super familiar with how the package is currently structured and am meaning to dive in more, but I wonder if someone could help clarify if this is a good idea or not.

I'm thinking of creating a GZip.IOBuffer type that would wrap an IOBuffer: you write to it, and whatever you write gets gzipped into the buffer. You could then call takebuf_string to get the raw gzip data and send it in an HTTP request, for example.

Is this compatible with how gzip compression works, i.e. can data be gzipped a chunk at a time like this? I think the approach should be pretty simple to implement, but if there's anything I should watch out for or avoid, I'd love to hear it.
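For reference, gzip (through zlib's deflate) is a streaming format, so feeding it data a chunk at a time is a normal way to use it. Below is a minimal sketch of the idea. It assumes the CodecZlib.jl package, which is not one of the packages discussed in this thread, and a current Julia version where `take!` replaces `takebuf_string`. `gzip_chunks` is a hypothetical helper name.

```julia
using CodecZlib  # assumption: CodecZlib.jl is installed; it is not mentioned in this thread

# Hypothetical helper: compress several chunks into one in-memory gzip stream.
function gzip_chunks(chunks)
    buf = IOBuffer()
    stream = GzipCompressorStream(buf)   # everything written here is gzipped into buf
    for chunk in chunks
        write(stream, chunk)
    end
    # Finish the gzip stream (writes the trailer) without closing buf first.
    write(stream, CodecZlib.TranscodingStreams.TOKEN_END)
    flush(stream)
    data = take!(buf)                    # raw gzip bytes, e.g. for an HTTP request body
    close(stream)
    return data
end

gz = gzip_chunks(["first chunk, ", "second chunk"])
roundtrip = String(transcode(GzipDecompressor, gz))
```

The one thing to watch is that the stream has to be finished before the bytes are taken; otherwise the gzip trailer is missing and the output is truncated.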

kmsquire commented 9 years ago

Hi Jacob, you might consider using Zlib.jl instead, as it already has a stream API, and can compress (and I believe uncompress) gzip-formatted files/streams. It might require a little bit of hacking to clean it up and add functionality--e.g., the stream Reader has a parameter to set the decompress buffer size, but the Writer doesn't seem to (or at least it's not documented).

quinnj commented 9 years ago

Thanks @kmsquire, that does seem pretty close to what I'm looking for. Do you know the reason/history for both packages? Should we try to merge them at some point while improving the APIs?

garborg commented 9 years ago

cc. @dcjones

kmsquire commented 9 years ago

I believe that Daniel originally wrote Zlib.jl as something which could quickly and easily compress or decompress a buffer. It gained additional functionality over time.

I wrote GZip.jl because I needed access to gzipped files, and either Zlib.jl didn't exist yet or I felt it didn't quite fit my needs (I think the original implementation had a really small, fixed-size buffer and was pretty slow). I also thought that simply calling the zlib C functions would be faster and more efficient.

I actually do think that the packages could be combined.

At one point, I had hopes that the magic number of gzip (and other compressed files) would automatically be recognized, and the files would be uncompressed automatically (with a raw mode available for reading the compressed data), but my need for something like that diminished, and I've never gotten back to it. Might be good to make this part of https://github.com/JuliaIO.
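The magic-number check described above can be sketched with Base Julia alone. gzip data begins with the two bytes 0x1f 0x8b (RFC 1952). `looks_like_gzip` is a hypothetical helper name, and the sketch only detects the format; it does not decompress anything.

```julia
# gzip streams begin with the two bytes 0x1f 0x8b (RFC 1952).
const GZIP_MAGIC = UInt8[0x1f, 0x8b]

# Hypothetical helper: peek at the first two bytes of `io`, then restore the
# position so the caller can still read the stream from where it was.
function looks_like_gzip(io::IO)
    mark(io)
    try
        return read(io, 2) == GZIP_MAGIC
    finally
        reset(io)
    end
end

# Convenience method for a file path.
looks_like_gzip(path::AbstractString) = open(looks_like_gzip, path)

is_gz = looks_like_gzip(IOBuffer(UInt8[0x1f, 0x8b, 0x08]))  # gzip header bytes
is_txt = looks_like_gzip(IOBuffer("plain text"))            # ordinary text
```

A reader that wanted transparent decompression could call a check like this first and pick a compressed or raw code path accordingly.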

Cc: @simondanisch

quinnj commented 9 years ago

I wasn't aware of JuliaIO; when did that start?

It might be good to integrate some of the Zlib.jl stuff into this one, since this is already owned by JuliaLang. Basic compression seems like a reasonable candidate for being closely related to Base or "blessed" by JuliaLang.

kmsquire commented 9 years ago

JuliaIO is only a couple of weeks old. ;-) @simondanisch started it. I don't know how well defined the goals are yet, but I think it's generally a good idea to get most IO in the same area. But it's only useful if there are enough interested parties involved. Interested?

Regarding integrating Zlib.jl into this: that would be fine, but I think the naming would probably be backwards, since the Zlib functionality arguably is a superset of GZip's functionality. Maybe @dcjones would be interested in transferring Zlib.jl to JuliaLang?

dcjones commented 9 years ago

> Maybe @dcjones would be interested in transferring Zlib.jl to JuliaLang?

Sure, and I'm all for merging the two.

SimonDanisch commented 9 years ago

I created JuliaIO with one simple goal: being able to get the correct Julia Object from an arbitrary path.

read("some/path/image.jpg") #-> Image
read("some/path/archive.rar") #->  compressed stream!?

So it's not really necessary to move every package to JuliaIO. What matters is that they all implement the same interface. Then JuliaIO could have a meta package, Compression.jl, which organizes the different lower-level compression libraries and basically passes them to FileIO. But having all IO packages in one group, with motivated people who have push access, will definitely help achieve this ;)

dcjones commented 9 years ago

For those interested I wrote another set of zlib bindings that's generally faster than both Zlib.jl and GZip.jl (and @jiahao's code in #32) and mostly encompasses the features of both: Libz.jl.

The buffering used to make it fast is split into another package BufferedStreams.jl, so it can be reused if we want fast bindings to other compression libraries, etc. Actually, it even seems to make writing/reading plain data to/from IOStreams faster in some cases.

I hope we can standardize on some version of this, but let me know if there are critical missing features, or other stuff I'm missing.

kmsquire commented 9 years ago

@dcjones, great! I'll try to take a look this weekend.