appc / spec

App Container Specification and Tooling (archived, see https://github.com/rkt/rkt/issues/4024)
Apache License 2.0

wanted: xz and bzip2 compressors in Go #14

Open philips opened 9 years ago

philips commented 9 years ago

From @philips on December 2, 2014 0:0

Currently ACIs can be compressed with xz or bzip2, but `actool build` can only do gzip compression because Go libraries don't exist for xz or bzip2.

This feature would require implementing these specs in pure Go. Shelling out or linking to a C library won't fix this bug.

Copied from original issue: coreos/rocket#143

philips commented 9 years ago

From @mrshu on December 2, 2014 7:4

What about http://golang.org/pkg/compress/bzip2/ ? I might be horribly wrong but it seems to be a pure go implementation.

philips commented 9 years ago

From @jonboulle on December 2, 2014 7:7

@mrshu unfortunately that only supports decompression, not compression.

philips commented 9 years ago

From @hausdorff on December 2, 2014 9:6

The primary reason bzip2 compression was never implemented in Go is that no RFC exists for it, which means reimplementing it mostly comes down to reading the reference implementation, guessing, and checking the output on a variety of inputs. This is necessarily easier for decompression, because it's obvious whether your output is correct, whereas for compression it is more subtle.

I might be able to give this a shot though. What is the timeframe you're looking at for a patch?

philips commented 9 years ago

From @kelseyhightower on December 2, 2014 10:28

@hausdorff No hard time limit. If you would like to give this a shot let us know.

philips commented 9 years ago

From @rektide on December 2, 2014 20:55

I'd take lz4 over all of the above: way lower CPU usage for excellent compression. The Cassandra datastore, for example, recently made the switch from the excellent Snappy to LZ4.

philips commented 9 years ago

@rektide If there are Go libraries (so our build system remains simple) I don't have a strong opinion.

philips commented 9 years ago

From @rektide on December 2, 2014 21:9

@philips the link I provided is to the Go implementation. :)

philips commented 9 years ago

From @hausdorff on December 3, 2014 0:17

@philips Sorry, I'm confused. You're not looking specifically for a go implementation of bzip2 and xz?

philips commented 9 years ago

From @jonboulle on December 3, 2014 0:19

@hausdorff we are looking for pure Go implementations; what philips meant was that we are open to considering alternative compression formats (like LZ4) instead of just gz/bz2/xz

philips commented 9 years ago

Can someone run some benchmarks around lz4? Random guy on the internet doesn't have a very favorable report on lz4 performance:

http://pokecraft.first-world.info/wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

philips commented 9 years ago

From @mboersma on December 3, 2014 0:47

bzip2 compression in the go standard library is still a TODO: https://code.google.com/p/go/issues/detail?id=4828

philips commented 9 years ago

From @c4milo on December 5, 2014 23:37

I started working on a bzip2 compressor for Go a while back, wrote a hackpad about it but didn't have time to finish implementing. Here is the hackpad with resources and an algorithm overview for whoever wants to carry on: https://hackpad.com/BZIP2-compressor-for-Go-HAunKJL4GnH

philips commented 9 years ago

From @bmatsuo on December 6, 2014 2:26

@philips it should be noted that the lz4 package linked to previously (github.com/bkaradzic/go-lz4) does not support framed/streaming compression (i.e. there are no Writer/Reader types for working with large streams). In order to (de)compress a block all data must be in memory (in a []byte).

AFAICS the needs here call for an implementation of the framed compression format, of which I never found a pure Go implementation. I started working on an implementation a few months back. I never finished it because I got confused and felt I missed some details from the spec. If someone knows more about lz4 and wanted to help I'd be willing to post my unfinished code somewhere for some collaboration. Ping me.

All that said I do like Snappy. It's a simple spec and it's quite fast. It doesn't seem to be as good as lz4. But I have used github.com/mreiferson/go-snappystream and my own fork github.com/bmatsuo/snappyframed for streaming compression with good success.

philips commented 9 years ago

From @hausdorff on December 6, 2014 2:28

Peripheral discussion of other compression algorithms aside, I'm going to implement bzip2.

If anyone thinks this is not a good use of time, you better say something now. :)

philips commented 9 years ago

From @jonboulle on December 7, 2014 2:49

@hausdorff Seems to me it would be a good use of time for the Go community at large - ideally you could get it upstream, too.

philips commented 9 years ago

From @bmatsuo on December 8, 2014 3:38

@jonboulle @hausdorff +1

I would encourage anyone willing to work on it to do so, regardless of rocket. The goal should be eventually landing it in the stdlib.

But it is slow. It will always be slow. Compression is about tradeoffs and bzip2 is pretty far on one end of the spectrum (optimizing size, not speed). Xz appears to hang out over there as well, though I know very little about it.

Strictly within AWS I found that uploading/downloading large bzip2 files to/from S3 on EC2 was not computationally worth it for one application I work on (which does quite a lot of uploading and downloading). (De)Compression is a huge portion of the whole process that can easily max out CPUs and slow things down more.

hausdorff commented 9 years ago

Sorry for the delays folks, Christmas season is a bit busy for free time projects. :)

I've just brought up the issue on the golang list, so I expect review of the patch to start in a few weeks.

jonboulle commented 9 years ago

sounds good!

jonboulle commented 9 years ago

455