airlift / aircompressor

A port of Snappy, LZO, LZ4, and Zstandard to Java
Apache License 2.0

LZO not compatible with Hadoop LZO #50

Closed by dain 8 years ago

dain commented 8 years ago

Comment from @nezihyigitbasi

Stepping through the Hadoop LZO implementation (which implements LZO1X), I noticed something different from aircompressor's LZO (I don't really know whether aircompressor implements the same algorithm). Hadoop LZO first reads two integers from the input stream (4 bytes for the original block size + 4 bytes for the compressed chunk length), then interprets the rest of the stream as compressed data, and it succeeds. I did the same and consumed 8 bytes before passing the data to aircompressor's LZO decompressor; after that, most of the test cases passed (there were still failures).
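
A minimal sketch of that 8-byte skip against aircompressor's raw LzoDecompressor, assuming a single compressed chunk per block (Hadoop can write several length-prefixed chunks per block, which could explain the remaining failures); the class and method names are illustrative:

    import io.airlift.compress.lzo.LzoDecompressor;

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;

    public class HadoopLzoFramingSketch
    {
        // Strip the two 4-byte big-endian length prefixes that the Hadoop
        // block codec writes, then hand the raw LZO1X payload to aircompressor.
        // Assumes the block contains exactly one compressed chunk.
        public static byte[] decompressSingleChunk(byte[] framed)
                throws IOException
        {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
            int uncompressedLength = in.readInt(); // 4 bytes: original block size
            int compressedLength = in.readInt();   // 4 bytes: compressed chunk length

            byte[] compressed = new byte[compressedLength];
            in.readFully(compressed);

            byte[] output = new byte[uncompressedLength];
            new LzoDecompressor().decompress(compressed, 0, compressedLength, output, 0, uncompressedLength);
            return output;
        }
    }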

Anyway, here is the minimal code that shows how the airlift LZO decompressor fails while Hadoop's LZO decompressor succeeds with the same input.

dain commented 8 years ago

The example code is incorrect. It uses the Hadoop wrapper around the LZO block compressor, and that wrapper adds additional framing to the compressed blocks. In aircompressor, the equivalent class is HadoopLzoInputStream, and using this code instead works:

        // HadoopLzoInputStream strips the Hadoop block framing before decompressing
        HadoopLzoInputStream hadoopLzoInputStream = new HadoopLzoInputStream(new ByteArrayInputStream(compressed), 1000);
        byte[] streamOutput = toByteArray(hadoopLzoInputStream); // e.g. a static import of Guava's ByteStreams.toByteArray
        System.out.println(Arrays.toString(streamOutput));
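
For a self-contained check, here is a sketch that builds the framing by hand for a single aircompressor-compressed chunk (big-endian original size, then compressed chunk length, as described above) and reads it back through HadoopLzoInputStream; the class name and sample data are illustrative, and real Hadoop output may contain multiple chunks per block:

    import io.airlift.compress.lzo.HadoopLzoInputStream;
    import io.airlift.compress.lzo.LzoCompressor;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class HadoopLzoRoundTrip
    {
        public static void main(String[] args)
                throws IOException
        {
            byte[] data = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);

            // Compress a raw LZO1X block with aircompressor
            LzoCompressor compressor = new LzoCompressor();
            byte[] buffer = new byte[compressor.maxCompressedLength(data.length)];
            int compressedLength = compressor.compress(data, 0, data.length, buffer, 0, buffer.length);

            // Add the Hadoop block framing by hand: original size, then compressed chunk length
            ByteArrayOutputStream framedBytes = new ByteArrayOutputStream();
            DataOutputStream framed = new DataOutputStream(framedBytes);
            framed.writeInt(data.length);
            framed.writeInt(compressedLength);
            framed.write(buffer, 0, compressedLength);

            // HadoopLzoInputStream consumes the framing and yields the original bytes
            HadoopLzoInputStream in = new HadoopLzoInputStream(new ByteArrayInputStream(framedBytes.toByteArray()), 1000);
            byte[] roundTripped = in.readAllBytes();
            System.out.println(Arrays.equals(data, roundTripped)); // expect true
        }
    }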