airlift / aircompressor

A port of Snappy, LZO, LZ4, and Zstandard to Java
Apache License 2.0

LZO not compatible with Hadoop LZO #50

Closed by dain 8 years ago

dain commented 8 years ago

Comment from @nezihyigitbasi

Stepping through the Hadoop LZO implementation (which implements LZO1X), I noticed something different from aircompressor's LZO (I don't really know whether aircompressor implements the same algorithm). Hadoop LZO first reads two integers from the input stream (4 bytes for the original block size + 4 bytes for the compressed chunk length), then interprets the rest of the stream as compressed data, and it succeeds. I did the same and consumed 8 bytes before passing the data to aircompressor's LZO decompressor; after that, most of the test cases passed (there were still failures).
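
A minimal sketch of that 8-byte skip against aircompressor's raw LzoDecompressor, assuming a single compressed chunk per block (Hadoop can write several length-prefixed chunks per block, which could explain the remaining failures); the class and method names are illustrative:

    import io.airlift.compress.lzo.LzoDecompressor;

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;

    public class HadoopLzoFramingSketch
    {
        // Strip the two 4-byte big-endian length prefixes that the Hadoop
        // block codec writes, then hand the raw LZO1X payload to aircompressor.
        // Assumes the block contains exactly one compressed chunk.
        public static byte[] decompressSingleChunk(byte[] framed)
                throws IOException
        {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
            int uncompressedLength = in.readInt(); // 4 bytes: original block size
            int compressedLength = in.readInt();   // 4 bytes: compressed chunk length

            byte[] compressed = new byte[compressedLength];
            in.readFully(compressed);

            byte[] output = new byte[uncompressedLength];
            new LzoDecompressor().decompress(compressed, 0, compressedLength, output, 0, uncompressedLength);
            return output;
        }
    }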

Anyway, here is the minimal code that shows how the airlift LZO decompressor fails while Hadoop's LZO decompressor succeeds with the same input.

dain commented 8 years ago

The example code is incorrect. It uses the Hadoop wrapper around the LZO block compressor, and that wrapper adds additional framing to the compressed blocks. In aircompressor, the equivalent class is HadoopLzoInputStream, and using this code instead works:

        // HadoopLzoInputStream strips the Hadoop block framing before decompressing
        HadoopLzoInputStream hadoopLzoInputStream = new HadoopLzoInputStream(new ByteArrayInputStream(compressed), 1000);
        byte[] streamOutput = toByteArray(hadoopLzoInputStream); // e.g. a static import of Guava's ByteStreams.toByteArray
        System.out.println(Arrays.toString(streamOutput));
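
For a self-contained check, here is a sketch that builds the framing by hand for a single aircompressor-compressed chunk (big-endian original size, then compressed chunk length, as described above) and reads it back through HadoopLzoInputStream; the class name and sample data are illustrative, and real Hadoop output may contain multiple chunks per block:

    import io.airlift.compress.lzo.HadoopLzoInputStream;
    import io.airlift.compress.lzo.LzoCompressor;

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class HadoopLzoRoundTrip
    {
        public static void main(String[] args)
                throws IOException
        {
            byte[] data = "hello hello hello hello".getBytes(StandardCharsets.UTF_8);

            // Compress a raw LZO1X block with aircompressor
            LzoCompressor compressor = new LzoCompressor();
            byte[] buffer = new byte[compressor.maxCompressedLength(data.length)];
            int compressedLength = compressor.compress(data, 0, data.length, buffer, 0, buffer.length);

            // Add the Hadoop block framing by hand: original size, then compressed chunk length
            ByteArrayOutputStream framedBytes = new ByteArrayOutputStream();
            DataOutputStream framed = new DataOutputStream(framedBytes);
            framed.writeInt(data.length);
            framed.writeInt(compressedLength);
            framed.write(buffer, 0, compressedLength);

            // HadoopLzoInputStream consumes the framing and yields the original bytes
            HadoopLzoInputStream in = new HadoopLzoInputStream(new ByteArrayInputStream(framedBytes.toByteArray()), 1000);
            byte[] roundTripped = in.readAllBytes();
            System.out.println(Arrays.equals(data, roundTripped)); // expect true
        }
    }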