airlift / aircompressor

A port of Snappy, LZO, LZ4, and Zstandard to Java
Apache License 2.0

Compression ratio differs between ZstdOutputStream and ZstdCompressor.compress(ByteBuffer) in the ZSTD algorithm #187

Closed believezzd closed 4 months ago

believezzd commented 4 months ago

Description

Aircompressor Version

<dependency>
      <groupId>io.airlift</groupId>
      <artifactId>aircompressor</artifactId>
      <version>0.26</version>
</dependency>

Code

ZstdCompressor.compress(ByteBuffer)

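import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

import io.airlift.compress.zstd.ZstdCompressor;

public static long compressFile(String inFileName, String outFileName) throws IOException {
    File inFile = new File(inFileName);
    File outFile = new File(outFileName);

    // Total number of uncompressed bytes read from the input file.
    long numBytes = 0L;

    ZstdCompressor compressor = new ZstdCompressor();
    int chunkSize = 8 * 1024 * 1024;
    ByteBuffer inBuffer = ByteBuffer.allocateDirect(chunkSize);
    // Size the output for the worst case so an incompressible chunk cannot overflow it.
    ByteBuffer outBuffer = ByteBuffer.allocateDirect(compressor.maxCompressedLength(chunkSize));
    try (RandomAccessFile inRaFile = new RandomAccessFile(inFile, "r");
        RandomAccessFile outRaFile = new RandomAccessFile(outFile, "rw");
        FileChannel inChannel = inRaFile.getChannel();
        FileChannel outChannel = outRaFile.getChannel()) {

        inBuffer.clear();
        // Each read may return fewer than chunkSize bytes; every chunk is compressed on its own.
        while (inChannel.read(inBuffer) > 0) {
            inBuffer.flip();
            numBytes += inBuffer.remaining();
            outBuffer.clear();

            compressor.compress(inBuffer, outBuffer);

            outBuffer.flip();
            // A single write() call is not guaranteed to drain the buffer.
            while (outBuffer.hasRemaining()) {
                outChannel.write(outBuffer);
            }
            inBuffer.clear();
        }
    }

    return numBytes;
}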

ZstdOutputStream

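import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

import io.airlift.compress.zstd.ZstdOutputStream;

// Note: `log` and `IOUtils` are not defined in this snippet. They are presumably a logger
// field of the enclosing class and org.apache.commons.io.IOUtils, respectively.
public static long compressFile(String inFileName, String outFileName) throws IOException {
    File inFile = new File(inFileName);
    File outFile = new File(outFileName);

    // Total number of uncompressed bytes read from the input file.
    long numBytes = 0L;
    byte[] buffer = new byte[1024 * 1024 * 8];

    FileInputStream fi = null;
    FileOutputStream fo = null;

    try {
        fi = new FileInputStream(inFile);
        fo = new FileOutputStream(outFile);

        // Closing the ZstdOutputStream flushes any pending data and closes `fo` as well.
        try (ZstdOutputStream zs = new ZstdOutputStream(fo)) {
            while (true) {
                int bytesRead = fi.read(buffer, 0, buffer.length);
                if (bytesRead == -1) {
                    break; // end of input
                }

                zs.write(buffer, 0, bytesRead);

                numBytes += bytesRead;
            }
        }
    } catch (Exception ex) {
        // The error is logged and swallowed, so the caller receives a partial count.
        log.error("Error: ", ex);
    } finally {
        IOUtils.closeQuietly(fi);
        IOUtils.closeQuietly(fo);
    }

    return numBytes;
}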

File to Compress

Computer

JDK

believezzd commented 4 months ago

@martint

Could you help me with this?

dain commented 4 months ago

The answer is that they are very different compression techniques. ZstdCompressor is a block compressor, which means it compresses a block of data in memory to an output buffer in memory in one shot. This requires the full input and output buffers to fit into memory. ZstdOutputStream is a stream compressor, which chops the input data into chunks and uses the block compressor to compress each chunk. This means only part of the data needs to fit into memory at a time, but it doesn't compress quite as well (it also adds extra data to the output describing the framing and such). BTW, what I am describing applies to basically every compression algorithm.
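
Since the point about chunking is said to hold for basically every compression algorithm, it can be sketched with nothing but the JDK. The example below is a minimal illustration, not aircompressor code: it swaps in `java.util.zip.Deflater` for Zstandard, and the class name `ChunkedVsOneShot`, its helper methods, the sample data, and the 4096-byte chunk size are all made up for the demonstration. It compresses the same bytes once in one shot and once as independently compressed chunks, then compares the total sizes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

// Hypothetical demo class; uses DEFLATE from the JDK as a stand-in for Zstandard.
class ChunkedVsOneShot {
    // Compresses data[offset, offset + length) as one self-contained stream
    // and returns the number of compressed bytes produced.
    static int compressedSize(byte[] data, int offset, int length) {
        Deflater deflater = new Deflater();
        try {
            deflater.setInput(data, offset, length);
            deflater.finish();
            byte[] scratch = new byte[8192];
            int total = 0;
            while (!deflater.finished()) {
                total += deflater.deflate(scratch);
            }
            return total;
        } finally {
            deflater.end(); // release native resources
        }
    }

    // One shot: the compressor sees the whole input at once.
    static int oneShotSize(byte[] data) {
        return compressedSize(data, 0, data.length);
    }

    // Chunked: every chunk is compressed independently, so each chunk pays for
    // its own header and trailer and cannot reference data from earlier chunks.
    static int chunkedSize(byte[] data, int chunkSize) {
        int total = 0;
        for (int offset = 0; offset < data.length; offset += chunkSize) {
            int length = Math.min(chunkSize, data.length - offset);
            total += compressedSize(data, offset, length);
        }
        return total;
    }

    // Made-up, highly repetitive sample input (roughly 400 KB of text).
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 20000; i++) {
            sb.append("record ").append(i % 100).append(" status=OK\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        System.out.println("input bytes:    " + data.length);
        System.out.println("one shot bytes: " + oneShotSize(data));
        System.out.println("chunked bytes:  " + chunkedSize(data, 4096));
    }
}
```

On repetitive input like this, the chunked total should come out larger than the one-shot total. The exact gap depends on the data and the chunk size, so when comparing two compression code paths it helps to check how each one splits the input.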