flanglet / kanzi-cpp

Fast lossless data compression in C++
Apache License 2.0
137 stars, 3 forks

An error occurred using TPAQ and TPAQX #5

Closed by luozui 2 years ago

luozui commented 2 years ago
#include <cstdio>
#include <fstream>
#include <iostream>
#include <string>
#include "types.hpp"
#include "InputStream.hpp"
#include "OutputStream.hpp"
#include "io/CompressedInputStream.hpp"
#include "io/CompressedOutputStream.hpp"

using namespace kanzi;
using namespace std;

uint64 testCompress(byte block[], uint length, string name = "FPAQ") {
    // Create an OutputStream
    OutputStream* os = new ofstream("compressed.knz", ofstream::out | ofstream::binary);

    // Create a CompressedOutputStream
    CompressedOutputStream cos(*os, name, "NONE", 4194304, false, 1);

    // Compress block
    cos.write((const char*)block, length);

    // Close CompressedOutputStream
    cos.close();

    // Get number of bytes written
    uint64 written = cos.getWritten();
    delete os;
    return written;
}

uint64 testDecompress(byte block[], uint length) {
    // Create an InputStream
    InputStream* is = new ifstream("compressed.knz", ifstream::in | ifstream::binary);

    // Create a CompressedInputStream
    CompressedInputStream cis(*is, 1);

    // Decompress block
    cis.read((char*)block, length);

    // Close CompressedInputStream
    cis.close();

    // Get number of bytes read (query it before the underlying stream is deleted)
    uint64 read = cis.getRead();
    delete is;
    return read;
}

int myhash(char* buf, int n) {
    unsigned long long res = 0, mod = 1000000007;
    for (int i = 0; i < n; ++i) {
        res = buf[i] + res * mod;
    }
    return res;
}

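void test_(string name) {
    int n = 15 * 1024 * 1024;

    // Fill the input with a repeating 0..255 pattern
    byte* in = new byte[n];
    for (int i = 0; i < n; ++i) in[i] = i & 255;
    testCompress(in, n, name);

    // Round trip: decompress into a second buffer and compare hashes
    byte* out = new byte[n];
    testDecompress(out, n);
    int h_in = myhash((char *)in, n);
    int h_out = myhash((char *)out, n);
    fprintf(stderr, "%10s: %X, %X, %s\n", name.c_str(), h_in, h_out, h_in == h_out ? "true" : "false");

    // Release both buffers so repeated calls do not leak 30 MB each
    delete[] in;
    delete[] out;
}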

void test() {
    test_("None");
    test_("Huffman");
    test_("ANS0");
    test_("ANS1");
    test_("Range");
    test_("FPAQ");
    test_("TPAQ");
    test_("TPAQX");
    test_("CM");
}

int main() {
    test();
    return 0;
}

Hi Frederic! I used similar code to call this library, but an error occurred with TPAQ and TPAQX. I then wrote this test program and got the following result:

      None: D1780000, D1780000, true
   Huffman: D1780000, D1780000, true
      ANS0: D1780000, D1780000, true
      ANS1: D1780000, D1780000, true
     Range: D1780000, D1780000, true
      FPAQ: D1780000, D1780000, true
      TPAQ: D1780000, 53447A22, false
     TPAQX: D1780000, 82DF5A3, false
        CM: D1780000, D1780000, true

I don't know whether something wrong in my parameter settings caused this.

However, when I compress and decompress my data with the compiled program (./kanzi) using the following parameters, I get the correct result:

./kanzi -v 5 -t NONE -e TPAQX -j 1 -c -i in -o out
./kanzi -v 5 -t NONE -e TPAQX -j 1 -d -i out -o out.bak
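
When a round trip like the one above fails, a byte-for-byte comparison can be more informative than comparing hashes, because it gives the offset of the first corrupted byte. Below is a minimal sketch of such a check. `firstMismatch` is a hypothetical helper written for illustration, not part of kanzi, and it assumes both buffers are plain `unsigned char` arrays of the same length.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of kanzi): returns the index of the first
// byte that differs between a and b, or n if the first n bytes are identical.
std::size_t firstMismatch(const unsigned char* a, const unsigned char* b, std::size_t n) {
    assert(a != nullptr && b != nullptr);

    for (std::size_t i = 0; i < n; ++i) {
        if (a[i] != b[i])
            return i;
    }

    return n;
}
```

In the test program above, this could be called on `in` and `out` (cast to `const unsigned char*` as needed) after decompression. A return value smaller than `n` would be the offset at which the decoded data first diverges from the input.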

flanglet commented 2 years ago

I can confirm the observed behavior. Thanks for reporting this. I will take a look at it shortly.

flanglet commented 2 years ago

Can you try again? It should be fixed.

luozui commented 2 years ago

Now it's working correctly. Awesome!