coderforlife / ms-compress

Open source implementations of Microsoft compression algorithms
205 stars 46 forks source link

xpress-huffman unable to handle larger buffers of uncompressible data #27

Closed nemequ closed 8 years ago

nemequ commented 8 years ago

With

#include <stdlib.h>
#include <stdio.h>
#include <assert.h>

#include "include/mscomp.h"
#include "include/xpress_huff.h"

#define CODEC MSCOMP_XPRESS_HUFF

int main(int argc, char *argv[]) {
  uint8_t uncompressed[25143];
  size_t compressed_size = ms_max_compressed_size (CODEC, sizeof(uncompressed));
  uint8_t* compressed = (uint8_t*) malloc (compressed_size);

  {
    FILE* r = fopen ("/dev/urandom", "rb");
    assert (r != NULL);
    fread (uncompressed, 1, sizeof(uncompressed), r);
    fclose (r);
  }

  MSCompStatus status =
    ms_compress (CODEC,
         uncompressed, sizeof(uncompressed),
         compressed, &compressed_size);

  assert (status == MSCOMP_OK);

  free (compressed);

  return 0;
}

I get

random-test: src/xpress_huff_compress.cpp:311: MSCompStatus xpress_huff_compress(const_bytes, size_t, bytes, size_t*): Assertion `comp_len <= in_len+6' failed.

I think the problem is in the compressor, not the size calculation, since it doesn't seem to matter how much room I give it (e.g., adding 65536 bytes to compressed_size has no effect).

coderforlife commented 8 years ago

This is issue #14 - apparently never fully fixed. I will look into this in a few days.

coderforlife commented 8 years ago

This is now hopefully fixed in 60ef3852dbd49cb8504c8500033d471fc3344834.

Previously I was not taking into account the fact that for the last chunk of data the end-of-stream symbol would cause another symbol to be written as 9 bits as well, which adds up to 32 additional bytes to the data (+2 for the end-of-stream 9 bits and the 2-byte alignment).

The last thing to test just to make sure it works is a stream that has each byte 0-255 exactly 256 times each and is uncompressible. This may end up requiring it to be +33 instead of +32. I just need to come up with how to design that compression-killer...

coderforlife commented 8 years ago

Calculated it for the theoretical I described above:

floor((9*256*1 + 8*256*255 + 9*1*1 + 16 + 15) / 16) * 2 - 65536 = 36

So it actually needs to be +34 (not including the +2 for 2-byte alignment).