Klozz / lz4

Automatically exported from code.google.com/p/lz4
0 stars 0 forks source link

LZ4_compress64k_stack does not handle pointer values small values correctly #77

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
We have an ARM Linux machine with a very small program that uses LZ4. We found 
that sometimes the LZ4_compress function does not correctly compress the data. 
After quite a bit of investigation, we found that the value of the data 
buffer's pointer is the cause.

The program has a data structure that sits in the low address range (less than 
64k). 

The fault was traced to the end of the do/while loop for "Find Match" 
http://code.google.com/p/lz4/source/browse/trunk/lz4_encoder.h#144 

 // Find a match
 do {
 ...
 } while ((ref < ip - MAX_DISTANCE) || (A32(ref) != A32(ip)));

In our test case, ip = 0x0000e465 and ref = 0x0000e464.
The (ip - MAX_DISTANCE) expression now results in 0xffffe466, which results in 
the comparison being a 1, causing the do/while to only quit on the  (forwardIp 
> mflimit) check in the middle of the loop, giving wrong results.

I solved it by changing the comparison to be (ref + MAX_DISTANCE < ip). 

Here follow a small test program that triggered the issue.

-----------------------------------------------------------
extern void printf_hexdump(const char *data, int size);

static const char data_buffer[] = { 0x38, 0x25, 0xFF, 0xFF, ... };
static char blob_buffer[1024];

int main(int argc, char *argv[])
{
  int blob_size;
  int data_size = sizeof(data_buffer); 

  printf("data_size   = %d\n", data_size);                /* 638 bytes */
  printf("data_buffer = 0x%04x\n", (unsigned)data_buffer); /* 0xE464 */

  /* if data_buffer ptr is less than 0xffc4, things start to go wrong */

  blob_size = LZ4_compress(data_buffer, blob_buffer, data_size);

  printf("blob_size   = %d\n", blob_size); 
  printf_hexdump(blob_buffer, blob_size); /* hex dump of compress data */
}

Original issue reported on code.google.com by fgret...@gmail.com on 7 Aug 2013 at 7:46

GoogleCodeExporter commented 8 years ago
Hello François

Yes, I'm aware of this limitation, which was first underlined by Ludwig 
Strigeus.

The issue was flagged as "cannot happen", since the way memory is allocated on 
"common" systems (such Linux, BSD, Illumos, Windows, HPUX, etc) does not allow 
an allocation range at such a low memory values.

But apparently this hypothesis is no longer valid, since you are today 
triggering the issue.

The correction you propose is correct.
The issue with this correction is that it decreases speed for all other 
systems, which makes it undesirable for PC/Workstation/SmartPhone.
So I guess it should be kept as a specific setting for embedded systems 
(selectable with a #define).

May I ask you in which use case you are able to produce this scenario ?

Regards

Original comment by yann.col...@gmail.com on 7 Aug 2013 at 5:29

GoogleCodeExporter commented 8 years ago

Original comment by yann.col...@gmail.com on 7 Aug 2013 at 6:38

GoogleCodeExporter commented 8 years ago
OK, I made a few more tests, to properly assess the performance impact.

It appears it is much lower than anticipated. The performance cost is within 
0.5% and 1%, which is small enough to be almost negligible. This is much better 
than what I remember, maybe something changed since the test was done last 
time, making the current version more favorable.

You'll find in attached file a development version implementing your 
proposition.
Note that you took me in the middle of a big refactoring, so you may be a bit 
lost when searching into the code. The current objective is to change all the 
"macro" functions by "inline" derivatives. You'll find your modification inside 
"lz4_inline.h". 

The refactoring from macro to inline is functional, but not yet fully 
finalized, so expect a bit of time before the next release (r101).

Regards

Original comment by yann.col...@gmail.com on 7 Aug 2013 at 7:21

Attachments:

GoogleCodeExporter commented 8 years ago
Hallo Yann,

Yes, I agree that on "normal" systems it should not happen.

We have an embedded board with a TI C6000 DSP and ARM core. Both run LZ4.  We 
first had an issue with the DSP compiler not correctly compiling the code and 
cause incorrect compressed results. The cause was found the be the GCC 
extensions. The TI DSP compiler by default do not enable GCC extensions, but 
gave no warning of incorrect behavior. After the compiler switch was found and 
enabled, the compression worked as expected.

During this exercise I wrote a small test program similar to the test case one 
show above to demonstrate and isolate the issue. Later I reused the same test 
program on the ARM since are were also seeing incorrect compression results 
there as well. 

That is when I started to notice the pointer issue. It was unexpected. Once I 
found the cause, I also determined that it should not really be a cause of our 
other issues since there we also use allocated blocks which would use higher 
memory. 

Ideally, I don't mind the optimization as you implemented it, but there should 
be an check and/or warning somewhere to warn that the pointer value is to low. 
Even if it is just an assert that is only active in debug mode. Optimization is 
fine, as long as one knows where the limitations are.

Cheers
  Francois

Original comment by fgret...@gmail.com on 7 Aug 2013 at 7:47

GoogleCodeExporter commented 8 years ago
Integrated into r101. Note that there was a second line of source code to 
modify, using the same solution.

Original comment by yann.col...@gmail.com on 12 Aug 2013 at 8:44