Klozz / lz4

Automatically exported from code.google.com/p/lz4

Accessing memory beyond the end of the input buffer (still reproduces) #3

Closed. GoogleCodeExporter closed this issue 8 years ago.

GoogleCodeExporter commented 8 years ago
That's not a duplicate of the previous issue. I've changed the data being
compressed and it happened again.

1. Run the stress test (Windows): http://directnet-drive.net/lz4_test.cpp (I've
uploaded a new version of the test)
2. The program crashes.
3. #define LZ4_WORKAROUND to work around the bug.

lz4 latest SVN source (r25), compiled with VS2010, run on Windows 7.
lz4 crashes on the lines
do { *(U32*)op = *(U32*)ip; op+=4; ip+=4; } while (op<ref) ;
and
ip++; forwardH = HASH_VALUE(ip);
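
Both of those lines follow the same pattern: a 32-bit read at ip that always
touches 4 bytes, regardless of how many input bytes actually remain. A
simplified, hedged rendering of that pattern (not the actual lz4.c macros;
names are illustrative):

    #include <stdint.h>
    #include <string.h>

    static uint32_t read32(const void *p)   /* stands in for *(U32*)ip */
    {
        uint32_t v;
        memcpy(&v, p, sizeof v);             /* always reads 4 bytes, even if
                                                fewer remain in the buffer */
        return v;
    }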

Original issue reported on code.google.com by fmot.f...@gmail.com on 21 Sep 2011 at 2:24

GoogleCodeExporter commented 8 years ago
I'll handle this one.

Original comment by yann.col...@gmail.com on 22 Sep 2011 at 7:55

GoogleCodeExporter commented 8 years ago
Got it. That's a very special case, but nonetheless:
there are fewer than 4 bytes left in the input buffer,
so reading *(U32*)ip fails.

This can only happen when a compressed segment ends with a small match.
The input buffer must also be sized exactly to the compressed stream, with no
extra byte.
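
To make the failure mode concrete, here is a small stand-alone sketch (not
lz4.c) of the situation: ip sits 2 bytes before the end of a strictly sized
allocation, so a 4-byte read would land out of bounds. The helper shows the
kind of check that detects it; all names and sizes are illustrative.

    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* iend points one byte past the last valid input byte */
    static int u32_read_is_safe(const uint8_t *ip, const uint8_t *iend)
    {
        return (size_t)(iend - ip) >= sizeof(uint32_t);
    }

    int main(void)
    {
        size_t   compressedSize = 16;              /* hypothetical stream size */
        uint8_t *in = malloc(compressedSize);      /* "strict": no extra byte  */
        if (!in) return 1;

        const uint8_t *iend = in + compressedSize;
        const uint8_t *ip   = iend - 2;            /* only 2 bytes remain      */

        printf("4-byte read at ip would be %s\n",
               u32_read_is_safe(ip, iend) ? "in bounds" : "out of bounds");
        free(in);
        return 0;
    }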

Original comment by yann.col...@gmail.com on 22 Sep 2011 at 8:46

GoogleCodeExporter commented 8 years ago
The issue is corrected in r20.
Thanks for the detailed bug report, which made this issue easy to understand
and to solve.

Original comment by yann.col...@gmail.com on 22 Sep 2011 at 1:14

GoogleCodeExporter commented 8 years ago
r26 maybe? :)
You are welcome!

Original comment by fmot.f...@gmail.com on 22 Sep 2011 at 1:25

GoogleCodeExporter commented 8 years ago
OK, so here is a description of the problem and of the solution applied:

The problem happens during decompression, if the input buffer (containing the
compressed stream) is "strict", meaning not a single byte more is allocated
than the size of the compressed stream, so that reading beyond the input
buffer results in a segfault.

If such a compressed stream ends with a short match, itself preceded by zero
literals, the last sequence looks like this:

1 Byte  : Token (that's where "ip" is)
2 Bytes : Offset
EOF

The decoder reads the token and discovers there is no literal.
However, the decoder pre-emptively tries to copy the next 4 bytes from input to
output. Although this would be corrected later on, it crashes before that
point, since there are only 2 bytes left to read.
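
A hedged, stand-alone illustration of that last sequence (this is not the real
decoder, and the byte values are made up): one token with zero literals, a
2-byte offset, then EOF. The speculative copy wants 4 bytes but only 2 remain.

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        /* token 0x01: high nibble = 0 literals, low nibble = small match code */
        const uint8_t tail[3] = { 0x01, 0x04, 0x00 };   /* token + 2-byte offset */
        const uint8_t *ip   = tail;
        const uint8_t *iend = tail + sizeof tail;       /* EOF */

        ip++;                                           /* consume the token */
        printf("speculative copy needs 4 bytes, input has %zu left\n",
               (size_t)(iend - ip));
        return 0;
    }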

A manual "DIY" way to solve the issue is to allocate 2 bytes more in input 
buffer than the size of compressed stream. That's however may not be always 
possible.
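
A hedged sketch of that workaround, assuming the LZ4_uncompress() entry point
from that period; the padding constant and the helper function are invented
for illustration:

    #include <stdlib.h>
    #include <string.h>
    #include "lz4.h"

    #define INPUT_PADDING 2     /* slack beyond the compressed stream */

    /* Decompress "compressed" (compressedSize bytes) into a freshly allocated
     * buffer of originalSize bytes; returns NULL on allocation failure. */
    char *decompress_with_padding(const char *compressed, int compressedSize,
                                  int originalSize)
    {
        char *in  = malloc((size_t)compressedSize + INPUT_PADDING);
        char *out = malloc((size_t)originalSize);
        if (!in || !out) { free(in); free(out); return NULL; }

        memcpy(in, compressed, (size_t)compressedSize);   /* real data only     */
        LZ4_uncompress(in, out, originalSize);            /* speculative reads  */
        free(in);                                         /* land in padding    */
        return out;
    }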

This issue would not happen if the match were followed by at least one more
literal, since the sequence would look like this:

1 Byte  : Token (that's where "ip" is)
2 Bytes : Offset
1 Byte  : Token
1 Byte  : Literal
EOF

Here, there are at least 4 bytes to read beyond the first token, so there is
no "read" error.

That's the solution used currently: version r26 of LZ4 ensures that the
compressed stream always ends with a literal.

This "parsing trick" makes the decoder work. However, the decoder itself has
not been modified; therefore, streams compressed with an earlier version of
LZ4, or with another compressor generating an LZ4-compatible format, may still
segfault during decoding if the above conditions are met.
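
For reference, the compressor-side rule described above could look roughly
like the sketch below. This is only a hedged rendering of the idea (stop the
match search a little before the end of the input, so the block necessarily
finishes with literals), not the actual r26 change; the constant name is
invented.

    #include <stddef.h>
    #include <stdint.h>

    #define LAST_LITERALS 1   /* at least one byte kept as a trailing literal */

    /* Returns the pointer past which no match may start; the bytes from there
     * to the end of the input are necessarily emitted as a final literal run. */
    static const uint8_t *match_search_limit(const uint8_t *input, size_t size)
    {
        if (size <= LAST_LITERALS) return input;
        return input + size - LAST_LITERALS;
    }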

Original comment by yann.col...@gmail.com on 22 Sep 2011 at 1:29