Closed GoogleCodeExporter closed 8 years ago
I'll handle this one.
Original comment by yann.col...@gmail.com
on 22 Sep 2011 at 7:55
Got it. That's a very special case, but a real one nonetheless:
fewer than 4 bytes are left in the input buffer,
so reading *(U32*)ip fails.
This can only happen when a compressed segment ends with a short match.
The input buffer must also be sized to exactly the size of the input stream, with no
extra byte.
Original comment by yann.col...@gmail.com
on 22 Sep 2011 at 8:46
Issue is corrected in r20.
Thanks for the detailed bug report, which made this issue clear to understand
and to solve.
Original comment by yann.col...@gmail.com
on 22 Sep 2011 at 1:14
r26 maybe? :)
You are welcome!
Original comment by fmot.f...@gmail.com
on 22 Sep 2011 at 1:25
OK, so here is a description of the problem and of the solution applied:
The problem happens during decompression if the input buffer (containing the
compressed stream) is "strict", meaning that not a single byte more is
allocated than the size of the compressed stream, so that reading beyond the input
buffer results in a segfault.
If such a compressed stream ends with a short match that is itself preceded by zero
literals, the last sequence looks like this:
1 Byte : Token (this is where "ip" points)
2 Bytes : Offset
EOF
The decoder reads the token and discovers there are no literals.
However, the decoder pre-emptively tries to copy the next 4 bytes from the input to
the output. Although this would be corrected later on, the decoder crashes before that
point, since there are only 2 bytes left to read.
A manual "DIY" way to work around the issue is to allocate 2 more bytes for the input
buffer than the size of the compressed stream. However, that may not always be
possible.
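The padding workaround described above could be sketched in C roughly as follows. The helper name copy_with_padding and the INPUT_PADDING constant are made up for illustration and are not part of LZ4; the sketch only shows how a caller might give the decoder 2 spare bytes to read into.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not part of LZ4: the 2 extra bytes cover the
   speculative 4-byte read that follows a final Token + 2-byte Offset. */
#define INPUT_PADDING 2

/* Copies a compressed stream into a freshly allocated buffer that is
   INPUT_PADDING bytes larger than the stream. The padding is zero-filled.
   Returns NULL if the allocation fails; the caller releases the buffer
   with free(). */
unsigned char *copy_with_padding(const unsigned char *src, size_t srcSize)
{
    unsigned char *padded = malloc(srcSize + INPUT_PADDING);
    if (padded == NULL) return NULL;
    if (srcSize > 0) memcpy(padded, src, srcSize);
    memset(padded + srcSize, 0, INPUT_PADDING);
    return padded;
}
```

The extra copy costs time and memory, which is one reason this workaround is not always practical.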
This issue would not happen if the match were followed by at least one more
literal, since the sequence would then look like this:
1 Byte : Token (this is where "ip" points)
2 Bytes : Offset
1 Byte : Token
1 Byte : Literal
EOF
Here, there are at least 4 bytes to read beyond the first token. Therefore,
there is no "read" error.
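The byte counting for the two layouts above can be checked with a small C sketch. The function name read4_after_token_fits is hypothetical and this is not decoder code; it only tests whether a 4-byte read that starts right after a token stays inside a buffer of a given size.

```c
#include <stddef.h>

/* Returns 1 if reading 4 bytes starting right after the token located at
   tokenPos stays inside a buffer of bufSize bytes, and 0 otherwise. */
int read4_after_token_fits(size_t tokenPos, size_t bufSize)
{
    size_t readStart = tokenPos + 1;   /* first byte after the token */
    return readStart <= bufSize && bufSize - readStart >= 4;
}
```

With the token at position 0, the first layout (Token + Offset, 3 bytes in total) gives read4_after_token_fits(0, 3) == 0, while the second layout (5 bytes in total) gives read4_after_token_fits(0, 5) == 1.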
That's the solution currently used: version r26 of LZ4 ensures that a compressed stream
always ends with a literal.
This "parsing trick" makes the decoder work. However, the decoder has not been
modified, so streams compressed with an earlier version of LZ4, or with
another compressor generating an LZ4-compatible format, may still segfault
during decoding if the above conditions are met.
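For illustration only, a decoder-side guard against this over-read might look like the sketch below. This is not what r26 does (the decoder is unchanged), and copy4_bounded and the end-of-input pointer iend are assumptions made for the example. It also assumes ip never points past iend.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical guard, not LZ4's actual decoder code: copies up to 4 bytes
   from ip to op without reading at or beyond iend (one past the last valid
   input byte). Requires ip <= iend. Returns the number of bytes copied. */
size_t copy4_bounded(unsigned char *op,
                     const unsigned char *ip,
                     const unsigned char *iend)
{
    size_t avail = (size_t)(iend - ip);   /* bytes still readable */
    size_t n = avail < 4 ? avail : 4;
    memcpy(op, ip, n);
    return n;
}
```

A check like this adds a comparison to the decoder's hot path, whereas the compressor-side fix described above leaves the decoder untouched.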
Original comment by yann.col...@gmail.com
on 22 Sep 2011 at 1:29
Original issue reported on code.google.com by
fmot.f...@gmail.com
on 21 Sep 2011 at 2:24