Closed GoogleCodeExporter closed 8 years ago
Original comment by yann.col...@gmail.com
on 12 Nov 2013 at 5:35
Thanks for the clear description.
I'm interested in the sample file, if possible.
Could you make it available through Dropbox or another file-sharing
service?
Thanks
Original comment by yann.col...@gmail.com
on 13 Nov 2013 at 7:27
Hi Yann,
I think I have copied the data to Google Drive. Give the links below a
try and see if they work. I also included a test program that I wrote to
test it after I ran into issues on our hardware.
We do all operations buffer to buffer, using 15MB buffers for both the
source and destination.
Just link the program with your code.
Please let me know if you are able to get the files OK.
Thanks,
Scott
bad_data -
https://drive.google.com/file/d/0B1lUjQzCavpmdjVDdzRmVHpiQU0/edit?usp=sharing
test.c -
https://drive.google.com/file/d/0B1lUjQzCavpmcnFCejlRQ1RtQjA/edit?usp=sharing
Original comment by lsharv...@gmail.com
on 14 Nov 2013 at 12:55
Thanks, I've received the files.
I'll take a look at them as soon as my schedule allows.
Original comment by yann.col...@gmail.com
on 14 Nov 2013 at 9:47
I had some time to review this issue today.
The test program is well written and points directly at the problem.
The proposed solution is correct too.
I thoroughly checked its impact on speed, and it appears negligible.
So this fix will be part of the next release.
Best Regards
Original comment by yann.col...@gmail.com
on 1 Dec 2013 at 2:07
Corrected in r109
Original comment by yann.col...@gmail.com
on 3 Dec 2013 at 3:54
Looking at r109, I think line 506 is missing the fix?
http://code.google.com/p/lz4/source/diff?spec=svn109&r=109&format=side&path=/trunk/lz4.c
Should
if ((limitedOutput) && unlikely(op + (1 + LASTLITERALS) + (length>>8) > oend)) return 0; // Check output limit
be replaced with
if ((limitedOutput) && unlikely(op + (1 + LASTLITERALS) + (length/255) > oend)) return 0; // Check output limit
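To make the difference between the two margins concrete, here is a small, hypothetical C sketch. The helper names `margin_shift8`, `margin_div255`, `margin_gap` and `max_gap_below` are illustrative and not part of lz4.c, and the sketch only compares the terms `length>>8` and `length/255`; it does not model `op`, `oend` or `LASTLITERALS`. Since x/255 is never smaller than x/256, the shift can only undershoot, by at most one byte for lengths below 65280 (255 * 256), with the gap growing past that.

```c
#include <assert.h>

/* Margin term used by the r109 check: a cheap shift approximating length/255. */
static unsigned margin_shift8(unsigned length) { return length >> 8; }

/* Margin term with the exact division proposed in the comment. */
static unsigned margin_div255(unsigned length) { return length / 255; }

/* Bytes by which the >>8 margin undershoots the /255 margin. */
static unsigned margin_gap(unsigned length)
{
    unsigned exact = margin_div255(length);
    unsigned approx = margin_shift8(length);
    /* floor(x/255) >= floor(x/256) for every x, so this cannot underflow. */
    assert(exact >= approx);
    return exact - approx;
}

/* Largest gap over all lengths in [0, limit). */
static unsigned max_gap_below(unsigned limit)
{
    unsigned worst = 0;
    for (unsigned len = 0; len < limit; len++) {
        unsigned g = margin_gap(len);
        if (g > worst)
            worst = g;
    }
    return worst;
}
```

For example, `margin_gap(65535)` is 2, because 65535/255 = 257 while 65535>>8 = 255, whereas `max_gap_below(65280)` is 1.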
Original comment by jpou...@gmail.com
on 4 Jan 2014 at 5:27
Good point, Adrien.
I pondered this choice when building r109. Here are my thoughts:
Line 509 is about the length of matches.
The match length estimate using >>8 is correct up to 64KB; beyond that,
it drifts by one byte. If such a match exists (typically a long run of
zeroes), it means we have achieved the maximum compression ratio, 255:1,
on that segment.
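A worked check of where the 64KB threshold comes from (plain arithmetic on the two divisors, not taken from lz4.c): the real-valued gap between them is

$$\frac{L}{255} - \frac{L}{256} = \frac{L}{255 \cdot 256} = \frac{L}{65280}$$

which stays below one byte while $L < 65280$ (just under 64KB) and grows by one for every additional 65280 bytes of match length. Likewise, since each further 255 bytes of match length costs one more output byte in LZ4's length encoding, the ratio of a single long match tends to $255:1$.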
With such an impressive saving at hand, it is quite improbable to
simultaneously cross the output buffer limit (in general, the output limit is
roughly equal to the input size; to reach this outcome, the output buffer
would have to be smaller than about input size - 64KB, which I have not seen
so far).
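To put a number on that saving, the hypothetical helper below (`encoded_match_bytes` is an illustrative name, not an lz4.c function) counts the bytes one LZ4 sequence spends on a single match with no literals, following the LZ4 block format: a 1-byte token, a 2-byte offset, and, once the stored match length reaches 15, a run of 255-valued bytes plus one final byte. It is a sketch under those assumptions and ignores surrounding literals and end-of-block rules.

```c
#include <assert.h>

/* LZ4 block-format constants: minimum match length and the largest
 * value the 4-bit match-length field of the token can hold. */
#define MIN_MATCH 4u
#define ML_MASK   15u

/* Output bytes for one sequence with zero literals and a match of
 * match_len bytes: 1 token + 2-byte offset + length extension bytes. */
static unsigned encoded_match_bytes(unsigned match_len)
{
    assert(match_len >= MIN_MATCH);
    unsigned ml = match_len - MIN_MATCH;   /* value stored in the format */
    unsigned extra = 0;
    if (ml >= ML_MASK)
        extra = (ml - ML_MASK) / 255 + 1;  /* 255-valued bytes + final byte */
    return 1 + 2 + extra;
}
```

For a 64KB match, `encoded_match_bytes(65536)` gives 260 bytes, a ratio of roughly 252:1, which approaches 255:1 for longer matches.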
Finally, I tested the line 509 modification using /255,
and it resulted in a loss of performance.
So here we are: there is a theoretical issue, but it is fairly difficult to
trigger. Only forged data can realistically achieve this result, and even then
it requires access to the source data to compress, an advanced understanding of
the implementation's buffer sizes, and a specific condition on the output buffer
size (< input size - 64KB). Effectively, the buffers must be quite large to
begin with, which is atypical of current LZ4 scenarios (i.e., it cannot affect
ZFS, it cannot affect Lucene/Solr, etc.). On the other hand, preventing this
theoretical corner case costs some measurable performance for everyone.
So I felt it was necessary to pause before moving on this one.
Original comment by yann.col...@gmail.com
on 4 Jan 2014 at 9:16
Original issue reported on code.google.com by
lsharv...@gmail.com
on 11 Nov 2013 at 6:44