blowhacker / snappy

Automatically exported from code.google.com/p/snappy

snappy's tester cheats a little in snappy's favor #26

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
I ported the code to pure C as part of an effort to use it in the Linux kernel. 
When I tested that code as a regular user-space dynamic library against snappy's 
tester, I was surprised to find that it was slower, even though the inner loop 
compiled to exactly the same machine code. In the tester, I found that for every 
contender the output buffers are resized prior to the compress/uncompress calls, 
except for snappy. Resizing snappy's buffers the same way accounted for the 
difference in compression performance.

Please apply the attached patch, which fixes this problem. Also attached are 
logs of benchmark runs with and without the patch. Benchmarks were run on a 
desktop PC with X shut down and the CPU clock fixed so it would not change: 
Core 2 Duo 2.33 GHz, Gentoo ~amd64, glibc 2.13, GCC 4.6.0, plain -O2.

With the code as is:

testdata/house.jpg                       :
LZO:    [b 1M] bytes 126958 -> 127173 100.2%  comp  23.9 MB/s  uncomp 1663.8 MB/s
SNAPPY: [b 4M] bytes 126958 -> 126803 99.9%  comp 2409.3 MB/s  uncomp 8523.6 MB/s
testdata/html                            :
LZO:    [b 1M] bytes 102400 ->  21027 20.5%  comp 135.2 MB/s  uncomp 494.6 MB/s
SNAPPY: [b 4M] bytes 102400 ->  24140 23.6%  comp 419.1 MB/s  uncomp 850.1 MB/s
testdata/kennedy.xls                     :
LZO:    [b 1M] bytes 1029744 -> 357315 34.7%  comp 157.8 MB/s  uncomp 622.9 MB/s
SNAPPY: [b 4M] bytes 1029744 -> 425735 41.3%  comp 351.8 MB/s  uncomp 512.4 MB/s
testdata/mapreduce-osdi-1.pdf            :
LZO:    [b 1M] bytes  94330 ->  76999 81.6%  comp  29.5 MB/s  uncomp 946.8 MB/s
SNAPPY: [b 4M] bytes  94330 ->  77477 82.1%  comp 837.6 MB/s  uncomp 1969.4 MB/s

With patch applied:

testdata/house.jpg                       :
LZO:    [b 1M] bytes 126958 -> 127173 100.2%  comp  24.0 MB/s  uncomp 1661.1 MB/s
SNAPPY: [b 4M] bytes 126958 -> 126803 99.9%  comp 2312.3 MB/s  uncomp 8502.0 MB/s
testdata/html                            :
LZO:    [b 1M] bytes 102400 ->  21027 20.5%  comp 135.7 MB/s  uncomp 494.0 MB/s
SNAPPY: [b 4M] bytes 102400 ->  24140 23.6%  comp 404.6 MB/s  uncomp 851.0 MB/s
testdata/kennedy.xls                     :
LZO:    [b 1M] bytes 1029744 -> 357315 34.7%  comp 157.6 MB/s  uncomp 622.9 MB/s
SNAPPY: [b 4M] bytes 1029744 -> 425735 41.3%  comp 343.4 MB/s  uncomp 513.0 MB/s
testdata/mapreduce-osdi-1.pdf            :
LZO:    [b 1M] bytes  94330 ->  76999 81.6%  comp  29.5 MB/s  uncomp 954.0 MB/s
SNAPPY: [b 4M] bytes  94330 ->  77477 82.1%  comp 812.0 MB/s  uncomp 1971.7 MB/s

Original issue reported on code.google.com by zeev.tar...@gmail.com on 29 Mar 2011 at 10:58

Attachments:

GoogleCodeExporter commented 9 years ago
Hi,

I looked a bit into this, and you're right that the conditions aren't equal 
between the different compression algorithms. This is most likely because 
Compress() is used both for the comparative benchmarks and for the ones within 
Snappy itself; for the latter, we don't want the performance numbers to be 
polluted by whatever malloc is in use. (I'm actually surprised it matters so 
much, but I'd guess it depends on your malloc.)

I think the right solution is actually the opposite of what you're proposing, 
though: remove the resize() calls from the other algorithms rather than add 
them to Snappy. That would require a bit more logic on the outside, though, 
since the buffer would need to be resized to something that makes sense for 
all the algorithms in use.

Original comment by se...@google.com on 29 Mar 2011 at 10:27

GoogleCodeExporter commented 9 years ago
I don't particularly care which way the tester makes sure libraries have equal 
conditions, just that it does.

There is code to that effect:
      // Pre-grow the output buffer so we don't measure stupid string
      // append time
      compressed[b].resize(block_size * 2);

So the buffer should be large enough to satisfy any sane algorithm (and indeed 
LZO works with the resize commands removed). I don't have the other libraries 
besides zlib and lzo to test with, and zlib decompression for example doesn't 
work with just the resize command removed, so I hesitate to touch code I can't 
even test.

Original comment by zeev.tar...@gmail.com on 30 Mar 2011 at 3:00

GoogleCodeExporter commented 9 years ago
QuickLZ needs 36000 scratch bytes, though, so perhaps max(block_size * 2, 
block_size + 36000)?

Why can't you test the other libraries? They are all freely available.

Original comment by se...@google.com on 30 Mar 2011 at 11:01

GoogleCodeExporter commented 9 years ago

Original comment by se...@google.com on 30 Mar 2011 at 12:45

GoogleCodeExporter commented 9 years ago
Fixed in r24.

Original comment by se...@google.com on 30 Mar 2011 at 8:28