Optimize decoder

dgryski commented 11 years ago

By preallocating our destination buffer, we can eliminate all the calls to append(). This, plus inlining two hot routines, gives a considerable speedup to Decode().
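To illustrate the idea (this is not the actual patch), here is a minimal sketch of the two styles for copying an LZ4-style back-reference. It assumes the decompressed size is known before decoding starts. The names `copyMatchAppend` and `copyMatchPrealloc` are hypothetical and do not come from go-lz4.

```go
package main

import "fmt"

// copyMatchAppend (hypothetical) grows dst one byte at a time.
// Every append() has to check capacity and may reallocate.
func copyMatchAppend(dst []byte, offset, length int) []byte {
	for i := 0; i < length; i++ {
		dst = append(dst, dst[len(dst)-offset])
	}
	return dst
}

// copyMatchPrealloc (hypothetical) writes into a buffer that already has
// its final size. pos is the current write index; the new index is returned.
// The byte-by-byte loop keeps overlapping matches (offset < length) correct.
func copyMatchPrealloc(dst []byte, pos, offset, length int) int {
	for i := 0; i < length; i++ {
		dst[pos] = dst[pos-offset]
		pos++
	}
	return pos
}

func main() {
	seed := []byte("ab")

	// append-based: start from a copy of the seed and let the slice grow.
	a := copyMatchAppend(append([]byte(nil), seed...), 2, 6)

	// preallocated: size the buffer once, then write by index.
	b := make([]byte, len(seed)+6)
	pos := copy(b, seed)
	pos = copyMatchPrealloc(b, pos, 2, 6)

	fmt.Println(string(a), string(b[:pos]))
}
```

Both variants produce the same bytes. The preallocated one replaces the per-byte growth check and possible reallocation with a plain indexed store.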

Below are benchmarks ported from snappy-go:

benchmark                  old ns/op    new ns/op    delta
BenchmarkLZ4Decode           4480128      3150442  -29.68%
BenchmarkWordsDecode1e3         6071         3506  -42.25%
BenchmarkWordsDecode1e4        69195        45798  -33.81%
BenchmarkWordsDecode1e5       744347       539174  -27.56%
BenchmarkWordsDecode1e6      6616125      4841891  -26.82%

benchmark                   old MB/s     new MB/s  speedup
BenchmarkWordsDecode1e3       164.71       285.18    1.73x
BenchmarkWordsDecode1e4       144.52       218.35    1.51x
BenchmarkWordsDecode1e5       134.35       185.47    1.38x
BenchmarkWordsDecode1e6       151.15       206.53    1.37x
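For readers unfamiliar with how Go produces the MB/s figures: a benchmark reports throughput only when it calls `b.SetBytes`. The sketch below shows that shape with a stand-in decoder. `decodeStub` and `benchDecode` are hypothetical names, and this is not the benchmark code from snappy-go or go-lz4.

```go
package main

import (
	"bytes"
	"fmt"
	"testing"
)

// decodeStub (hypothetical) stands in for a real Decode(); it only copies
// src into dst, reusing dst's storage when it is large enough.
func decodeStub(dst, src []byte) []byte {
	if cap(dst) < len(src) {
		dst = make([]byte, len(src))
	}
	dst = dst[:len(src)]
	copy(dst, src)
	return dst
}

// benchDecode (hypothetical) times decodeStub on roughly n bytes of input.
func benchDecode(b *testing.B, n int) {
	src := bytes.Repeat([]byte("lz4 "), n/4)
	dst := make([]byte, len(src))
	b.SetBytes(int64(len(src))) // bytes processed per iteration; enables MB/s
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		dst = decodeStub(dst, src)
	}
}

func main() {
	// testing.Benchmark runs a benchmark function outside of "go test".
	res := testing.Benchmark(func(b *testing.B) { benchDecode(b, 1000) })
	fmt.Println(res)
}
```

In a real project the function would live in a `_test.go` file as `func BenchmarkWordsDecode1e3(b *testing.B)` and be run with `go test -bench=.`.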

bkaradzic commented 11 years ago

@dgryski Good stuff, thanks! I added you as a collaborator on this project. :)

dgryski commented 11 years ago

thanks!

bkaradzic / go-lz4

Optimize decoder #8