bkaradzic / go-lz4

Port of LZ4 lossless compression algorithm to Go
BSD 2-Clause "Simplified" License
212 stars 23 forks

Optimize decoder #8

Closed dgryski closed 10 years ago

dgryski commented 10 years ago

By preallocating our destination buffer, we can eliminate all the calls to append(). This, plus inlining two hot routines, gives a considerable speedup to Decode().
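To illustrate the idea, here is a minimal sketch, not the actual go-lz4 code. It assumes the decoded size is known before decoding starts, so the destination can be allocated once. The function names `copyMatchAppend` and `copyMatchPrealloc` are hypothetical and exist only for this example. Both copy an LZ-style back-reference (a match that may overlap the bytes being written), first by growing the slice with append() and then by indexing into a preallocated buffer.

```go
package main

import "fmt"

// copyMatchAppend grows dst one byte at a time with append().
// offset is how far back from the end of dst the match starts.
// Each append may have to check capacity and reallocate.
func copyMatchAppend(dst []byte, offset, length int) []byte {
	start := len(dst) - offset
	for i := 0; i < length; i++ {
		dst = append(dst, dst[start+i])
	}
	return dst
}

// copyMatchPrealloc writes into a buffer that already has its final size.
// pos is the next write position; the new write position is returned.
// No allocation or slice-header update happens inside the loop.
func copyMatchPrealloc(dst []byte, pos, offset, length int) int {
	src := pos - offset
	for i := 0; i < length; i++ {
		dst[pos+i] = dst[src+i]
	}
	return pos + length
}

func main() {
	// append-based variant: start from the literal "ab" and repeat it.
	a := copyMatchAppend([]byte("ab"), 2, 6)

	// Preallocated variant: the final size (8 bytes) is assumed known up front.
	b := make([]byte, 8)
	pos := copy(b, "ab")
	pos = copyMatchPrealloc(b, pos, 2, 6)

	fmt.Println(string(a), string(b[:pos])) // prints "abababab abababab"
}
```

Both variants produce the same bytes. The preallocated one trades a single up-front allocation for a loop body that is plain indexed stores, which is also easier for the compiler to inline.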

Below are benchmarks ported from snappy-go:

benchmark                  old ns/op    new ns/op    delta
BenchmarkLZ4Decode           4480128      3150442  -29.68%
BenchmarkWordsDecode1e3         6071         3506  -42.25%
BenchmarkWordsDecode1e4        69195        45798  -33.81%
BenchmarkWordsDecode1e5       744347       539174  -27.56%
BenchmarkWordsDecode1e6      6616125      4841891  -26.82%

benchmark                   old MB/s     new MB/s  speedup
BenchmarkWordsDecode1e3       164.71       285.18    1.73x
BenchmarkWordsDecode1e4       144.52       218.35    1.51x
BenchmarkWordsDecode1e5       134.35       185.47    1.38x
BenchmarkWordsDecode1e6       151.15       206.53    1.37x

bkaradzic commented 10 years ago

@dgryski Good stuff, thanks! I added you as a collaborator on this project. :)

dgryski commented 10 years ago

thanks!