Closed dgryski closed 11 years ago
By preallocating our destination buffer, we can eliminate all the calls to append(). This, plus inlining two hot routines, gives a considerable speedup to Decode().
Below are benchmarks ported from snappy-go:
benchmark                   old ns/op     new ns/op     delta
BenchmarkLZ4Decode          4480128       3150442       -29.68%
BenchmarkWordsDecode1e3     6071          3506          -42.25%
BenchmarkWordsDecode1e4     69195         45798         -33.81%
BenchmarkWordsDecode1e5     744347        539174        -27.56%
BenchmarkWordsDecode1e6     6616125       4841891       -26.82%

benchmark                   old MB/s     new MB/s     speedup
BenchmarkWordsDecode1e3     164.71       285.18       1.73x
BenchmarkWordsDecode1e4     144.52       218.35       1.51x
BenchmarkWordsDecode1e5     134.35       185.47       1.38x
BenchmarkWordsDecode1e6     151.15       206.53       1.37x
@dgryski Good stuff, thanks! I added you as a collaborator on this project. :)
thanks!