Closed beorn7 closed 10 years ago
@bmizerany I hope you find that interesting... Cheers.
I'm interested. Can you show the deltas in performance in this ticket?
Can you create a pool type and a pool_1_2.go and pool_1_3.go that implement it? The later would use sync.Pool in 1.3?
benchcmp output:
benchmark old ns/op new ns/op delta BenchmarkInsertTargeted 152 157 +3.29% BenchmarkInsertTargetedSmallEpsilon 368 361 -1.90% BenchmarkInsertBiased 163 158 -3.07% BenchmarkInsertBiasedSmallEpsilon 870 787 -9.54% BenchmarkQuery 85 82 -2.82% BenchmarkQuerySmallEpsilon 5967 4647 -22.12%
benchmark old allocs new allocs delta BenchmarkInsertTargeted 1 0 -100.00% BenchmarkInsertTargetedSmallEpsilon 1 0 -100.00% BenchmarkInsertBiased 1 0 -100.00% BenchmarkInsertBiasedSmallEpsilon 1 0 -100.00% BenchmarkQuery 0 0 n/a% BenchmarkQuerySmallEpsilon 0 0 n/a%
benchmark old bytes new bytes delta BenchmarkInsertTargeted 32 0 -100.00% BenchmarkInsertTargetedSmallEpsilon 32 0 -100.00% BenchmarkInsertBiased 32 0 -100.00% BenchmarkInsertBiasedSmallEpsilon 32 0 -100.00% BenchmarkQuery 0 0 n/a% BenchmarkQuerySmallEpsilon 0 0 n/a%
Will look into 1.3 specific build as soon as I find time...
Awesome. This is great!
Interestingly, the go1.3 pool is slightly slower than the channel based manually implemented pool:
benchcmp new-1.2-pool.txt new-1.3-pool.txt benchmark old ns/op new ns/op delta BenchmarkInsertTargeted 158 181 +14.56% BenchmarkInsertTargetedSmallEpsilon 366 388 +6.01% BenchmarkInsertBiased 158 183 +15.82% BenchmarkInsertBiasedSmallEpsilon 808 825 +2.10% BenchmarkQuery 82 85 +3.14% BenchmarkQuerySmallEpsilon 4541 4889 +7.66%
benchmark old allocs new allocs delta BenchmarkInsertTargeted 0 0 n/a% BenchmarkInsertTargetedSmallEpsilon 0 0 n/a% BenchmarkInsertBiased 0 0 n/a% BenchmarkInsertBiasedSmallEpsilon 0 0 n/a% BenchmarkQuery 0 0 n/a% BenchmarkQuerySmallEpsilon 0 0 n/a%
benchmark old bytes new bytes delta BenchmarkInsertTargeted 0 0 n/a% BenchmarkInsertTargetedSmallEpsilon 0 0 n/a% BenchmarkInsertBiased 0 0 n/a% BenchmarkInsertBiasedSmallEpsilon 0 0 n/a% BenchmarkQuery 0 0 n/a% BenchmarkQuerySmallEpsilon 0 0 n/a%
However, with many streams used at the same time, the go1.3 sync.Pool will save resources, as we only need one global instance (and not one per stream).
Hrm. I'm interested to know why things slow down.
Yeah, good question... The sync.Pool implementation looks pretty sophisticated. Perhaps the necessary type assertion causes that overhead?
Anyway, if you prefer speed over resource usage, we can just stick with the channel based pool (which exists once per stream instance and therefore consumes more memory).
The problem, as as pointed out by Dimtry (of golang) is that your copying the sample instead of using a pointer, which causes allocations and is slowing things down.
On Tuesday, June 24, 2014, Björn Rabenstein notifications@github.com wrote:
Yeah, good question... The sync.Pool implementation looks pretty sophisticated. Perhaps the necessary type assertion causes that overhead?
Anyway, if you prefer speed over resource usage, we can just stick with the channel based pool (which exists once per stream instance and therefore consumes more memory).
— Reply to this email directly or view it on GitHub https://github.com/bmizerany/perks/pull/12#issuecomment-46949861.
Could you help me and point out where my change introduces copying of a sample?
Unless -benchmem is mistaken, there are zero allocations happening after my change upon sample insertion. Where do you see an allocation?
Huh. He said he saw something there, but I'm not seeing it either.
Okay. Here is the status:
I learned that using anything larger than a WORD will cause a copy into and out of sync.Pool. So pointers are fine, but not structs with three fields.
That is good to know, but doesn't help us here. We can remove the allocs by changing s.l from []*Sample to []Sample. My first few attempts broke the tests. If you are curious and want to try yourself, please do. Even though the tests are broke, the benchmarks sped-up considerably and allocs when to 0.
[UPDATE]
This also removes the synchronization necessary for sync.Pool.
I pushed the changes here (again, they are broken): https://github.com/bmizerany/perks/tree/broken-but-fast
You can see here it removes the allocs:
$ go test -bench=BenchmarkInsertTargeted$ -run=none
PASS
BenchmarkInsertTargeted 20000000 118 ns/op 0 B/op 0 allocs/op
ok github.com/bmizerany/perks/quantile 2.502s
$
So, your change addresses the alloc problem in a different way, which is even faster in general. Nice.
In #13 , I suggested a fix for your branch. If that works, we can drop this PR.
Fixed in #13! Thank you!
With the current state, inserting is pretty heavy on the GC. On average, 1 alloc happens per insert. In situations where you want to handle millions of inserts in a short time, that's more or less prohibitive.
The Sample pool implemented here can be replaced by Go 1.3 pools once dependency on 1.3 is acceptable.
Benchmarks: