moredure closed this issue 3 years ago
@lemire what do you think?
I believe that the function we are concerned about is this one...
func baseHashes(data []byte) [4]uint64 {
	a1 := []byte{1} // to grab another bit of data
	hasher := murmur3.New128()
	hasher.Write(data) // #nosec
	v1, v2 := hasher.Sum128()
	hasher.Write(a1) // #nosec
	v3, v4 := hasher.Sum128()
	return [4]uint64{
		v1, v2, v3, v4,
	}
}
This function is frequently called and likely a core bottleneck.
Why is it so complicated? Why not something like this... ?
func baseHashes(data []byte) (h1 uint64, h2 uint64, h3 uint64, h4 uint64) {
	h1, h2 = murmur3.Sum128WithSeed(data, 0)
	h3, h4 = murmur3.Sum128WithSeed(data, 1)
	return
}
Other than that, I would be disappointed in Go if a1 and hasher in the original function were not caught by escape analysis. It would be sad if a function-local variable ended up on the heap, only to be collected at the end of the function. Something like go build -gcflags "-m" should tell us what is happening. If it is indeed not caught by escape analysis, I'd like to investigate why.
Note that sync.Pool is never going to be as good as avoiding heap allocation entirely, which is what you'd like here. And sync.Pool can certainly make things worse (it will defeat escape analysis, for example).
@lemire, at least hash pooling, together with using a global a1 instead of a local one, makes no allocations. And yes, slices almost always escape to the heap.
func BenchmarkBloom(b *testing.B) {
	b1 := bloom.New(1000, 23)
	b2 := bloom2.New(1000, 23)
	var x = make([]byte, 8)
	b.Run("bloom", func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			binary.BigEndian.PutUint64(x, uint64(i))
			b1.Add(x)
		}
	})
	b.Run("bloom-with-hash-pooling", func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			binary.BigEndian.PutUint64(x, uint64(i))
			b2.Add(x)
		}
	})
}
cpu: Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz
BenchmarkBloom
BenchmarkBloom/bloom
BenchmarkBloom/bloom-4 2708595 410.0 ns/op 97 B/op 2 allocs/op
BenchmarkBloom/bloom-with-hash-pooling
BenchmarkBloom/bloom-with-hash-pooling-4 3259140 374.1 ns/op 0 B/op 0 allocs/op
v1, v2 := murmur3.Sum128WithSeed(data, 0)
v3, v4 := murmur3.Sum128WithSeed(data, 1)
@lemire this looks pretty optimal, but it will certainly break backward compatibility for serialised (binary on disk, etc.) bloom filters, because baseHashes produces different results for the same input data.
Speeds are close. If you are sure it will not break backward compatibility, I would choose hashing the data twice with different seeds via Sum128WithSeed to get baseHashes, as you proposed; otherwise it is better to leave my pull request unchanged and introduce the pool-free version as a breaking change.
cpu: Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz
BenchmarkBloom
BenchmarkBloom/bloom
BenchmarkBloom/bloom-4 2959707 398.7 ns/op 97 B/op 2 allocs/op
BenchmarkBloom/bloom-with-hash-pooling
BenchmarkBloom/bloom-with-hash-pooling-4 3270631 347.8 ns/op 0 B/op 0 allocs/op
BenchmarkBloom/bloom-with-double-hashing
BenchmarkBloom/bloom-with-double-hashing-4 3802644 314.5 ns/op 0 B/op 0 allocs/op
Let me take a stab at it.
Note that the murmur library does not appear to be obviously little/big endian safe.
See https://github.com/willf/bloom/pull/58
It is essentially the same as
func baseHashes(data []byte) (h1 uint64, h2 uint64, h3 uint64, h4 uint64) {
	h1, h2 = murmur3.Sum128WithSeed(data, 0)
	h3, h4 = murmur3.Sum128WithSeed(data, 1)
	return
}
but it preserves hash values in a backward-compatible manner, at the cost of a bit more code.
Hello @willf. Please check! Same as https://github.com/willf/bloom/pull/51 but with less code; maybe not as fast as 51.