moredure closed this issue 3 years ago
@lemire what do you think?
I believe that the function we are concerned about is this one...
func baseHashes(data []byte) [4]uint64 {
	a1 := []byte{1} // to grab another bit of data
	hasher := murmur3.New128()
	hasher.Write(data) // #nosec
	v1, v2 := hasher.Sum128()
	hasher.Write(a1) // #nosec
	v3, v4 := hasher.Sum128()
	return [4]uint64{
		v1, v2, v3, v4,
	}
}
This function is frequently called and likely a core bottleneck.
Why is it so complicated? Why not something like this... ?
func baseHashes(data []byte) (h1 uint64, h2 uint64, h3 uint64, h4 uint64) {
	h1, h2 = murmur3.Sum128WithSeed(data, 0)
	h3, h4 = murmur3.Sum128WithSeed(data, 1)
	return
}
Other than that, I would be disappointed in Go if a1 and hasher in the original function were not caught by escape analysis. It would be sad if a function-local variable ended up on the heap, only to be collected at the end of the function. Something like go build -gcflags "-m" should tell us what is happening. If it is indeed not caught by escape analysis, I'd like to investigate why.
Note that sync.Pool is never going to be as good as avoiding heap allocation entirely, which is what you'd like here. And sync.Pool can certainly make things worse (it will defeat escape analysis, for example).
@lemire, at least hash pooling, together with using a global a1 instead of a local one, makes no allocations. And yes, slices almost always escape to the heap.
func BenchmarkBloom(b *testing.B) {
	b1 := bloom.New(1000, 23)
	b2 := bloom2.New(1000, 23)
	var x = make([]byte, 8)
	b.Run("bloom", func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			binary.BigEndian.PutUint64(x, uint64(i))
			b1.Add(x)
		}
	})
	b.Run("bloom-with-hash-pooling", func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			binary.BigEndian.PutUint64(x, uint64(i))
			b2.Add(x)
		}
	})
}
cpu: Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz
BenchmarkBloom
BenchmarkBloom/bloom
BenchmarkBloom/bloom-4 2708595 410.0 ns/op 97 B/op 2 allocs/op
BenchmarkBloom/bloom-with-hash-pooling
BenchmarkBloom/bloom-with-hash-pooling-4 3259140 374.1 ns/op 0 B/op 0 allocs/op
v1, v2 := murmur3.Sum128WithSeed(data, 0)
v3, v4 := murmur3.Sum128WithSeed(data, 1)
@lemire this looks pretty optimal, but it will certainly break backward compatibility for serialised (binary on disk, etc.) bloom filters, because baseHashes produces different results for the same input data.
Speeds are close. If you are sure it will not break backward compatibility, I would choose hashing the data twice with different seeds via Sum128WithSeed to get baseHashes, as you proposed; otherwise it is better to leave my pull request unchanged and introduce the pool-free version as a breaking change.
cpu: Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz
BenchmarkBloom
BenchmarkBloom/bloom
BenchmarkBloom/bloom-4 2959707 398.7 ns/op 97 B/op 2 allocs/op
BenchmarkBloom/bloom-with-hash-pooling
BenchmarkBloom/bloom-with-hash-pooling-4 3270631 347.8 ns/op 0 B/op 0 allocs/op
BenchmarkBloom/bloom-with-double-hashing
BenchmarkBloom/bloom-with-double-hashing-4 3802644 314.5 ns/op 0 B/op 0 allocs/op
Let me take a stab at it.
Note that the murmur library does not appear to be obviously little/big endian safe.
See https://github.com/willf/bloom/pull/58
It is essentially the same as
func baseHashes(data []byte) (h1 uint64, h2 uint64, h3 uint64, h4 uint64) {
	h1, h2 = murmur3.Sum128WithSeed(data, 0)
	h3, h4 = murmur3.Sum128WithSeed(data, 1)
	return
}
but it preserves hash values in a backward-compatible manner, at the cost of a bit more code.
Hello @willf. Please check! Same as https://github.com/willf/bloom/pull/51 but with less code; maybe not as fast as 51.