Open raichu opened 10 years ago
Do you mean slice overhead? It's not ideal, it's true. It might be possible to use a single implementation to support both. I'll have a ponder.
I meant the map overhead. But right, there is the slice overhead as well, although it is not important in my case.
Along these lines, it makes sense to change the definition of CHDBuilder to
type CHDBuilder struct {
	keys   []KeyType
	values []ValueType
}
to save space. In fact, doing away with Entry completely might be the best direction to go.
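A minimal sketch of the parallel-slice layout proposed above, for illustration only. KeyType and ValueType are placeholders (the concrete types chosen here, and the Add and Len methods, are assumptions and not the package's actual API); the point is only that keys and values sit in two flat slices with no per-entry struct.

```go
package main

// KeyType and ValueType are hypothetical placeholders picked for this sketch.
type KeyType = uint32
type ValueType = []byte

// CHDBuilder stores keys and values in two parallel slices:
// keys[i] belongs to values[i], and no Entry struct is needed.
type CHDBuilder struct {
	keys   []KeyType
	values []ValueType
}

// Add appends one key/value pair, keeping both slices the same length.
func (b *CHDBuilder) Add(key KeyType, value ValueType) {
	b.keys = append(b.keys, key)
	b.values = append(b.values, value)
}

// Len reports how many pairs have been added.
func (b *CHDBuilder) Len() int { return len(b.keys) }
```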
As for hashing, one can use unsafe (for the base pointer) and reflect (for the object size) to hash values regardless of the definition of ValueType (something that would utilize an underlying hash function with signature hash(ptr *unsafe.Pointer, size int) uint64).
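As a rough sketch of that idea, the code below hashes the raw bytes of a value through a pointer and a size. Several things here are assumptions: the names hashMem and hashValue are made up, the function takes unsafe.Pointer directly rather than a pointer to it, the size comes from unsafe.Sizeof via generics (Go 1.18+) instead of reflect, and FNV-1a is used as the underlying hash. It is only meaningful for fixed-size types without padding or internal pointers; for a string or slice it would hash the header, not the contents.

```go
package main

import "unsafe"

// hashMem runs FNV-1a (64-bit) over size bytes starting at ptr.
// Hypothetical helper: the caller must guarantee ptr points to at
// least size readable bytes.
func hashMem(ptr unsafe.Pointer, size int) uint64 {
	data := unsafe.Slice((*byte)(ptr), size) // view the memory as []byte without copying
	h := uint64(14695981039346656037)        // FNV-1a 64-bit offset basis
	for _, c := range data {
		h ^= uint64(c)
		h *= 1099511628211 // FNV 64-bit prime
	}
	return h
}

// hashValue hashes the in-memory representation of *v.
func hashValue[T any](v *T) uint64 {
	return hashMem(unsafe.Pointer(v), int(unsafe.Sizeof(*v)))
}
```

Because the bytes are read in memory order, the result for multi-byte integers depends on the machine's endianness, so hashes produced this way are not portable across architectures.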
Also, I noticed that each call to chdHash creates a new fnv64 hasher (which allocates). Is this really the correct behavior?
In fact, I checked the fnv source and it's so simple that writing your own inline implementation fnv64([]byte) uint64 with no allocations should be trivial, without any need for the New64a, Write, Sum64 ceremony.
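For reference, an inline version along those lines could look like the sketch below. It assumes the FNV-1a variant (the one New64a provides) and uses the standard 64-bit FNV offset basis and prime; the name fnv64a is chosen here for illustration and is not taken from the package.

```go
package main

const (
	fnvOffset64 = 14695981039346656037 // FNV 64-bit offset basis
	fnvPrime64  = 1099511628211        // FNV 64-bit prime
)

// fnv64a computes the 64-bit FNV-1a hash of data in a single pass.
// It keeps all state in a local variable, so it needs no hasher object.
func fnv64a(data []byte) uint64 {
	h := uint64(fnvOffset64)
	for _, b := range data {
		h ^= uint64(b) // FNV-1a: xor the byte in first...
		h *= fnvPrime64 // ...then multiply by the prime
	}
	return h
}
```

For the same input this should return the same value as writing the bytes to a hasher from fnv.New64a and calling Sum64.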
I'm definitely amenable to performance improvements. Changing the definition of CHDBuilder is a good idea.
First step might be some benchmarks.
Okay, I've implemented some of these changes and we're down from:
BenchmarkCHD 10000000 180 ns/op
to:
BenchmarkCHD 20000000 132 ns/op
I wanted this for ints too. I didn't see a clean way to integrate it into this package without completely cluttering the source code, so I made a fork: https://github.com/Jille/uint64mph
It seems mph essentially implements a map[[]byte][]byte. Is it possible to change the code such that it becomes map[uint32][]uint8? Go's built-in map has just too much overhead for my application and I was wondering if this can help.