alecthomas / mph

Minimal Perfect Hashing for Go
BSD 3-Clause "New" or "Revised" License
172 stars 23 forks source link

Integer key and value #2

Open raichu opened 10 years ago

raichu commented 10 years ago

It seems mph essentially implements a map[[]byte][]byte. Is it possible to change the code such that it becomes map[uint32][]uint8? Go's bulit-in map has just too much overhead for my application and I was wondering if this can help.

alecthomas commented 10 years ago

Do you mean slice overhead? It's not ideal it's true. It might be possible to use a single implementation to support both. I'll have a ponder.

raichu commented 10 years ago

I meant the map overhead. But right, there is the slice overhead as well, although it is not important in my case.

raichu commented 10 years ago

Along these lines, it makes sense to change the definition of CHDBuilder to

type CHDBuilder struct {
    keys []KeyType
    values []ValueType
}

to save space. In fact, doing away with Entry completely might the best direction to go.

As for hashing, one can use unsafe (base pointer) and reflect (for the object size) to do the hashing regardless of the definition of ValueType (something that would utilize the underlying hash function with signature hash(ptr *unsafe.Pointer, size int) uint64).

raichu commented 10 years ago

Also, I noticed that each call to chdHash creates a new fnv64 hasher (which allocates). Is this really the correct behavior?

In fact, I checked the fnv source and it's so simple that writing your inline implementation fnv64([]byte) uint64 with no allocations should be trivial, without any need for the New64a, Write, Sum64 ceremony.

alecthomas commented 10 years ago

I'm definitely amenable to performance improvements. Changing the definition of CHDBuilder is a good idea.

First step might be some benchmarks.

alecthomas commented 10 years ago

Okay, I've implemented some of these changes and we're down from:

BenchmarkCHD    10000000           180 ns/op

to:

BenchmarkCHD    20000000           132 ns/op
Jille commented 1 year ago

I wanted this for ints too. I didn't see a clean way to integrate it into this package without completely cluttering the source code, so I made a fork: https://github.com/Jille/uint64mph