Open zhenzou opened 11 months ago
Spoiler: it's actually a problem for theine as well. Since a key containing strings will select a random shard and hit ratio will be very small.
If you just need a hasher for comparable types, I can recommend https://github.com/dolthub/maphash that uses a set of dirty tricks to pull a hash function from a standard map
@zhenzou as @maypok86 pointed out, this is not related to validator. This is because current hash function of Theine calculate hash based on object's memory bytes. String is pointer to bytes so two strings have different hash even their value is same . I copy this from https://github.com/tidwall/hashmap because first version of Theine use that as internal map.
I think the best way to handle this is adding an optional Hash interface, instead of using the maphash package which hack Go's runtime a lot. Let me do some investigations first and decide how to handle this properly
@Yiling-J Yes, I don't believe it is issue of validator either. The strange thing is the issue will be disappears if remove the validate step. In my opinion, I create a new string every time, it should be different every time for current version even removed the validate step. I tried some escape demos, but none of them can reproduce the issue. Would you help to explain the detail for this. Thanks a lot.
func escape(obj any) {
if obj == nil {
panic("nil")
}
val := reflect.ValueOf(obj)
if val.Kind() != reflect.Struct {
panic("not struct")
}
}
@Yiling-J Hi, We are using theine in our new project, and this issue block us, would you provide a workaround for this. e.g. the add the interface to build a key to return a string, the hasher recognize if the key is a CacheKey.
type CacheKey interface{
BuildCacheKey() string
}
@zhenzou I do a quick fix on master branch so you are not blocked. I still need some investigations before release a new version. About your validator question, I think "123" is interning automatically by Go(in string()
method), and validator somehow break this and create a new string each time string()
is called.
new interface:
type StringKey interface {
StringKey() string
}
@zhenzou Actually there might be the case your key struct already has a StringKey method and do something else. So what i think is adding an extra method to builder, so the options are:
StringKey() string
interfaceStringKey() theine.Key
interface, in case StringKey method already exists and do something else
builder := theine.NewBuilder[Foo, string](1000)
builder.StringKey(func(key Foo) string { return key.name })
which one you think is better?
@Yiling-J You are correct, add a custom StringKey builder instead of StringKey interface is a better solution.
@Yiling-J I noted maphash because all other solutions will raise the same question, "What will you do if the key contains multiple strings?".
@zhenzou main branch updated, please use the new builder method:
builder := theine.NewBuilder[Foo, int](10000)
builder.StringKey(func(k Foo) string { return k.Bar })
@maypok86 seems the only special type as map key is string? because bytes is not allowed in comparable and pointers are always not equal. if so it's also possible to do a recursive check with reflect when hasher initialized, if there is string field and user doesn't provide a custom StringKey func(or maybe hash function), just panic.
@Yiling-J Eh, if only it were that simple...
Since Go 1.20 interfaces are comparable and this example will not pass:
package kek
import (
"fmt"
"github.com/dolthub/maphash"
"github.com/zeebo/xxh3"
"testing"
"unsafe"
)
type Hasher[K comparable] struct {
ksize int
kstr bool
}
func NewHasher[K comparable]() *Hasher[K] {
h := &Hasher[K]{}
var k K
switch ((interface{})(k)).(type) {
case string:
h.kstr = true
default:
h.ksize = int(unsafe.Sizeof(k))
}
return h
}
func (h *Hasher[K]) Hash(key K) uint64 {
var strKey string
if h.kstr {
strKey = *(*string)(unsafe.Pointer(&key))
} else {
strKey = *(*string)(unsafe.Pointer(&struct {
data unsafe.Pointer
len int
}{unsafe.Pointer(&key), h.ksize}))
}
return xxh3.HashString(strKey)
}
type LolKek struct {
lol any
kek any
}
func TestHasher(t *testing.T) {
lol := 236
kek := [34]byte{1, 2, 3}
hasher := NewHasher[LolKek]()
h1 := hasher.Hash(LolKek{
lol: lol,
kek: kek,
})
fmt.Println(h1)
h2 := hasher.Hash(LolKek{
lol: lol,
kek: kek,
})
fmt.Println(h2)
if h1 != h2 {
t.Fatal("hashes should be the same")
}
}
func TestMapHash(t *testing.T) {
lol := 236
kek := [34]byte{1, 2, 3}
hasher := maphash.NewHasher[LolKek]()
h1 := hasher.Hash(LolKek{
lol: lol,
kek: kek,
})
fmt.Println(h1)
h2 := hasher.Hash(LolKek{
lol: lol,
kek: kek,
})
fmt.Println(h2)
if h1 != h2 {
t.Fatal("hashes should be the same")
}
}
But it's more rare than this one (LolKek is just a composite key):
type LolKek struct {
LolID string
KekID string
}
If you're heading toward the route of needing StringKey
functions, you should probably stop insisting on comparable
. If I can make my StringKey
function return a random number, that comparable
didn't actually enforce the keys to be comparable.
I'd really prefer to see an API that doesn't force me to combine multiple []byte
into a newly-allocated single string -- especially multiple times! And my loading function wants to access the fields separately, so encoding them into a string would just mean I have to decode the string to fetch the value.
I haven't looked at the internals of theine to know if this is feasible, but one way around this would be to say that a cache key must implement Equals(other: K) bool
and Hash(sum *maphash.Hash)
(https://pkg.go.dev/hash/maphash). That way you can do e.g.
package main
import (
"bytes"
"encoding/binary"
"hash/maphash"
)
type cacheKey struct {
foo uint64
bar []byte
quux []byte
}
func (c cacheKey) Equals(other cacheKey) bool {
return c.foo == other.foo && bytes.Equal(c.bar, other.bar) && bytes.Equal(c.quux, other.quux)
}
func (c cacheKey) Hash(sum *maphash.Hash) {
var tmp [8]byte
binary.LittleEndian.PutUint64(tmp[:], c.foo)
sum.Write(tmp[:])
sum.Write(c.bar)
sum.Write(c.quux)
}
@tv42 I think comparable is the right key type, and if we consider cache a hashmap, it's nature to use comparable as key. Go knows exactly how to handle comparable, it just didn't export interface or useful functions. The best 2 options to me is: Go provide a generic hash function or Go make hashmap key type implementable, like Rust.
Currently I'm still thinking do a recursive reflect check when cache initilaized, if there is string and StringKey method is not implemented, just panic(any is valid in this check though, but when saving to real map, Go will panic). Another option is rewrite and avoid sharded map, because the hash value is only used to choose a shard.
Golang: 1.21.4
And it will be ok if remove the validate step.
It maybe not the problem of the theine, but would you like give some suggestions to solve this problem, Validator is a wild used library, many other uses maybe face the same issues.