golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124.03k stars 17.67k forks source link

runtime: allow map hashes with different tradeoffs #36915

Open eric-s-raymond opened 4 years ago

eric-s-raymond commented 4 years ago

What version of Go are you using (go version)?

$ go version
go version go1.13.4 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
go version go1.13.4 linux/amd64
esr@snark:~/WWW/reposurgeon$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/esr/.cache/go-build"
GOENV="/home/esr/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/esr/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go-1.13"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go-1.13/pkg/tool/linux_amd64"
GCCGO="/usr/bin/gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/esr/WWW/reposurgeon/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build890210982=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Wrote a complicated data transformation using maps heavily. Observed that after I did enough optimization, map hashing costs actually competed with GC as a performance bottleneck.

This is where the standard bug template stops being useful. The problem I see is that the choice of aeshash512 is more expensive than is right for my application, and I can't fix that.

I'm not arguing with the choice to optimize for cryptographic hardness over performance by default; given Golang's intended role as a language for network servers this was sensible. But my application - reposurgeon - doesn't need that hardening, and is paying a significant performance cost for it.

Therefore, this RFE: Offer a runtime switch that allows plugging in a weaker but faster hash.

randall77 commented 4 years ago

aeshash512

Where did 512 come from? That doesn't exist.

What do your keys look like (type, size, etc.)? Do you have a suggestion for an alternative, and how fast do you think it would be?

eric-s-raymond commented 4 years ago

randall77: I thought I saw that suffix in the profiles.

My keys are typically strings or 32-bit ints (in the latter case these numbers are revision IDs in a repository event sequence).

I'm not an expert on hash functions, so I hesitate to suggest a concrete alternative. What I want is lookup speed.

mvdan commented 4 years ago

Have you explored using a custom map implementation? For example, if you want to optimize for performance, and you know that the keys are 32-bit integers, I'm sure it's possible to beat map[int32]Value with a custom type. For example, see https://github.com/brentp/intintmap - though I'm not an expert either, and I haven't tried that particular package.

You'd have to replace the nice syntax with methods, but I think that's an OK tradeoff for the niche use cases where map isn't enough.

seebs commented 4 years ago

https://godoc.org/modernc.org/hash <-- this also exists

having seen the nightmares the Java people get from allowing users to specify hashes, I am moderately inclined to think that Go's choice of "don't let user specify a hash" is a good one for the innate map implementation.

networkimprov commented 4 years ago

Re "nightmares the Java people get," could you elaborate?

seebs commented 4 years ago

A while back Java started switching buckets to tree structures under load because it's fairly common for user-provided hash choices to be nigh-pessimal, producing a handful of buckets which end up with most of the items in them. I was told once that this may be in part because some IDE's prefilled template for a hash implementation provided a spectacularly poorly-considered suggested default hash, but that could be apocryphal. But generally, it turns out that if you allow people to pick hash functions, they do, and most of us are genuinely not qualified to pick hash functions, but also most of us don't realize this. Allowing a bit more specialization without actually letting users pick the function might work, but I'd be wary.

I also note that, at least in my experience, if I am seeing the map lookup get expensive, it's possible that I need to coalesce operations -- extract a value, do operations on it, put it back in, rather than writing a long sequence of operations using the map key. That's bitten me more than once.

randall77 commented 4 years ago

@networkimprov There is also the danger of writing a hash function and an equals function that aren't compatible (two equal values hash to different things). When that happens, it leads to very confusing behavior.

eric-s-raymond commented 4 years ago

seebs notifications@github.com:

But generally, it turns out that if you allow people to pick hash functions, they do, and most of us are genuinely not qualified to pick hash functions, but also most of us don't realize this. Allowing a bit more specialization without actually letting users pick the function might work, but I'd be wary.

I've already noted that I'm not an expert at hash functions. Last time I needed a custom one I talked Ron Rivest into designing one for me. :-)

I'd be quite happy with a choice among a small set of hash functions chosen by experts and marked, e,g. "use this for high speed without crypto-grade dispersion properties". -- Eric S. Raymond

beoran commented 4 years ago

Maybe this could be done by adding a third argument to make() for maps, where the third argument is a string with the description of the hashing function/strategy. The compiler errors if the third argument is unknown to it.

Then again, if we get Go generics, then a generic hash map library should solve this issue just a easily and perhaps with even better performance.

CAFxX commented 4 years ago

Maybe this could be done by adding a third argument to make() for maps, where the third argument is a string with the description of the hashing function/strategy. The compiler errors if the third argument is unknown to it.

I would rather avoid extra parameters, and have maps start with a faster/weaker hash: if then the map detects extremely skewed bucket overflows (that IIRC we already use to trigger map growth) it could switch to a slower hash with better confusion properties (like the current aeshash) during map growth.