golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
124k stars 17.67k forks source link

sync: reduce contention between Map operations with new-but-disjoint keys #21035

Open bcmills opened 7 years ago

bcmills commented 7 years ago

The Go 1.9 implementation of sync.Map uses a single Mutex to guard the read-write map containing new keys. That makes Store calls with different new keys always contend with each other, and also contend with Load calls with different new keys, even if the Loads and Stores for each key are confined to a single thread.

That doesn't really matter for the sync.Map use-cases in the standard library because they do not operate on new keys in the steady state, but it limits the utility of sync.Map for use-cases involving a high rate of churn. Such use-cases may include:

We should explore ways to address new-key contention, such as sharding the read-write maps and associated locks (as suggested in https://github.com/golang/go/issues/20360), journaling writes (and using a Bloom or HyperLogLog filter to avoid reading the journal, along the lines of https://github.com/golang/go/issues/21032), or storing the read-write map in an atomic tree data structure instead of a built-in map.

bcmills commented 7 years ago

(@orcaman and/or @OneOfOne might be interested in this one?)

OneOfOne commented 7 years ago

I'm interested but I can't commit right now, the one idea that comes to my mind would require hacking runtime/hashmap* otherwise we'd have to double hash the keys.

orcaman commented 7 years ago

Very interesting task, I'd love to take this one. I think it should definitely be doable for the Go1.10 Milestone. I'll think about the design some more before I can 100% commit to doing this one in due time.

OneOfOne commented 7 years ago

@orcaman What I wanted to do is something like https://github.com/OneOfOne/cmap/tree/master/v2,

I use a slightly modified version of this in my code, but I wanted to do that with sync.Map in a way, but that'd require a lot more runtime knowledge to be able to use runtime/map's hasher directly to do the sharding rather than double hash it.

Sadly I don't have the knowledge nor time to learn the internals of runtime/map right now.

By all means if you can do that, it'd be great or we can discuss it.

// very simple set/get benchmark using cmap, rwmutex map and sync.Map.

BenchmarkCMap/128-8     20000000               105 ns/op               0 B/op          0 allocs/op
BenchmarkCMap/256-8     20000000                88.6 ns/op             0 B/op          0 allocs/op
BenchmarkCMap/512-8     20000000                67.6 ns/op             0 B/op          0 allocs/op
BenchmarkCMap/1024-8    30000000                42.3 ns/op             0 B/op          0 allocs/op
BenchmarkCMap/2048-8    30000000                34.5 ns/op             0 B/op          0 allocs/op
BenchmarkCMap/4096-8    50000000                33.0 ns/op             0 B/op          0 allocs/op
BenchmarkMutexMap-8     10000000               196 ns/op               0 B/op          0 allocs/op
BenchmarkSyncMap-8      30000000                39.3 ns/op            16 B/op          1 allocs/op
robaho commented 5 years ago

I think you can just use a slice of locks and dirty maps with the low order bits of the hash performing the selection - but this requires access to the internal hash code.

robaho commented 5 years ago

@bcmills I am willing to give this a try. Is there a document that desribes using internal facilities when an implementation is part of the stdlib?

bcmills commented 5 years ago

I don't know of any good starting docs, but perhaps @randall77 or @aclements can point you in the right direction.

mojinfu commented 5 years ago

really a good idea

withinboredom commented 2 months ago

Someone on Reddit pointed me to this discussion.

I was dealing with an issue related to sync.Map contention and worked on it in the context of FrankenPHP cgo handles.

Here was my CL to deal with cgo handles: https://go-review.googlesource.com/c/go/+/600875

I also wrote a blog post that shows the difference in regards to contention (with benchmarks): https://withinboredom.info/2024/08/12/optimizing-cgo-handles/

I ended up going with a growable slice and freelist approach. It's pretty darn fast.

benchmarks