Open · valyala opened this issue 3 months ago
Sounds like a dup of #8281.
This idea is interesting, but I think it stops just short of being able to cover many more use-cases, without much of an increase in complexity. For example, with a hook into where two values are in conflict (similar to the Merge method here: https://github.com/golang/go/issues/18802#issuecomment-1884012504), and iteration over all the values, we can support additional use-cases like scalable counters and logging (locally cache buffers, and on conflict, just flush one of them).
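The merge-on-conflict idea above can be sketched in user code. The `MergeCache` type below is hypothetical (neither the name nor the API appears in any proposal): a real runtime version would keep one slot per P, so this sketch approximates the semantics with a single mutex-guarded slot purely to illustrate how a conflict hook enables scalable counters.

```go
package main

import (
	"fmt"
	"sync"
)

// MergeCache is a hypothetical value cache with a hook that is invoked when
// Put finds the slot already occupied, so the two values can be merged
// instead of one being dropped.
type MergeCache struct {
	mu    sync.Mutex
	slot  any
	Merge func(old, new any) any // called on conflict in Put
}

// Get removes and returns the cached value, or nil if the slot is empty.
func (c *MergeCache) Get() any {
	c.mu.Lock()
	defer c.mu.Unlock()
	x := c.slot
	c.slot = nil
	return x
}

// Put stores x, merging it with an already cached value on conflict.
func (c *MergeCache) Put(x any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.slot == nil {
		c.slot = x
		return
	}
	c.slot = c.Merge(c.slot, x)
}

func main() {
	// A scalable-counter flavor: on conflict, counts are added together.
	counters := &MergeCache{
		Merge: func(old, new any) any { return old.(int) + new.(int) },
	}
	counters.Put(3)
	counters.Put(4) // conflict: merged into 7
	fmt.Println(counters.Get()) // 7
}
```

The same shape covers the logging use-case mentioned above: the merge hook would flush one of the two buffers instead of adding counts.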
Some other notes:

- `P` isn't part of any public API; it's a concept that's internal to the runtime. I don't have a better name off the top of my head, but I'm sure there's a good one out there.
- You can use `any` as your type parameter if that's what you want.

I think this proposal is closely related to #18802 actually, and not far from where I wanted to explore next with being able to access local values without synchronization.
> For example, with a hook into where two values are in conflict (similar to the Merge method here: https://github.com/golang/go/issues/18802#issuecomment-1884012504), and iteration over all the values, we can support additional use-cases like scalable counters and logging (locally cache buffers, and on conflict, just flush one of them).
I'd prefer leaving the p-local cache as simple as possible, so it solves the original issue described above: to provide an efficient and CPU-scalable mechanism for caching the state of various CPU-bound parsers and encoders. It is expected that the state may be lost at any time (for example, when GOMAXPROCS changes, when the goroutine is re-scheduled to another P, or when somebody forgets to return the state to the cache), so it can be re-constructed when needed.

Other use cases like https://github.com/golang/go/issues/8281 and https://github.com/golang/go/issues/18802 should be covered separately, since they are more complicated and have no clear solution yet.
Proposal Details
The issue
High-performance code that scales linearly with the number of CPU cores usually needs per-CPU caches holding some per-CPU state, in order to avoid costly inter-CPU synchronization. The state can be re-computed at any time, but the computation may take additional CPU time and other resources, so it is more efficient to cache the computed state for each CPU core and re-use it.
`sync.Pool` can be used as a per-CPU cache, but it has the following issues in this context:

- `sync.Pool.Get()` tries stealing an object from other CPUs if the object is missing in the current P. This leads to costly inter-CPU synchronization, whose cost increases with the number of available CPU cores.
- `sync.Pool.Put()` may store multiple objects at the same P. This leads to excess memory usage when at most one object is needed per P.
- `sync.Pool.Put()` also triggers expensive inter-CPU synchronization if the P already contains an object.
- `sync.Pool` may drop cached objects at every GC cycle, so the caller needs to spend additional CPU time re-creating the object.

The solution
To add a `sync.PLocalCache` struct with the following API:

Implementation details
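The proposal's API listing did not survive extraction here. The following is a hypothetical sketch of what the proposed `Get`/`Put` API and its at-most-one-object semantics might look like, inferred from the surrounding text; since user code cannot access P-local storage, the per-P slot is approximated with a single mutex-guarded slot purely for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// PLocalCache is a hypothetical sketch of the proposed type; the real
// implementation would live in the runtime with one slot per P.
type PLocalCache struct {
	mu   sync.Mutex
	slot any // at most one cached object
}

// Get removes and returns the cached object, or nil if nothing is cached.
// The proposed version would consult only the current P and never steal
// objects from other Ps.
func (c *PLocalCache) Get() any {
	c.mu.Lock()
	defer c.mu.Unlock()
	x := c.slot
	c.slot = nil
	return x
}

// Put caches x unless an object is already cached, in which case x is
// silently dropped, keeping at most one object per slot.
func (c *PLocalCache) Put(x any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.slot == nil {
		c.slot = x
	}
}

func main() {
	var c PLocalCache
	fmt.Println(c.Get()) // <nil>: the cache starts empty
	c.Put("first")
	c.Put("second") // dropped: the slot is already occupied
	fmt.Println(c.Get()) // first
}
```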
`sync.PLocalCache` may be implemented in a way similar to `sync.Pool`, but without the following abilities:

- Stealing objects from other Ps on `Get()`.
- Storing multiple objects per P: if `Put()` is called on a storage with an already existing P-local object, the new object is simply ignored.
- Dropping cached objects at every GC cycle: the number of objects cached in `sync.PLocalCache` doesn't exceed `GOMAXPROCS`, i.e. it is bounded, and it is expected that the user continuously accesses the cached objects, so there is little sense in periodic cleanup of the cache. All the cached objects are removed after the corresponding `sync.PLocalCache` is destroyed by the garbage collector.

The property of having at most one P-local object in the cache narrows down the applicability of the `Get()` ... `Put()` pattern to CPU-bound code without context switches (e.g. without IO, expensive syscalls or cgo calls). This minimizes the chance of a context switch during the execution of the code between `Get()` and `Put()`, so the cached objects will be successfully re-used. For example, `sync.PLocalCache` is a great fit for a scalable random number generator with per-P (i.e. per-CPU) state. It is also a great fit for various CPU-bound parsers, encoders and compressors with non-trivial state, which can be cached in the P-local cache.

On the other hand, if the chance of a context switch between the `Get()` and `Put()` calls is high, then `Get()` will return `nil` most of the time. This forces the user's code to spend additional CPU time on object re-creation. The re-created object will also be dropped most of the time on the `Put()` call, since there is a high chance that another P-local object has already been put in the cache by concurrently running goroutines. In such cases it is better to use `sync.Pool` instead of `sync.PLocalCache`.

Example usage
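The original example-usage code block did not survive extraction. Below is a hypothetical sketch of the intended usage pattern for CPU-bound code: re-use cached parser state when available, re-construct it when it was lost. The single-slot `PLocalCache` stand-in is repeated here (mutex-guarded, not truly P-local) only so the example is self-contained; `parserState` and `parse` are invented names.

```go
package main

import (
	"fmt"
	"sync"
)

// Single-slot stand-in for the proposed sync.PLocalCache; a hypothetical
// sketch, not the real API.
type PLocalCache struct {
	mu   sync.Mutex
	slot any
}

func (c *PLocalCache) Get() any {
	c.mu.Lock()
	defer c.mu.Unlock()
	x := c.slot
	c.slot = nil
	return x
}

func (c *PLocalCache) Put(x any) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.slot == nil {
		c.slot = x
	}
}

// parserState is a placeholder for the non-trivial, re-constructible state
// of a CPU-bound parser.
type parserState struct {
	buf []byte
}

var stateCache PLocalCache

// parse re-uses cached parser state when available and re-constructs it when
// the state was lost (GOMAXPROCS change, goroutine migration, etc.).
func parse(data string) int {
	s, _ := stateCache.Get().(*parserState)
	if s == nil {
		s = &parserState{buf: make([]byte, 0, 4096)} // state lost: re-create it
	}
	defer stateCache.Put(s) // return the state to the cache for re-use
	s.buf = append(s.buf[:0], data...)
	return len(s.buf) // stands in for real CPU-bound parsing work
}

func main() {
	fmt.Println(parse("hello"))  // 5
	fmt.Println(parse("world!")) // 6 (second call re-uses the cached state)
}
```

Note how the caller treats a `nil` result from `Get()` as "state lost, rebuild it", which matches the semantics described above.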
See also https://github.com/golang/go/issues/65104. Now I think it is better to provide a separate entity with clear semantics than to complicate the semantics of `sync.Pool` and/or try to efficiently cover multiple different cases with `sync.Pool`.

Generics
It may be good to provide a generics-based `sync.PLocalCache[T any]`, but it is also OK to provide a non-generic implementation, to be consistent with `sync.Pool`.
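For illustration, a hypothetical generics-based variant might look like the sketch below: a typed `Get` avoids the type assertions an `any`-based API requires. As in the earlier sketches, the per-P slot is approximated with a single mutex-guarded slot, since user code cannot access P-local storage.

```go
package main

import (
	"fmt"
	"sync"
)

// PLocalCache is a hypothetical generics-based variant of the proposed type.
type PLocalCache[T any] struct {
	mu   sync.Mutex
	slot *T // at most one cached object
}

// Get removes and returns the cached object, or nil if nothing is cached.
func (c *PLocalCache[T]) Get() *T {
	c.mu.Lock()
	defer c.mu.Unlock()
	x := c.slot
	c.slot = nil
	return x
}

// Put caches x unless an object is already cached; the new object is dropped
// on conflict.
func (c *PLocalCache[T]) Put(x *T) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.slot == nil {
		c.slot = x
	}
}

func main() {
	var c PLocalCache[[]byte]
	fmt.Println(c.Get() == nil) // true: the cache starts empty
	buf := make([]byte, 0, 64)
	c.Put(&buf)
	fmt.Println(c.Get() == &buf) // true: the typed object comes back as-is
}
```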