The current runtime.lock2 implementation does a bit of spinning to try to acquire the runtime.mutex before sleeping. If a thread has gone to sleep within lock2 (via a syscall), it will eventually require another thread (in unlock2) to do another syscall to wake it. The bit of spinning allows us to avoid those syscalls in some cases. Slowing down a bit and trying again, at a high level, seems good; maybe the previous holder has exited the critical section.
The first phase of spinning involves a runtime.procyield call, which asks the processor to pause for a moment (on the scale of tens or hundreds of nanoseconds). There's some uncertainty about what that duration is and what it should be (described in part in #69232) but the idea of using this mechanism to slow down for a bit, again at a high level, seems good.
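To make the first phase concrete, here is a minimal user-space sketch of bounded spinning with a short pause between attempts. It is not the runtime's code: spinMutex, trySpin, pause, and the constants spinRounds and pauseIters are hypothetical names and values chosen for illustration, and pause is only a stand-in for runtime.procyield, which user code cannot call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// spinMutex is a hypothetical user-space lock used only for illustration;
// it is not runtime.mutex.
type spinMutex struct {
	locked atomic.Bool
}

const (
	spinRounds = 4  // assumed number of acquisition attempts before giving up
	pauseIters = 30 // assumed length of each pause, in loop iterations
)

// pause burns a few cycles between attempts. It stands in for
// runtime.procyield; the atomic load keeps the loop from being optimized away.
func pause(n int, sink *atomic.Bool) {
	for i := 0; i < n; i++ {
		sink.Load()
	}
}

// trySpin attempts to take the lock, pausing briefly between attempts.
// It reports false if the lock is still held after spinRounds attempts,
// which is the point where a real implementation would fall back to sleeping.
func (m *spinMutex) trySpin() bool {
	for i := 0; i < spinRounds; i++ {
		if m.locked.CompareAndSwap(false, true) {
			return true
		}
		pause(pauseIters, &m.locked)
	}
	return false
}

func (m *spinMutex) unlock() { m.locked.Store(false) }

func main() {
	var m spinMutex
	var wg sync.WaitGroup
	counter := 0
	for g := 0; g < 4; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				for !m.trySpin() {
					// Fallback point: this sketch simply retries
					// instead of sleeping.
				}
				counter++ // protected by the lock
				m.unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println(counter) // prints 4000
}
```

The boolean result of trySpin marks the boundary the issue cares about: every true return is a sleep/wake syscall pair that was avoided, and every false return is where the more expensive path would begin.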
The second phase of spinning involves a runtime.osyield call. That's a syscall, implemented on Linux as a call to sched_yield(2). The discussion in CL 473656 links to https://www.realworldtech.com/forum/?threadid=189711&curpostid=189752 , which gives a perspective on why that's not a universally good idea.
It's a syscall, so it doesn't help with avoiding syscalls. (Though a single syscall here has a chance of avoiding a pair of syscalls, one to sleep indefinitely and one to wake.)
The semantics aren't very well defined and, very loosely speaking, don't align with our goals. We don't mean for the OS scheduler to drag a thread over from another NUMA node just because we said "we can't run at this instant".
Maybe we should delete that part of lock2. Or maybe we should replace it with an explicit nanosleep(2) call of some tiny time interval.
I don't see any urgency here. Mostly I'd like a tracking issue to reference in lock2's comments.
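As a rough way to picture the two candidates for the second phase (yielding versus a tiny timed sleep), here is a user-space sketch with a pluggable back-off step. Everything in it is an assumption for illustration: backoff, yieldBackoff, sleepBackoff, acquire, release, and contend are hypothetical names, runtime.Gosched is only an analogy for sched_yield(2) (it yields the goroutine, not the OS thread), time.Sleep is only an analogy for nanosleep(2), and the 1µs interval is arbitrary rather than tuned.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"sync/atomic"
	"time"
)

// backoff is a hypothetical hook for what a waiter does between attempts.
type backoff func()

// yieldBackoff is an analogy for runtime.osyield / sched_yield(2).
// It yields the goroutine, not the OS thread.
func yieldBackoff() { runtime.Gosched() }

// sleepBackoff is an analogy for a short nanosleep(2).
// The interval is an arbitrary assumption.
func sleepBackoff() { time.Sleep(time.Microsecond) }

// acquire retries a compare-and-swap on the lock word, calling wait between
// attempts, and returns how many times it had to back off.
func acquire(word *atomic.Int32, wait backoff) int {
	n := 0
	for !word.CompareAndSwap(0, 1) {
		wait()
		n++
	}
	return n
}

func release(word *atomic.Int32) { word.Store(0) }

// contend runs workers goroutines through the lock iters times each and
// returns the protected counter plus the total number of back-offs.
func contend(wait backoff, workers, iters int) (count int, backoffs int64) {
	var word atomic.Int32
	var total atomic.Int64
	var wg sync.WaitGroup
	for g := 0; g < workers; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iters; i++ {
				total.Add(int64(acquire(&word, wait)))
				count++ // protected by the lock word
				release(&word)
			}
		}()
	}
	wg.Wait()
	return count, total.Load()
}

func main() {
	cases := []struct {
		name string
		wait backoff
	}{
		{"yield", yieldBackoff},
		{"sleep", sleepBackoff},
	}
	for _, c := range cases {
		count, backoffs := contend(c.wait, 4, 1000)
		fmt.Printf("%s: count=%d backoffs=%d\n", c.name, count, backoffs)
	}
}
```

The back-off counts vary from run to run and say nothing about the runtime's real behavior; the sketch only shows that the two options are interchangeable at the call site, so the choice can be made on scheduler semantics rather than on code structure.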
CC @golang/runtime