runtime: consider removing osyield call from lock2

The current runtime.lock2 implementation does a bit of spinning to try to acquire the runtime.mutex before sleeping. If a thread has gone to sleep within lock2 (via a syscall), it will eventually require another thread (in unlock2) to do another syscall to wake it. The bit of spinning allows us to avoid those syscalls in some cases. Slowing down a bit and trying again, at a high level, seems good; maybe the previous holder has exited the critical section.

The first phase of spinning involves a runtime.procyield call, which asks the processor to pause for a moment (on the scale of tens or hundreds of nanoseconds). There's some uncertainty about what that duration is and what it should be (described in part in #69232) but the idea of using this mechanism to slow down for a bit, again at a high level, seems good.

The second phase of spinning involves a runtime.osyield call. That's a syscall, implemented on Linux as a call to sched_yield(2). The discussion in CL 473656 links to https://www.realworldtech.com/forum/?threadid=189711&curpostid=189752 , which gives a perspective on why that's not a universally good idea.

- It's a syscall, so it doesn't help with avoiding syscalls. (Though a single syscall here has a chance of avoiding a pair of syscalls: one to sleep indefinitely and one to wake.)
- The semantics aren't very well defined and, very loosely speaking, don't align with our goals. We don't mean for the OS scheduler to drag a thread over from another NUMA node just because we said "we can't run at this instant".

Maybe we should delete that part of lock2. Or maybe we should replace it with an explicit nanosleep(2) call of some tiny time interval.

I don't see any urgency here. Mostly I'd like a tracking issue to reference in lock2's comments.

CC @golang/runtime

runtime: consider removing osyield call from lock2 #69268