
runtime: correct attribution of internal mutex contention #66999

Open · rhysh opened 6 months ago

rhysh commented 6 months ago

As of Go 1.22, the mutex profile measures contention on internal mutex values, but attributes the contention to the function runtime._LostContendedRuntimeLock. This should be improved.

When running with GODEBUG=runtimecontentionstacks=1, the mutex profile attributes to the caller of runtime.unlock all the delay that the M encountered during its preceding runtime.lock call to obtain that lock. This is incorrect: the delay attributed to a runtime.unlock call should instead correspond to the total time that the other Ms were unable to proceed because this M was holding the lock.
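For anyone who wants to see the current behavior, here is a minimal repro sketch (my own construction, not from the issue): channel sends and receives take a runtime-internal lock, so hammering one channel from many goroutines and then dumping the "mutex" profile shows where the contention gets attributed. Running it with GODEBUG=runtimecontentionstacks=1 on Go 1.22 should show the unlock-caller attribution described above, versus the default runtime._LostContendedRuntimeLock placeholder.

```go
// Hypothetical repro sketch: contend the runtime-internal channel lock,
// then dump the mutex profile in human-readable form.
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
	"sync"
)

func main() {
	runtime.SetMutexProfileFraction(1) // sample every contention event

	ch := make(chan int, 1)
	var wg sync.WaitGroup
	for i := 0; i < 64; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1e5; j++ {
				ch <- j // chansend locks the channel's internal runtime.mutex
				<-ch
			}
		}()
	}
	wg.Wait()

	pprof.Lookup("mutex").WriteTo(os.Stdout, 1) // debug=1: human-readable output
}
```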

I have the start of a plan to resolve this, so I'm seeking feedback from @mknyszek and @prattmic (or @golang/runtime or others) on whether I'm on the right track — both in the start of the plan, and on whether I'm working to the right set of constraints. Thank you!


The mutex profile for sync.Mutex uses fields on the sudog to track how many Gs have been waiting and for about how long. The sudogs form a linked list of waiters, and are dequeued under the control of the runtime. When waking a G, the caller of sync.Mutex.Unlock is able to claim responsibility for the delay that it specifically caused, and update the remaining waiters' start time so the next Unlocker can claim the right amount too. We'll need similar information for the Ms that hold runtime-internal locks.
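As an illustration of that approximation, here is a simplified sketch, not the runtime's actual sudog code; the waiter type and its fields are invented stand-ins for the real sudog fields:

```go
// waiter stands in for the runtime's sudog: one record per blocked G.
type waiter struct {
	acquiretime int64   // when this G began waiting for the lock
	next        *waiter // the remaining queued waiters
}

// unlockBlame sketches what the Unlock side does: claim the delay this
// critical section caused for every queued waiter, then reset the waiters'
// start times so the next unlocker claims only the delay on its own watch.
func unlockBlame(head *waiter, nwait int64, now int64) (claimed int64) {
	if head == nil {
		return 0
	}
	// Approximate: charge the head's wait duration to all nwait waiters.
	claimed = (now - head.acquiretime) * nwait
	for w := head; w != nil; w = w.next {
		w.acquiretime = now // restart the clock for the remaining waiters
	}
	return claimed
}
```

The reset loop is the key property: each unlocker claims only the contention it personally caused, which is exactly what the runtime-internal locks currently cannot do.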

I'd like to say a few assumed constraints out loud. Among them: we cannot change the size of a runtime.mutex; it needs to remain a uintptr (prattmic responds to this one below). With that in mind, here's the start of the plan:

For OSes where we use lock_sema.go, an M that needs to block on a runtime.mutex first pushes itself onto the head of a linked list (runtime.mutex.key, linked via runtime.m.nextwaitm), and then sleeps on its own private semaphore. When the current holder of the lock is done, it pops an M off of the list and wakes that specific M. We could add fields to the M for the number of Ms after it on the list, for the time it went to sleep, and for the time that the tail M went to sleep (or a pointer to the tail M). I think that would be sufficient information to match the approximation we do for sync.Mutex via sudog, and would need only a simple set of invariants to update when pushing and popping Ms to/from this list.
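Here is a sketch of what those per-M fields and the push invariant might look like. The field names are hypothetical (the real Ms link through m.nextwaitm; the other fields are the proposed additions):

```go
// mWaitSketch stands in for the per-M state on lock_sema.go platforms.
type mWaitSketch struct {
	nextwaitm      *mWaitSketch // existing: next M on the mutex's LIFO wait list
	mWaitCount     int64        // proposed: number of Ms after this one on the list
	mWaitStart     int64        // proposed: when this M went to sleep
	mTailWaitStart int64        // proposed: sleep time of the tail M
}

// pushWaiter pushes m onto the head of the wait list, maintaining the
// invariants: the head knows how many Ms follow it and when the tail slept.
func pushWaiter(head, m *mWaitSketch, now int64) *mWaitSketch {
	m.nextwaitm = head
	m.mWaitStart = now
	if head != nil {
		m.mWaitCount = head.mWaitCount + 1
		m.mTailWaitStart = head.mTailWaitStart
	} else {
		m.mWaitCount = 0
		m.mTailWaitStart = now
	}
	return m
}
```

On pop, the unlocker wakes the head M and could use mWaitCount, mWaitStart, and mTailWaitStart to compute the same per-critical-section blame that unlockers of sync.Mutex compute from sudogs.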


For OSes such as Linux where we use lock_futex.go, the kernel-controlled wakeup order means we'll need something more.

This approach stores the necessary information in space on the waiting Ms; they're not using it for anything else while they're blocked.
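One possible shape for that extra bookkeeping on futex platforms (an assumption on my part, in the spirit of the "double-link list of waiting Ms" CL mentioned below): a doubly-linked list lets whichever M the kernel happens to wake unlink itself cheaply, regardless of wakeup order.

```go
// Hypothetical sketch: an explicit doubly-linked record of waiting Ms for
// futex-based locks, stored on the Ms themselves. All names are invented.
type mFutexWait struct {
	prev, next *mFutexWait // doubly linked so any M can remove itself
	waitStart  int64       // when this M began waiting
}

// unlink removes m from the list no matter which position the kernel's
// wakeup order left it in.
func (m *mFutexWait) unlink() {
	if m.prev != nil {
		m.prev.next = m.next
	}
	if m.next != nil {
		m.next.prev = m.prev
	}
	m.prev, m.next = nil, nil
}
```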

I'll prepare a CL to implement this plan, but if I've gotten some assumptions wrong or it's otherwise clear that this won't work, it may be easier to spot and to discuss it here rather than in the details of code review. Thanks!

prattmic commented 6 months ago

At a glance, this plan seems reasonable to me.

One note on assumed constraints:

> We cannot change the size of a runtime.mutex; it needs to remain a uintptr.

While it is nice to avoid increasing the size of a mutex, I don't think this needs to be a hard requirement. Taking a look at where we use mutex, it looks like timers and channels probably have the most objects and would thus have the biggest impact. But neither of those structures is tiny, so I don't think it would be the end of the world if they get bigger.
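For context, runtime.mutex is currently a single word (plus the optional lock rank), which is why the waiting Ms themselves are the natural home for any extra state. The snippet below is abridged from runtime/runtime2.go and may differ slightly across versions:

```go
// Abridged from runtime/runtime2.go; not an exact copy.
type mutex struct {
	// Empty struct if lock ranking is disabled, otherwise includes the lock rank.
	lockRankStruct
	// The futex-based implementation treats key as a uint32 futex word;
	// the sema-based implementation treats it as a pointer to the head
	// of the list of waiting Ms.
	key uintptr
}
```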

gopherbot commented 5 months ago

Change https://go.dev/cl/585635 mentions this issue: runtime: prepare for extensions to waiting M list

gopherbot commented 5 months ago

Change https://go.dev/cl/585639 mentions this issue: runtime: remove GODEBUG=runtimecontentionstacks

gopherbot commented 5 months ago

Change https://go.dev/cl/585638 mentions this issue: runtime: profile mutex contention during unlock

gopherbot commented 5 months ago

Change https://go.dev/cl/585636 mentions this issue: runtime: use semaphore structure for futex locks

gopherbot commented 5 months ago

Change https://go.dev/cl/585637 mentions this issue: runtime: double-link list of waiting Ms

gopherbot commented 5 months ago

Change https://go.dev/cl/586237 mentions this issue: runtime: improve runtime-internal mutex profile tests

gopherbot commented 5 months ago

Change https://go.dev/cl/587175 mentions this issue: runtime/pprof: shorten TestGoroutineSwitch

gopherbot commented 5 months ago

Change https://go.dev/cl/586796 mentions this issue: runtime: split mutex profile clocks

gopherbot commented 5 months ago

Change https://go.dev/cl/589096 mentions this issue: Revert "runtime: improve runtime-internal mutex profile tests"

gopherbot commented 5 months ago

Change https://go.dev/cl/589116 mentions this issue: Revert "runtime: double-link list of waiting Ms"

gopherbot commented 5 months ago

Change https://go.dev/cl/589098 mentions this issue: Revert "runtime: use semaphore structure for futex locks"

gopherbot commented 5 months ago

Change https://go.dev/cl/589097 mentions this issue: Revert "runtime: remove GODEBUG=runtimecontentionstacks"

gopherbot commented 5 months ago

Change https://go.dev/cl/589115 mentions this issue: Revert "runtime: profile mutex contention during unlock"

gopherbot commented 5 months ago

Change https://go.dev/cl/589117 mentions this issue: Revert "runtime: prepare for extensions to waiting M list"

gopherbot commented 5 months ago

Change https://go.dev/cl/589095 mentions this issue: Revert "runtime: split mutex profile clocks"

rhysh commented 5 months ago

Reverted for the Go 1.23 cycle due to #67585

chabbimilind commented 5 months ago

This direction seems right. We should attribute the lock wait time to the lock holder (the concept is called blame shifting). It should be easier in Go, since the runtime already keeps a list of lock waiters. This has been done before with sampling profilers: https://www.cs.rice.edu/~johnmc/papers/hpctoolkit-ppopp-2010.pdf

It is also part of the OpenMP runtime tools API: https://www.openmp.org/wp-content/uploads/OMPT-SC13-BOF.pdf

robaho commented 3 months ago

Hmm. The blame shifting seems less useful than knowing what was blocked, where, and for how long. Ideally it would report both: what was blocked for how long, and what the lock holder was doing at the time. But that feels like nested profiling, which is pretty hard to follow.

rhysh commented 3 months ago

Yes, the semantics of the "mutex" profile are to shift blame to the slow critical section. Part of why this didn't make it for the Go 1.23 cycle is that we only have the explicit list of lock waiters for the lock_sema.go platforms, and the extra bookkeeping required to add it for GOOS=linux was an unwelcome slowdown.

It is also useful to know what was blocked and for how long; for blocking that takes place outside of the runtime, that's reported in the "block" profile. Do you have an example of where that would have been useful for runtime-internal blocking, @robaho (maybe an open issue that is/was hard to debug)? If so, I think the next step would be to open a new issue to discuss adding it. I think it's possible, but we'd need to get consensus on whether it's worth the costs.
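For reference, both user-facing profiles are enabled through existing runtime APIs; a minimal sketch:

```go
// The "block" profile answers "what was blocked, where, and for how long"
// for blocking outside the runtime; the "mutex" profile blames the
// contended critical section instead.
package main

import (
	"os"
	"runtime"
	"runtime/pprof"
)

func main() {
	runtime.SetBlockProfileRate(1)     // record every blocking event ("block" profile)
	runtime.SetMutexProfileFraction(1) // sample every contention event ("mutex" profile)

	// ... workload under investigation ...

	pprof.Lookup("block").WriteTo(os.Stdout, 1)
	pprof.Lookup("mutex").WriteTo(os.Stdout, 1)
}
```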

Fixing this didn't happen for Go 1.23; I'm planning to try again for Go 1.24.

gopherbot commented 1 week ago

Change https://go.dev/cl/622997 mentions this issue: runtime: unify lock2, allow deeper sleep, control tail