
proposal: sync/v2: prohibit unlocking mutex in a different goroutine #9201

Open dvyukov opened 9 years ago

dvyukov commented 9 years ago

sync.Mutex allows lock/unlock in different goroutines:

http://golang.org/pkg/sync/#Mutex.Unlock "A locked Mutex is not associated with a particular goroutine. It is allowed for one goroutine to lock a Mutex and then arrange for another goroutine to unlock it."

And the same for RWMutex: http://golang.org/pkg/sync/#RWMutex.Unlock

The ability to unlock a mutex in a different goroutine is very rarely used in real code. And if you really need something more complex than lock/unlock within a single goroutine, you can always use channels.

But it creates several problems:

  1. Deadlock detection becomes impossible, as there is no notion of "critical sections".
  2. Similarly, static lock annotations become impossible, for the same reason.
  3. Optimizations like hardware lock elision (see e.g. Intel HLE) become impossible.
  4. Another potential optimization that becomes impossible is priority boosting inside critical sections. Namely, if a goroutine is preempted inside a critical section, the scheduler may give it another tiny time slot to finish the critical section (or perhaps it can then voluntarily switch in Unlock). Solaris did this for a long time.

We should prohibit the possibility of unlocking in a different goroutine in Go 2.
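For illustration, here is a minimal sketch of the pattern that is legal today but would be prohibited under this proposal (a hypothetical example, not code from the report):

    package main

    import "sync"

    func main() {
        var mu sync.Mutex
        done := make(chan struct{})

        mu.Lock() // locked in the main goroutine
        go func() {
            // Legal per the current sync docs, but prohibited under
            // this proposal: the unlock happens in another goroutine.
            mu.Unlock()
            close(done)
        }()
        <-done
    }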

davecheney commented 9 years ago

Comment 1:

For Go 1.x, could detecting unlocking from another goroutine be added as a runtime
experimental feature, similar to the memory fence code that was added a while back?
dvyukov commented 9 years ago

Comment 2:

Detect and then what?
davecheney commented 9 years ago

Comment 3:

Panic, I guess. This would only be an experimental mode, like the memory fence checker.
dvyukov commented 9 years ago

Comment 4:

What's the benefit here?
davecheney commented 9 years ago

Comment 5:

It sounds like this is already a programming error; I would like an opportunity to
build a version of Go that detects this.
cznic commented 9 years ago

Comment 6:

Dmitry, I think the idea is too close to the situation in some other languages where
calling stuff from anything other than a specific/privileged thread is prohibited.
IMHO, goroutines having no implicit user-facing identity is an important property, and I
suggest keeping it that way.
dvyukov commented 9 years ago

Comment 7:

It's not a programming error today. The docs say:
http://golang.org/pkg/sync/#Mutex.Unlock
"A locked Mutex is not associated with a particular goroutine. It is allowed for one
goroutine to lock a Mutex and then arrange for another goroutine to unlock it."
It can also be in third-party code. So even if you detect it, you may not be able to
change it.
dvyukov commented 9 years ago

Comment 8:

Re #6: what you say makes sense, but I think the benefits do not outweigh the drawbacks in
this case. It does not look exactly the same as restricting all GUI calls to the main
thread. And we do not take this possibility away; more complex patterns are still
possible with channels. Mutex is for:
mu.Lock()
defer mu.Unlock()
// something
ianlancetaylor commented 9 years ago

Comment 9:

It should be easy enough for the compiler and/or static analysis tools to see, for most
uses of mutexes, that the mutex is locked and unlocked in the same goroutine.  So I
believe that HLE is entirely implementable in the common case, which is the only case in
which it can plausibly be used anyhow.
It should also be straightforward for the compiler to annotate lock/unlock in the same
goroutine for the benefit of dynamic analysis, if that seems useful.
Not that I'm necessarily opposed to this idea, but I think you need stronger arguments.

Labels changed: added repo-main, release-none.

dvyukov commented 9 years ago

Comment 10:

Is it really easy to do?
I see at least 3 problems:
1. If you do obj.subobj.mu.Lock/obj.subobj.mu.Unlock, the compiler must prove that it's the
same mutex in both cases, that is, that the obj.subobj pointer has not changed. Since these
objects are shared (they contain a mutex), it may not be that simple.
2. If compiler sees:
mu.Lock()
...
mu.Unlock()
and proves mu refers to the same object, it can be the case that the situation is
actually:
mu.Lock()
...
// pass responsibility to unlock to a different goroutine
...
// receive responsibility to unlock from yet another goroutine
...
mu.Unlock()
3. Non-trivial control flow like:
func foo() *T {
  t := ...
  ...
  if ... {
    t.mu.Lock()
  }
  ...
  return t
}
t := foo()
...
if ... {
  t.mu.Unlock()
}
...
ianlancetaylor commented 9 years ago

Comment 11:

If you can't resolve those issues anyhow, I don't understand how you can use HLE.  That
is, I don't see how the restriction that the mutex is manipulated from a single
goroutine is sufficient to use HLE in cases where you can't resolve the issues you
describe.
dvyukov commented 9 years ago

Comment 12:

There is nothing to resolve. With the new rules, there must be a pairing lock/unlock. Even
if the identity of obj.subobj.mu has changed in the function, there still must be a pairing
unlock for the original value of obj.subobj.mu and a pairing lock for the new value of
obj.subobj.mu somewhere else. For HLE and dynamic analysis you don't need to find the pairs
statically; they will just manifest themselves at runtime.
ianlancetaylor commented 9 years ago

Comment 13:

I guess I don't grasp how that would work. You lock a mutex, presumably with an XACQUIRE
instruction. You make a system call. You block. The scheduler runs. You unblock.
You come back in a different thread. You unlock the mutex with an XRELEASE instruction.
The unlock fails. Execution resumes back at the XACQUIRE lock. But you're running on
a different core. Can that really work?
My point is that you need to know what happens between the XACQUIRE and the XRELEASE.
Or so it seems to me.
dvyukov commented 9 years ago

Comment 14:

XACQUIRE/XRELEASE transparently fall back to real locking, based on some heuristics, if
transactions fail.
If you use RTM (Restricted Transactional Memory), then you fall back to locking manually,
using explicit heuristics, if you see something that you can't handle (e.g. a syscall).
But this happens at runtime; no prior static knowledge is required.
Intel prototyped libpthread support for HLE where you could switch any C program that
uses pthread_mutex to HLE. And it worked.
ianlancetaylor commented 9 years ago

Comment 15:

OK, if that is how it works, then why can't we use the same heuristics to fall back if
the mutex unlock happens in a different goroutine? By definition we must be on the same
core/thread, or we would have already done a fallback when we entered the scheduler.
(To be clear, I'm not saying you're wrong, I'm just saying that I don't understand.)
dvyukov commented 9 years ago

Comment 16:

Yes, we can fall back to normal locking with the current mutex semantics. It will work.
But good heuristics are critical for efficient HLE. With the proposed new semantics we
permanently fall back to locking only if we encounter something that we can't handle within
a transaction (e.g. a syscall). But today an unlock in a different thread will look like
normal contention (it will be detected earlier, when the goroutine passes responsibility
for unlocking the mutex to a different goroutine), so the condition for permanent fallback
to locking becomes moot; and if we don't fall back (and instead always try to execute
transactionally several times first), we will waste lots of work.
Static analysis doesn't have dynamic information, so this won't work for static analysis.
Dynamic analysis may or may not work; I don't know yet how to implement it.
ghasemloo commented 7 years ago

@dvyukov, we had a case in C++ where we passed locked objects around; we didn't want the object to be unlocked while it was being passed around. So there are cases where we might want to lock in one goroutine and unlock in another, I think. Feel free to email me for details.

dvyukov commented 7 years ago

@ghasemloo I believe there are cases like this, but it does not mean you have to use Mutex for them. A chan will do just fine. Note: it is absolutely illegal to lock/unlock almost all C++ mutexes in different threads (e.g. pthread_mutex_lock, EnterCriticalSection, std::mutex, etc.). And if you build your own semaphore and call it a mutex, you can do the same in Go as well.
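For reference, a minimal sketch of such a channel-built binary semaphore, whose acquire and release are legal from different goroutines (an illustration, not code from the thread; the names are made up):

    // Semaphore is a binary semaphore built on a buffered channel of
    // capacity 1. Unlike the proposed sync.Mutex rules, Acquire and
    // Release may legally run in different goroutines.
    type Semaphore chan struct{}

    func NewSemaphore() Semaphore { return make(Semaphore, 1) }

    // Acquire blocks until the semaphore is free.
    func (s Semaphore) Acquire() { s <- struct{}{} }

    // Release frees the semaphore; any goroutine may call it.
    func (s Semaphore) Release() { <-s }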

pciet commented 6 years ago

Here's a solution to a deadlock case I had recently that requires unlocking in a different goroutine:

    // if the lock can't be acquired we have to try a read
    // here because the notifier could be holding it waiting to send
    //
    // this won't work serially: lock blocks us from 
    // trying to read and trying to read first gives 
    // the notifier a chance to lock before we do
    acq := make(chan struct{})
    go func() {
        gameMonitorsLock.Lock()
        acq <- struct{}{}
    }()
    OUTER2:
    for {
        select {
        case <-channels.move:
        case <-channels.done:
        case <-acq:
            break OUTER2
        }
    }
    delete(gameMonitors, gameid)
    gameMonitorsLock.Unlock()

One of the senders, called in an independent HTTP POST handling function:

    go func() {
        gameMonitorsLock.RLock()
        c, has := gameMonitors[g.ID]
        if has {
            c.move <- time.Now()
        }
        gameMonitorsLock.RUnlock()
    }()

Normally a send on c.move causes some reloads and checks of a changed database row, but in this case the race is when a move happens while a timeout detection (triggered by a timer instead of an HTTP request) is tearing down the goroutine.

I don't know the quality of my design, but here's the one case I have where lock/unlock has to happen in separate goroutines.

dvyukov commented 6 years ago

@pciet no, it doesn't require unlocking a Mutex in a different goroutine. See the previous comment: https://github.com/golang/go/issues/9201#issuecomment-314651805
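To illustrate the channel-based alternative dvyukov points to: if gameMonitorsLock were a buffered channel of capacity 1 rather than a sync.RWMutex, the acquisition could sit directly in the select, with no helper goroutine (a hypothetical rewrite of the snippet above, ignoring the reader side that RLock provides):

    var gameMonitorsLock = make(chan struct{}, 1)
    ...
    OUTER2:
    for {
        select {
        case <-channels.move:
        case <-channels.done:
        case gameMonitorsLock <- struct{}{}: // acquire the lock
            break OUTER2
        }
    }
    delete(gameMonitors, gameid)
    <-gameMonitorsLock // release; any goroutine could legally do this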

ianlancetaylor commented 6 years ago

@griesemer suggests that perhaps instead of changing the existing sync.Mutex type, we should introduce a different kind of construct. For example, we could add a new method to sync.Mutex, Critical(section func()), which evaluates the function with the mutex held. That would provide the guarantee you are looking for, and in some cases might be easier and safer for people to use than separate Lock and Unlock calls.
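A minimal sketch of the suggested shape, written here as a free function since a method cannot be added to sync.Mutex from outside the sync package (the helper and its name are assumptions, not an actual API):

    import "sync"

    // Critical runs section while mu is held. Lock and Unlock are
    // paired within a single goroutine, so the critical section is
    // well defined, and the deferred Unlock also releases the mutex
    // if section panics.
    func Critical(mu *sync.Mutex, section func()) {
        mu.Lock()
        defer mu.Unlock()
        section()
    }

A call site then looks like Critical(&mu, func() { counter++ }), with no way to leave the mutex locked or to unlock it from another goroutine.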

dvyukov commented 6 years ago

Unless switching to Go 2 includes rewriting all Go code from scratch (as far as I understand it doesn't), this won't give any significant benefit. If Lock/Unlock stay and provide the current semantics, we might as well not do this at all. The point is that 99% of Mutex uses already comply.

dvyukov commented 6 years ago

Another potential optimization that becomes possible is priority boosting inside critical sections. Namely, if a goroutine is preempted inside a critical section, the scheduler may give it another tiny time slot to finish the critical section (or perhaps it can then voluntarily switch in Unlock). Solaris did this for a long time.

ianlancetaylor commented 6 years ago

If people convert over time to a Critical method, then it seems to me that all of your suggested improvements apply to uses of that method.

On the other hand, if we change the meaning of Lock and Unlock, then we break the 1% of programs that do not follow the new guidelines. That's only going to be OK if we can reliably statically detect the programs that will break, which as far as I can see we can't. So I don't see how we can implement this proposal as written. We can introduce a Critical method. We can introduce a new kind of Mutex that only permits locking and unlocking in the same goroutine. But I don't think we can change sync.Mutex. Or, to put it a different way, the cost of changing sync.Mutex is high; we need a correspondingly high benefit, and I don't see it here.

nathanjsweet commented 5 years ago

You could create your own trylock pretty simply: https://play.golang.org/p/DcITzWTNlrD
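The linked playground snippet is not reproduced here; a sketch of one common way to build a trylock on a buffered channel looks like this (not necessarily the linked code; for what it's worth, sync.Mutex itself later gained a TryLock method in Go 1.18):

    // TryMutex is a mutex with a non-blocking TryLock, built on a
    // buffered channel of capacity 1.
    type TryMutex chan struct{}

    func NewTryMutex() TryMutex { return make(TryMutex, 1) }

    func (m TryMutex) Lock() { m <- struct{}{} }

    // TryLock reports whether the lock was acquired without blocking.
    func (m TryMutex) TryLock() bool {
        select {
        case m <- struct{}{}:
            return true
        default:
            return false
        }
    }

    func (m TryMutex) Unlock() { <-m }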