Open dvyukov opened 9 years ago
It's not a programming error today. The docs say: http://golang.org/pkg/sync/#Mutex.Unlock "A locked Mutex is not associated with a particular goroutine. It is allowed for one goroutine to lock a Mutex and then arrange for another goroutine to unlock it." Also it can be in third-party code. So even if you detect it, you may not be able to change it.
Re #6: what you say makes sense, but I think the benefits do not outweigh the drawbacks in this case. It is not exactly the same as restricting all GUI calls to the main thread. And we are not taking this possibility away; more complex patterns are still possible with channels. Mutex is for:

```go
mu.Lock()
defer mu.Unlock()
// something
```
It should be easy enough for the compiler and/or static analysis tools to see, for most uses of mutexes, that the mutex is locked and unlocked in the same goroutine. So I believe that HLE is entirely implementable in the common case, which is the only case in which it can plausibly be used anyhow. It should also be straightforward for the compiler to annotate lock/unlock in the same goroutine for the benefit of dynamic analysis, if that seems useful. Not that I'm necessarily opposed to this idea, but I think you need stronger arguments.
Labels changed: added repo-main, release-none.
Is it really easy to do? I see at least three problems:

1. If you do `obj.subobj.mu.Lock`/`obj.subobj.mu.Unlock`, the compiler must prove that it's the same mutex in both cases, that is, that the `obj.subobj` pointer has not changed. Since these objects are shared (they contain a mutex), that may not be simple.

2. If the compiler sees:

```go
mu.Lock()
...
mu.Unlock()
```

and proves `mu` refers to the same object, the actual situation can be:

```go
mu.Lock()
... // pass responsibility to unlock to a different goroutine
... // receive responsibility to unlock from yet another goroutine
mu.Unlock()
```

3. Non-trivial control flow like:

```go
func foo() *T {
	t := ...
	...
	if ... {
		t.mu.Lock()
	}
	...
	return t
}

t := foo()
...
if ... {
	t.mu.Unlock()
}
...
```
There is nothing to resolve. With the new rules, there must be a pairing lock/unlock. Even if the identity of obj.subobj.mu has changed in the function, there must still be a pairing unlock for the original value of obj.subobj.mu and a pairing lock for the new value of obj.subobj.mu somewhere else. For HLE and dynamic analysis you don't need to find pairs statically; they will manifest themselves at runtime.
I guess I don't grasp how that would work. You lock a mutex presumably with an XACQUIRE instruction. You make a system call. You block. The scheduler runs. You unblock. You come back in a different thread. You unlock the mutex with an XRELEASE instruction. The unlock fails. Execution resumes back at the XACQUIRE lock. But you're running on a different core. Can that really work? My point is that you need to know what happens between the XACQUIRE and the XRELEASE. Or so it seems to me.
XACQUIRE/XRELEASE transparently fall back to real locking if transactions fail, using some heuristics. If you use RTM (Restricted Transactional Memory), then you fall back to locking manually, using explicit heuristics, if you see something that you can't handle (e.g. a syscall). But this happens at runtime; no prior static knowledge is required. Intel prototyped libpthread support for HLE where you could switch any C program that uses pthread_mutex to HLE. And it worked.
OK, if that is how it works, then why can't we use the same heuristics to fall back if the mutex unlock happens in a different goroutine? By definition we must be on the same core/thread, or we would already have fallen back when we entered the scheduler. (To be clear, I'm not saying you're wrong, I'm just saying that I don't understand.)
Yes, we can fall back on normal locking with the current mutex semantics. It will work. But good heuristics are critical for efficient HLE. With the proposed new semantics we permanently fall back to locking only if we encounter something that we can't handle within a transaction (e.g. a syscall). But an unlock in a different thread will look like normal contention (it will be detected earlier, when the goroutine passes responsibility to unlock the mutex to a different goroutine), so the condition for permanent fallback to locking becomes moot; and if we don't fall back (and always try to execute transactionally several times first), we will waste lots of work. Static analysis doesn't have dynamic information, so this won't work for static analysis. Dynamic analysis may or may not work; I don't know yet how to implement it.
@dvyukov, we had a case in C++ where we passed locked objects around; we didn't want the object to be unlocked while it was being passed. So there are cases where we might want to lock in one goroutine and unlock in another one, I think. Feel free to email me for details.
@ghasemloo I believe there are cases like this, but it does not mean you have to use Mutex for them. chan will do just fine. Note: it is absolutely illegal to lock/unlock almost all C++ mutexes in different threads (e.g. pthread_mutex_lock, EnterCriticalSection, std::mutex, etc.). And if you build your own semaphore and call it a mutex, you can do the same in Go as well.
Here's a solution to a deadlock case I had recently that requires unlocking in a different goroutine:

```go
// if the lock can't be acquired we have to try a read
// here because the notifier could be holding it waiting to send
//
// this won't work serially: lock blocks us from
// trying to read and trying to read first gives
// the notifier a chance to lock before we do
acq := make(chan struct{})
go func() {
	gameMonitorsLock.Lock()
	acq <- struct{}{}
}()
OUTER2:
for {
	select {
	case <-channels.move:
	case <-channels.done:
	case <-acq:
		break OUTER2
	}
}
delete(gameMonitors, gameid)
gameMonitorsLock.Unlock()
```
One of the senders, called in an independent HTTP POST handling function:

```go
go func() {
	gameMonitorsLock.RLock()
	c, has := gameMonitors[g.ID]
	if has {
		c.move <- time.Now()
	}
	gameMonitorsLock.RUnlock()
}()
```
Normally a send on c.move causes some reloads and checks of a changed database row, but in this case the race is that a move happens while timeout detection (triggered by a timer instead of an HTTP request) is tearing down the goroutine.
I don't know the quality of my design, but here's the one case I have where lock/unlock has to happen in separate goroutines.
@pciet no, it doesn't require unlocking a Mutex in a different goroutine. See the previous comment: https://github.com/golang/go/issues/9201#issuecomment-314651805
@griesemer suggests that perhaps instead of changing the existing `sync.Mutex` type, we should introduce a different kind of construct. For example, we could add a new method to `sync.Mutex`, `Critical(section func())`, which evaluates the function with the mutex held. That would provide the guarantee you are looking for, and in some cases might be easier and safer for people to use than separate `Lock` and `Unlock` calls.
Unless switching to Go2 includes rewriting all Go code from scratch (as far as I understand it doesn't), this won't give any significant benefit. If Lock/Unlock stay and keep the current semantics, we may as well not do this at all. The point is that 99% of Mutex uses already comply.
Another potential optimization that becomes possible is priority boosting inside critical sections. Namely, if a goroutine is preempted inside a critical section, the scheduler may give it another tiny time slot to finish the critical section (or perhaps it can then voluntarily yield in Unlock). Solaris has done this for a long time.
If people convert over time to a `Critical` method, then it seems to me that all of your suggested improvements apply to uses of that method.

On the other hand, if we change the meaning of `Lock` and `Unlock`, then we break the 1% of programs that do not follow the new guidelines. That's only going to be OK if we can reliably statically detect the programs that will break, which as far as I can see we can't. So I don't see how we can implement this proposal as written. We can introduce a `Critical` method. We can introduce a new kind of `Mutex` that only permits locking and unlocking in the same goroutine. But I don't think we can change `sync.Mutex`. Or, to put it a different way, the cost of changing `sync.Mutex` is high; we need a correspondingly high benefit, and I don't see it here.
You could create your own `trylock` pretty simply: https://play.golang.org/p/DcITzWTNlrD
sync.Mutex allows lock/unlock in different goroutines:
http://golang.org/pkg/sync/#Mutex.Unlock "A locked Mutex is not associated with a particular goroutine. It is allowed for one goroutine to lock a Mutex and then arrange for another goroutine to unlock it."
And the same for RWMutex: http://golang.org/pkg/sync/#RWMutex.Unlock
The possibility of unlocking the mutex in a different goroutine is very rarely used in real code. And if you really need something more complex than lock/unlock in a single goroutine, you can always use channels.
But it creates several problems:
We should prohibit the possibility of unlocking in a different goroutine in Go 2.