Open knz opened 2 years ago
I read the code and I think think it's more or less sound. I don't think it can deadlock. All of the RLock callers will either terminate or unlock promptly on a Wait
call. The atomicity here is not important. I don't think there's a bug here.
The atomicity here is not important.
Can you say more? Why? If it's not important, why do we care about acquiring a read lock at all?
We're playing this fun game here where the writes are being performed concurrently by using an RLock and an atomic add to pick an index. This works because that atomic add is monotonic. The flush operation needs to be operated under full mutual exclusion, and the Lock
gives us that because the other writers will use RLock
. As long as a thread has the Lock
it knows that it is safe to flush. The race in question is whether two goroutines might Flush
and by the time the first goroutine flushes, the second one doesn't need to. If we had atomicity, then, I suppose, you might avoid an extra flush. This data structure has some extra synchronization by using that atomic: only the goroutine which increments the buffer size to a specific number will attempt to perform the flush. The only case where the race of two flushes can occur is if the ForceSync
method is called, at which point, I don't think it matters if there's one extra flush.
Ok that is interesting.
Why is this explanation not included in the source? Especially on the struct description, and the particular method at hand here.
@matthewtodd can I ask you to sprinkle commentary on how the ConcurrentBufferGuard
works and why the re-locking is safe as the resolution to this issue?
Yes, I'll add commentary then resolve this issue when done. Thanks, all.
Describe the problem
In
sql/contention/contentionutils/concurrent_buffer_guard.go
we see the following pattern:The call to RUnlock() is followed by Lock(). In between the two the mutex is not held. This
syncRLocked()
function is called fromAtomicWrite
, which (as the name implies) is meant to do its operation atomically. Because of the pattern insyncRLocked()
, the atomicity is lost.The result is that contention information may be lost, or the execution could deadlock.
Expected behavior
The logic in ConcurrentBufferGuard should be rewritten to avoid this loss of atomicity, possibly using a channel instead.
Jira issue: CRDB-16850