facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.71k stars 6.34k forks source link

Recursive read locks in port::RWMutex #13116

Open autopear opened 2 weeks ago

autopear commented 2 weeks ago

Existing port::RWMutex has different underlying implementations with different ports. The posix pthread_rwlock_t allows read-locking on the same thread for multiple times

A thread may hold multiple concurrent read locks on rwlock (that is, successfully call the pthread_rwlock_rdlock() function n times). If so, the thread must perform matching unlocks (that is, it must call the pthread_rwlock_unlock() function n times).

but this may not the case for other ports. For example, SRWLOCK in Windows does not support recursive read-locking

Shared mode SRW locks should not be acquired recursively as this can lead to deadlocks when combined with exclusive acquisition.

Also, the behavior of std::shared_mutex::lock_shared (though not used in RocksDB) is undefined if the same mutex is read-locked multiple times on the same thread.

If lock_shared is called by a thread that already owns the mutex in any mode (exclusive or shared), the behavior is undefined.

IMO, we should not assume recursive read (or write) locks of port::RWMutex is always allowed.

Currently there are some usages of port::RWMutex that may have recursive read-locks in the same thread. We have found this at least in WriteableCacheFile.

bool RandomAccessCacheFile::Read(const LBA& lba, Slice* key, Slice* val,
                                 char* scratch) override {
  ReadLock _(&rwlock_);

  assert(lba.cache_id_ == cache_id_);

  if (!freader_) {
    return false;
  }

  Slice result;
  Status s = freader_->Read(IOOptions(), lba.off_, lba.size_, &result, scratch,
                            nullptr, Env::IO_TOTAL /* rate_limiter_priority */);
  if (!s.ok()) {
    Error(log_, "Error reading from file %s. %s", Path().c_str(),
          s.ToString().c_str());
    return false;
  }

  assert(result.data() == scratch);

  return ParseRec(lba, key, val, scratch);
}

bool WriteableCacheFile::Read(const LBA& lba, Slice* key, Slice* block, char* scratch) override {
  ReadLock _(&rwlock_);
  const bool closed = eof_ && bufs_.empty();
  if (closed) {
    // the file is closed, read from disk
    return RandomAccessCacheFile::Read(lba, key, block, scratch);
  }
  // file is still being written, read from buffers
  return ReadBuffer(lba, key, block, scratch);
}

If the file has already been closed, rwlock_ is read-locked first in WriteableCacheFile::Read and then read-locked again in RandomAccessCacheFile::Read, this may cause deadlocks on some non-posix systems or alternative port implementations.