cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go
BSD 3-Clause "New" or "Revised" License

cache,db: de-dup concurrent attempts to read the same block #4157

Open sumeerbhola opened 2 weeks ago

sumeerbhola commented 2 weeks ago

Concurrent reads of the same block have been observed to cause very high memory usage, and are wasteful of disk bandwidth. We now coordinate across multiple concurrent attempts to read the same block via a readEntry, which makes the readers take turns until one succeeds.

The readEntries are embedded in a map that is part of a readShard, where there is a readShard for each cache.Shard. See the long comment in the readShard declaration for motivation. The Options.LoadBlockSema is integrated into the readEntry, to simplify the waiting logic in the caller.

Callers interact with this new behavior via Cache.GetWithReadHandle, which is only for callers that intend to do a read and then populate the cache. If this method returns a ReadHandle, the caller must first wait for permission to do a read. See the ReadHandle comment for details of the contract.

Fixes #4138

RaduBerinde commented 1 week ago

Should we make the load block semaphore shared across DBs, like the block cache? The primary motivation was preventing excessive memory utilization from many concurrent reads, right? And memory is a shared resource across DBs.

That would have been easier to implement, but there was concern that a single broken store with very slow I/O would block out all other stores. https://github.com/cockroachdb/cockroach/blob/3644f0d3fe77c03c20d3b603c1c0eb6f335e7e15/pkg/storage/pebble.go#L997