cockroachdb / pebble

RocksDB/LevelDB inspired key-value database in Go
BSD 3-Clause "New" or "Revised" License

Question about block cache #2766

Open merlimat opened 1 year ago

merlimat commented 1 year ago

I'm running a stress test with Pebble and I'm seeing that most of the CPU usage is spent in the Get path; from the flame graph it looks like blocks are being loaded into the cache multiple times.

A couple of notes:

  1. The overall data size (1GB) is much smaller (<<) than the configured block cache size (16GB)
  2. During the profile capture there were only reads from the DB, no writes
  3. The capture was taken after the block cache should already have been filled
  4. I can see 100% hit-rate on the block cache from the metrics
  5. These are single-key get operations (no iterator scans)
[flame graph screenshot]

Am I understanding this correctly that blocks are being repeatedly loaded? Is there any recommended setting to address/improve this kind of scenario?

Jira issue: PEBBLE-57

jbowens commented 1 year ago

Replying to your comment in the other issue:

@jbowens Could this non-usage of bloom filter in Get() be the reason behind the issue described in #2766?

Yes, the lack of use of bloom filters in Get can increase the CPU cost of reads.

it looks like the blocks are loaded multiple times into the cache.

This is not the case; these code paths are all retrieving, from the cache, blocks that are already cached. Consulting bloom filters during Get may help reduce the number of block loads.