tbg opened this issue 1 week ago
After a compaction, the first time we read a block from a newly written file, the read has to go through the OS; typically it is served from the OS page cache rather than the disk.
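As a conceptual sketch of why that first read still costs a syscall even when the page cache has the data (hypothetical types, not Pebble's actual code: the real block cache is keyed by cache ID, file number, and offset):

```go
// Blocks from a freshly compacted file always miss the block cache on first
// access, because the cache key includes the (new) file's identity. The miss
// falls back to an OS read, which the page cache may serve from memory.
package sketch

import "os"

// blockCache is a stand-in for Pebble's block cache.
type blockCache map[string][]byte

func readBlock(c blockCache, f *os.File, key string, off int64, n int) ([]byte, error) {
	if b, ok := c[key]; ok {
		return b, nil // block cache hit: no syscall
	}
	// Miss: issue a pread(2). If the OS page cache still holds the data
	// (e.g. it was just written by the compaction), this is cheap, but it
	// still crosses the syscall boundary and pays decompression cost.
	b := make([]byte, n)
	if _, err := f.ReadAt(b, off); err != nil {
		return nil, err
	}
	c[key] = b // populate the cache for subsequent reads
	return b, nil
}
```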
https://github.com/cockroachdb/pebble/issues/2543 tracks improving this.
I also have an idea about copying entire data blocks when possible during compactions, and keeping them in the block cache if they were already there. I thought we had an issue tracking that, but I can't seem to find it right now.
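Roughly, the idea looks like the sketch below (hypothetical names and interfaces, purely illustrative):

```go
// When a compaction copies an input data block verbatim into the output file,
// and that block is resident in the block cache under the input file's key,
// re-insert the same bytes under the output file's key so the first read of
// the new file does not have to go back to the OS.
package sketch

type blockKey struct {
	fileNum uint64
	offset  uint64
}

type cache interface {
	Get(k blockKey) ([]byte, bool)
	Set(k blockKey, v []byte)
}

// retainAcrossCompaction would be called by the compaction after it has
// copied a block unchanged from inKey to outKey.
func retainAcrossCompaction(c cache, inKey, outKey blockKey) {
	if b, ok := c.Get(inKey); ok {
		// The block was hot before the compaction; keep it hot under the
		// new file's identity instead of forcing a cold read later.
		c.Set(outKey, b)
	}
}
```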
We can look at bandwidth metrics to see how much data we actually read from the disk.
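For example, one low-tech way to sample read bandwidth on Linux is to difference the cumulative sectors-read counter in /proc/diskstats over a window (the device name below is an assumption; substitute the disk backing /mnt/data1):

```go
// Sample cumulative sectors read (field 6 of /proc/diskstats, 512-byte
// sectors) twice and compute the average read bandwidth in between.
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
	"time"
)

func readBytes(device string) (uint64, error) {
	f, err := os.Open("/proc/diskstats")
	if err != nil {
		return 0, err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) > 5 && fields[2] == device {
			sectors, err := strconv.ParseUint(fields[5], 10, 64)
			if err != nil {
				return 0, err
			}
			return sectors * 512, nil // diskstats sectors are 512 bytes
		}
	}
	return 0, fmt.Errorf("device %s not found", device)
}

func main() {
	const dev = "nvme0n1" // assumption: the disk behind /mnt/data1
	before, _ := readBytes(dev)
	time.Sleep(10 * time.Second)
	after, _ := readBytes(dev)
	fmt.Printf("read bandwidth: %.1f MB/s\n", float64(after-before)/10/1e6)
}
```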
Thanks! So we are reading around 10 MB/s. Note that the OS should be using the free memory for the page cache; moving some of that memory to Pebble would actually decrease the total amount of cached data (the Pebble block cache stores uncompressed blocks, while the page cache holds them compressed as they are on disk).
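To make that trade-off concrete, here is a back-of-the-envelope comparison; the 2x compression ratio and the 8 GiB figure are assumptions for illustration, not measurements:

```go
// A byte of OS page cache holds compressed blocks, a byte of Pebble block
// cache holds uncompressed blocks, so shifting memory from the former to the
// latter shrinks the total logical data cached. The upside of block cache
// hits is that they avoid both the syscall and the decompression.
package main

import "fmt"

func main() {
	const (
		memGiB           = 8.0 // memory being considered for reassignment
		compressionRatio = 2.0 // assumed uncompressed:compressed ratio
	)
	asPageCache := memGiB * compressionRatio // logical GiB cached if left to the OS
	asBlockCache := memGiB * 1.0             // logical GiB cached by Pebble (uncompressed)
	fmt.Printf("as OS page cache:      ~%.0f GiB of logical data\n", asPageCache)
	fmt.Printf("as Pebble block cache: ~%.0f GiB of logical data\n", asBlockCache)
}
```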
In sysbench oltp_read_write (10x10m rows), we see ~0.5% of CPU time in syscalls reading SSTs, presumably due to block cache misses. The dataset at this point is ~39 GB per node (per `df -h /mnt/data1`), which is not much larger than available memory. At the same time, actual memory usage on these nodes is only around 30-40%. It stands to reason that performance could improve with a larger block cache. We should validate this and, if so, understand why we're not using more memory on this workload out of the box.
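One way to validate this is simply to rerun the workload with a larger block cache. A minimal sketch of what that looks like at the Pebble level is below; in CockroachDB the analogous knob is the `--cache` flag on `cockroach start`. The 8 GiB size is illustrative, not a recommendation:

```go
// Open a Pebble store with an explicitly sized block cache. Reads that hit
// the block cache are served from memory, already uncompressed, without a
// syscall; misses fall through to the filesystem.
package main

import (
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	c := pebble.NewCache(8 << 30) // 8 GiB block cache; refcounted, release below
	defer c.Unref()

	db, err := pebble.Open("demo-db", &pebble.Options{Cache: c})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()
}
```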
Jira issue: CRDB-43618