facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
28.09k stars 6.25k forks

anxiety about out of memory #12678

Closed 979357361 closed 2 months ago

979357361 commented 3 months ago

Hi everyone,

I want to use RocksDB with a multi-GB block cache on a server shared with many other processes. As far as I know, RocksDB doesn't support huge pages (TLB) except for the memtable, so if other processes consume too much memory, there is a risk of RocksDB running out of memory (OOM). I have some questions about RocksDB memory:

  1. The Get API returns a Slice, which is essentially {char* + len}. When and how is the memory that the pointer refers to allocated, and how is it released automatically? How does PinnableSlice differ? I tried to find this in the code, but TableReader is fairly complex.
  2. If I understand correctly, a read that hits an SST (table) block also makes a copy of the block (row_cache_entry) to be cached. Is the memory for row_cache_entry allocated by string::reserve() and freed once the entry is evicted from the cache?
  3. If I set the block cache size to 4 GB and Linux runs out of memory before the cache grows to 4 GB, what happens to ongoing API calls?

Thanks for your time. All comments and insights are welcome. Have a nice day, everyone.

ajkr commented 2 months ago

Right, only memtable uses huge pages. Blocks are small and allocated individually, so we don't use huge pages for those.

  1. PinnableSlice will pin the block in block cache until the PinnableSlice object is destroyed. That saves an allocation and a copy of the value. You can read more about it here - https://rocksdb.org/blog/2017/08/24/pinnableslice.html
  2. The scope of a row cache allocation is one KV record, not a block. Aside from that, I believe your description of its allocation pattern is correct.
  3. It's more of a Linux question. In my limited observations, a process seems to be killed in this situation to make room for the allocation, or prevent it in the first place. There could be other possibilities.
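
To make point 1 concrete, here is a minimal sketch of the difference between a copying read and a pinned read. It is a toy model, not RocksDB code: `ToyBlockCache`, `ToyPinnedValue`, `GetCopy`, and `GetPinned` are hypothetical names, and a `std::shared_ptr` stands in for whatever reference counting the real block cache uses. It assumes C++17 for `std::string_view`.

```cpp
#include <memory>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Toy stand-in for a cached block (hypothetical; not RocksDB's block type).
using Block = std::string;

// Toy cache that shares ownership of blocks, so a reader can keep one alive.
class ToyBlockCache {
 public:
  void Insert(const std::string& key, std::string data) {
    blocks_[key] = std::make_shared<const Block>(std::move(data));
  }
  std::shared_ptr<const Block> Lookup(const std::string& key) const {
    auto it = blocks_.find(key);
    if (it == blocks_.end()) return nullptr;
    return it->second;
  }
  // Drops the cache's own reference; readers holding a pin keep the bytes alive.
  void Evict(const std::string& key) { blocks_.erase(key); }

 private:
  std::unordered_map<std::string, std::shared_ptr<const Block>> blocks_;
};

// Toy analogue of a pinned result: a view into the block plus the handle
// that keeps the block's memory alive until this object is destroyed.
struct ToyPinnedValue {
  std::shared_ptr<const Block> pin;
  std::string_view value;
};

// Copying read: the caller's std::string allocates and the bytes are copied.
bool GetCopy(const ToyBlockCache& cache, const std::string& key,
             std::string* out) {
  std::shared_ptr<const Block> block = cache.Lookup(key);
  if (!block) return false;
  out->assign(block->data(), block->size());
  return true;
}

// Pinned read: no allocation and no copy of the value bytes; the result
// points into the cached block and holds a reference to it.
bool GetPinned(const ToyBlockCache& cache, const std::string& key,
               ToyPinnedValue* out) {
  std::shared_ptr<const Block> block = cache.Lookup(key);
  if (!block) return false;
  out->value = std::string_view(*block);
  out->pin = std::move(block);
  return true;
}
```

In this model, a `ToyPinnedValue` obtained before `Evict()` stays readable afterwards, and the block's memory is released only when the last pin goes away. That mirrors the trade-off described above: a pinned read saves an allocation and a copy, but the memory stays in use for as long as the result object lives.
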

979357361 commented 2 months ago

Hi @ajkr, thanks for your response, I still have the following doubts:

  1. For the normal Get API that returns a string, is the memory also allocated by string::reserve() or resize()?
  2. The scope of a row cache allocation is one KV, but I thought the unit of the block cache was a block (4 KB by default) in an SST. What is the difference between the row cache and the block cache, and, by the way, the table cache?
  3. The block cache will not evict blocks even when system memory is exhausted, so if there is no memory left, will an ongoing Get() cause RocksDB to abort with "bad alloc"?

ajkr commented 2 months ago

  1. Some std::string member function will allocate it. I don't think we make any promises about which one, and I don't know why we would. Aside from the possibilities you mentioned, it could also be allocated by assign(): https://github.com/facebook/rocksdb/blob/d89ab23bec4c4d539f6c9377ef0eaa2163a849b3/include/rocksdb/db.h#L615
  2. Row cache is for KV records in tables. Row cache is disabled by default and is used rarely. When it's enabled, Get() will check for table records there before looking for them in the table reader object/blocks.

Table cache is for table reader objects. It is always present and its size is controlled by max_open_files setting.

Your understanding about block cache was correct: it caches blocks in SST files. Block cache is enabled by default but can be turned off.

  3. Maybe. I haven't seen it happen that way yet, but wouldn't be that surprised if it did.
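
As a rough illustration of the "bad alloc" path in point 3, the sketch below shows how an allocation failure inside `std::string::assign()` surfaces as `std::bad_alloc`, which terminates the process if nothing catches it. Everything here is an assumption made for demonstration: `BudgetAllocator`, `BudgetBytes`, `BudgetString`, and `TryCopyValue` are hypothetical names, and the byte budget only simulates memory exhaustion. On Linux with default overcommit settings, a real allocation often appears to succeed and the kernel's OOM killer steps in later, which would match the process-kill behavior mentioned earlier in the thread. The sketch says nothing about how RocksDB itself handles the exception.

```cpp
#include <cstddef>
#include <new>
#include <string>

// Simulated pool of available bytes (hypothetical; stands in for system memory).
inline std::size_t& BudgetBytes() {
  static std::size_t bytes = 0;
  return bytes;
}

// Minimal allocator that throws std::bad_alloc once the budget is exhausted.
template <typename T>
struct BudgetAllocator {
  using value_type = T;

  BudgetAllocator() = default;
  template <typename U>
  BudgetAllocator(const BudgetAllocator<U>&) noexcept {}

  T* allocate(std::size_t n) {
    const std::size_t bytes = n * sizeof(T);
    if (bytes > BudgetBytes()) throw std::bad_alloc();
    BudgetBytes() -= bytes;
    return static_cast<T*>(::operator new(bytes));
  }
  void deallocate(T* p, std::size_t n) noexcept {
    BudgetBytes() += n * sizeof(T);
    ::operator delete(p);
  }
};

template <typename T, typename U>
bool operator==(const BudgetAllocator<T>&, const BudgetAllocator<U>&) noexcept {
  return true;
}
template <typename T, typename U>
bool operator!=(const BudgetAllocator<T>&, const BudgetAllocator<U>&) noexcept {
  return false;
}

using BudgetString =
    std::basic_string<char, std::char_traits<char>, BudgetAllocator<char>>;

// Copies a value into *out, as a string-returning read would.
// Returns false instead of letting std::bad_alloc propagate.
bool TryCopyValue(const char* data, std::size_t len, BudgetString* out) {
  try {
    out->assign(data, len);  // may allocate; throws if the budget is too small
    return true;
  } catch (const std::bad_alloc&) {
    return false;
  }
}
```

With a budget of a few kilobytes, copying a 1 KB value succeeds. With a budget of 16 bytes, the same call reports failure. Without the `try`/`catch`, the exception would propagate up the call stack and, if uncaught, end in `std::terminate()`.
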

979357361 commented 2 months ago

I see, thanks again @ajkr !