facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org

LRUCache strict_capacity=true #5048

Open toktarev opened 5 years ago

toktarev commented 5 years ago

We have not had any problems since I started using RocksDB the way I described many times in this thread. Our read/write load is very high and the DB size is around terabytes. Still, we are able to handle that with a limited Docker instance with a 5 GB RAM limit, including our Java app, which is itself quite hungry. My guess is that RocksDB is taking at most 2 GB. Everything has been running perfectly stable for months.

What I took as best for us (a RocksJava sketch of these settings follows the list):

  1. Set max_open_files! Yes, it is a must.
  2. Two-level index.
  3. Shared and strictly limited caches.
  4. jemalloc. It helps to keep memory stable. We have a quite aggressive setup, but still no noticeable performance impact.
  5. Direct I/O for sequential scans.
  6. Custom patches (Java-specific).
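
A minimal RocksJava sketch of how items 1-3 (plus the strict cache from this issue's title) might be wired together, assuming a reasonably recent RocksJava. The capacity, shard-bit, and max_open_files values are illustrative assumptions, not koldat's actual settings:

    import org.rocksdb.*;

    public class StrictCacheSetup {
      static { RocksDB.loadLibrary(); }

      public static void main(String[] args) throws RocksDBException {
        // 3. Shared, strictly limited block cache: the third constructor
        //    argument enables strict_capacity_limit.
        final Cache cache = new LRUCache(512L << 20, 6, true);

        final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
            .setBlockCache(cache)                          // share one cache
            .setIndexType(IndexType.kTwoLevelIndexSearch); // 2. two-level index

        final Options options = new Options()
            .setCreateIfMissing(true)
            .setMaxOpenFiles(1000) // 1. bound the open-file count (illustrative)
            .setTableFormatConfig(tableConfig);
        // 4. jemalloc is a build/link-time choice (e.g. LD_PRELOAD), not an
        //    API option; 5. direct I/O for scans is shown later in the thread.

        try (final RocksDB db = RocksDB.open(options, "/tmp/strict-cache-demo")) {
          db.put("key".getBytes(), "value".getBytes());
        }
      }
    }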

All of these basically help both the system and the library, and it works perfectly. I use a modified 5.12.5. Unfortunately, the patches that also help a lot under high load (Java direct buffers) were not merged; without them the JVM suffers from memory hot spots.

Anyway, rocksdb team, this is great work. I love this library.

Originally posted by @koldat in https://github.com/facebook/rocksdb/issues/4112#issuecomment-470269235

toktarev commented 5 years ago

@koldat, thanks a lot for such great help. Could you please explain how you live with "strictly limited caches"?

Throwing

    s = Status::Incomplete("Insert failed due to LRU cache being full.");

in background processes like compaction and flush, plus failed writes and iterations/reads, can lead to unpredictable system behaviour.

How do you handle this problem?

Thanks in advance.

koldat commented 5 years ago

I am not using pinning of index blocks, so the cache should always have space, and the cache I use is also quite big (512 MB). In the worst case it would throw an exception, and the application needs to handle that on its own.
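
A hedged sketch of what "handle that on its own" could look like in Java, assuming the cache-full Status::Incomplete surfaces as a RocksDBException whose status code the application can inspect (the retry policy here is just one illustrative choice):

    // Assumed context: an open RocksDB `db`, a ReadOptions `readOpts`,
    // and a byte[] `key`.
    static byte[] getHandlingCacheFull(final RocksDB db, final ReadOptions readOpts,
                                       final byte[] key) throws RocksDBException {
      try {
        return db.get(readOpts, key);
      } catch (final RocksDBException e) {
        final Status status = e.getStatus(); // may be null on older versions
        if (status != null && status.getCode() == Status.Code.Incomplete) {
          // Cache full under strict_capacity_limit: back off and retry once,
          // shed load, or reconsider the cache size / strict setting.
          return db.get(readOpts, key);
        }
        throw e;
      }
    }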

I do not think that a failing background process would cause unpredictable behavior. Why do you think so? I guess the library should handle that as a failed job, and the job will be scheduled again. This is my assumption, and maybe more a question for the core developers.

Anyway, I have not met this corner case.

toktarev commented 5 years ago

-- Why do you think so?

Because when we turned strict_capacity_limit on, we saw tons of messages in our log saying "Insert failed due to LRU cache being full". I am not sure whether this is correct behaviour or what kind of problems it can cause.

toktarev commented 5 years ago

-- I am not using pinning of index blocks

Which parameters do you use?

I see this error with the default BlockBasedTableOptions, where pin_l0_filter_and_index_blocks_in_cache=false.
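
For reference, a minimal sketch of where that flag lives in RocksJava (the surrounding `options` object is assumed):

    final BlockBasedTableConfig tableConfig = new BlockBasedTableConfig()
        .setPinL0FilterAndIndexBlocksInCache(false); // the default being discussed
    options.setTableFormatConfig(tableConfig);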

koldat commented 5 years ago

I just scanned all the RocksDB LOG files we have, and there is not a single message like "Insert failed due to LRU cache being full".

We are using the defaults, so pin_l0_filter_and_index_blocks_in_cache=false.

What is the size of your cache?

And I forgot to mention that for sequential scans we use:

    readOpts = new ReadOptions();
    readOpts.setFillCache(false);
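
A minimal sketch of how that combines with an iterator for a full scan, assuming an open RocksJava `db` handle; blocks read by the scan are then not inserted into the block cache:

    // Assumed context: an open RocksJava RocksDB handle.
    static void fullScan(final RocksDB db) {
      try (final ReadOptions readOpts = new ReadOptions().setFillCache(false);
           final RocksIterator it = db.newIterator(readOpts)) {
        for (it.seekToFirst(); it.isValid(); it.next()) {
          // process it.key() / it.value(); the read blocks bypass the cache
        }
      }
    }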

I also guess that functionality like open iterators pins the blocks it is currently using in the cache. Maybe you can check that as well.

toktarev commented 5 years ago

We see the message "Insert failed due to LRU cache being full" in the application log, not in the RocksDB LOG.

siying commented 5 years ago

@toktarev if you see this message, it pretty much means that if you turned strict mode off, RocksDB would use more memory than the block cache capacity to cache blocks.
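
In other words, the flag trades memory overshoot for insert failures. A sketch (capacity and shard-bit values are illustrative):

    // strictCapacityLimit=false: inserts beyond capacity still succeed, so
    // actual memory usage can exceed 256 MB while blocks are pinned.
    final Cache lenient = new LRUCache(256L << 20, 6, false);

    // strictCapacityLimit=true: the cap is enforced, and an over-capacity
    // insert fails with "Insert failed due to LRU cache being full."
    final Cache strict = new LRUCache(256L << 20, 6, true);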

toktarev commented 5 years ago
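
    // Excerpt from LRUCacheShard::Insert in RocksDB's LRU cache implementation: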
    if (usage_ - lru_usage_ + charge > capacity_ &&
        (strict_capacity_limit_ || handle == nullptr)) {
      if (handle == nullptr) {
        // Don't insert the entry but still return ok, as if the entry inserted
        // into cache and get evicted immediately.
        last_reference_list.push_back(e);
      } else {
        delete[] reinterpret_cast<char*>(e);
        *handle = nullptr;
        s = Status::Incomplete("Insert failed due to LRU cache being full.");
      }
    } else {
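      // ... (otherwise the insert proceeds and the entry is cached) ...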

As we can see from the code, this status can be returned only when strict_capacity_limit=true: the error branch requires a non-null handle, so the enclosing condition can only hold when strict_capacity_limit_ is true.