apache / kvrocks

Apache Kvrocks is a distributed key-value NoSQL database that uses RocksDB as its storage engine and is compatible with the Redis protocol.
https://kvrocks.apache.org/
Apache License 2.0

Memory limits on connections #2284

Open caipengbo opened 4 months ago

caipengbo commented 4 months ago

Motivation

Kvrocks currently has no memory constraints, which can lead to out-of-memory (OOM) failures, especially when it is deployed in container environments.

Solution

We should provide mechanisms to limit memory and avoid OOM.

In the connection dimension:

AntiTopQuark commented 3 months ago

I am interested in this problem. Could you assign the issue to me?

caipengbo commented 3 months ago

@AntiTopQuark Sure, go for it!

PokIsemaine commented 2 months ago

Hello @AntiTopQuark, I am also interested in this issue. When I implemented the Sort command before, there was a need for memory restrictions.

/// SORT_LENGTH_LIMIT limits the number of elements to be sorted
/// to avoid using too much memory and causing system crashes.
/// TODO: Expect to expand or eliminate SORT_LENGTH_LIMIT
/// through better mechanisms such as memory restriction logic.
constexpr uint64_t SORT_LENGTH_LIMIT = 512;

It seems that the current memory statistics in Kvrocks are still coarse-grained, process-level statistics. Do we need finer-grained tracking? I've collected some references:

https://cwiki.apache.org/confluence/display/DORIS/DSIP-002%3A+Refactor+memory+tracker+on+BE
https://doris.apache.org/blog/Say-Goodbye-to-OOM-Crashes/
https://www.modb.pro/db/1798912145290776576

They seem to use the memory allocator to achieve this. Would you like to share and discuss your thoughts?

AntiTopQuark commented 2 months ago

I briefly read through the three articles and found that most methods are quite similar. They all involve tracking the type, size, and frequency of allocations, as well as the stack at the time of allocation, to avoid OOM errors and to diagnose memory leaks. I am more familiar with the OceanBase database, which mainly implements a series of ObAllocator's alloc functions. These functions track the size and address of allocations (with address tracking enabled via a switch), and compare them against tenant-level memory limits to prevent OOM. If memory allocation fails, an error code is returned to the client. You can find more information at the following links:

OceanBase Official Documentation
OceanBase GitHub Code

For Kvrocks, I think it's best to keep it simple. Overriding malloc and new to record the number of allocations and the amount of memory allocated should suffice. Additionally, memory allocation can be capped using the max-memory configuration item, together with a limit on the maximum number of connections.
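
To make the idea above concrete, here is a minimal C++ sketch of allocation accounting with a cap. It is an illustration only, not Kvrocks code: `TrackedAllocator`, its methods, and the byte limit passed to the constructor are hypothetical names. It also wraps `std::malloc` in a class rather than replacing the global `malloc` or `operator new`, which keeps the example self-contained.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>

// Hypothetical helper (not part of Kvrocks): counts allocations and bytes,
// and refuses requests that would push usage past a configured limit.
class TrackedAllocator {
 public:
  explicit TrackedAllocator(std::size_t limit_bytes) : limit_(limit_bytes) {}

  // Returns nullptr when the limit would be exceeded or malloc fails, so the
  // caller can turn that into an error reply instead of running out of memory.
  void* Allocate(std::size_t size) {
    std::size_t used = used_.load(std::memory_order_relaxed);
    // CAS loop: reserve the bytes only if the new total stays within the limit.
    do {
      if (size > limit_ || used > limit_ - size) return nullptr;
    } while (!used_.compare_exchange_weak(used, used + size,
                                          std::memory_order_relaxed));
    void* p = std::malloc(size);
    if (p == nullptr) {
      // Roll back the reservation if the underlying allocation failed.
      used_.fetch_sub(size, std::memory_order_relaxed);
      return nullptr;
    }
    count_.fetch_add(1, std::memory_order_relaxed);
    return p;
  }

  // The caller passes the size back here; a real allocator would more likely
  // store it in a header placed in front of the returned block.
  void Free(void* p, std::size_t size) {
    if (p == nullptr) return;
    std::free(p);
    used_.fetch_sub(size, std::memory_order_relaxed);
  }

  std::size_t used_bytes() const { return used_.load(std::memory_order_relaxed); }
  std::size_t alloc_count() const { return count_.load(std::memory_order_relaxed); }

 private:
  const std::size_t limit_;            // assumed to come from a config item
  std::atomic<std::size_t> used_{0};   // bytes currently reserved
  std::atomic<std::size_t> count_{0};  // number of successful allocations
};
```

With a limit of 100 bytes, a 60-byte request succeeds and a following 50-byte request returns `nullptr` until the first block is freed. Reserving the bytes before calling `std::malloc` means concurrent callers cannot jointly overshoot the limit.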

caipengbo commented 2 months ago

I prefer a simpler approach: count the size of the output buffer each time something is put into it. Of course, using a memory allocator would be better.
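
A rough sketch of that simpler approach, assuming a per-connection buffer with a fixed byte limit. The class `BoundedOutputBuffer` and its methods are hypothetical and do not reflect how Kvrocks actually stores pending replies; the sketch only shows the size check performed on every append.

```cpp
#include <cstddef>
#include <string>

// Hypothetical per-connection output buffer; all names are illustrative only.
class BoundedOutputBuffer {
 public:
  explicit BoundedOutputBuffer(std::size_t limit_bytes) : limit_(limit_bytes) {}

  // Appends data only if the pending bytes stay within the limit.
  // Returns false and leaves the buffer unchanged otherwise.
  bool Append(const std::string& data) {
    // buffer_.size() never exceeds limit_, so the subtraction cannot underflow.
    if (data.size() > limit_ - buffer_.size()) return false;
    buffer_.append(data);
    return true;
  }

  // Stands in for flushing to the socket: hands back the pending bytes
  // and leaves the buffer empty.
  std::string Drain() {
    std::string out;
    out.swap(buffer_);
    return out;
  }

  std::size_t pending_bytes() const { return buffer_.size(); }

 private:
  const std::size_t limit_;  // assumed per-connection cap in bytes
  std::string buffer_;       // replies not yet written to the client
};
```

When `Append` returns false, the server has to decide what to do with the connection. One option, similar in spirit to Redis's `client-output-buffer-limit`, is to close it; whether that fits Kvrocks is left open here.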