Capping VectorPool by memory usage

kagamiori commented 1 year ago

Description

VectorPool is used during expression evaluation to cache unneeded vectors to be reused later. Today, the size of VectorPool is capped by the number of vectors it caches per type (i.e., 10), the number of types (i.e., only primitive types), and the size of vectors (i.e., less than 1024 * 64). VectorPool is possessed by ExecCtx in each Operator and is destructed when Operator is closed.

However, it has been observed that in a streaming engine, operators are never destructed, so are the VectorPools of them. An experiment shows that caching 10 Varchar vectors of size 1024 * 64 of 1000-character strings can take 16MB. When a streaming query contains many operators, the sizes of their VectorPools can easily add up to a GB which can be a problem.

To address this problem, we can make VectorPool capped by the memory usage instead of the number of vectors and allow library users to configure the memory cap of VectorPool per operator.

Experiment:

TEST_F(VectorPoolTest, memoryCap) {
  VectorPool vectorPool(pool());
  VELOX_CHECK(
      pool()->currentBytes() == 0, "Initial memory pool should be empty.");

  VectorFuzzer::Options options{.vectorSize = 1000, .stringLength = 4};
  VectorFuzzer fuzzer(options, pool());

  const int kSize = 10;
  for (auto i = 0; i < kSize; ++i) {
    auto vector = fuzzer.fuzzFlat(TINYINT());
    vectorPool.release(vector);
  }
  LOG(INFO) << fmt::format(
      "Memory used after caching {} tinyint vectors of size 1000: {}",
      kSize,
      pool()->currentBytes());

  vectorPool.clear();
  VELOX_CHECK(pool()->currentBytes() == 0, "Memory pool should be empty.");
  for (auto i = 0; i < kSize; ++i) {
    auto vector = fuzzer.fuzzFlat(BIGINT());
    vectorPool.release(vector);
  }
  LOG(INFO) << fmt::format(
      "Memory used after caching {} bigint vectors of size 1000: {}",
      kSize,
      pool()->currentBytes());

  vectorPool.clear();
  VELOX_CHECK(pool()->currentBytes() == 0, "Memory pool should be empty.");
  for (auto i = 0; i < kSize; ++i) {
    auto vector = fuzzer.fuzzFlat(VARCHAR());
    vectorPool.release(vector);
  }
  LOG(INFO) << fmt::format(
      "Memory used after caching {} varchar vectors of 4-character strings of size 1000: {}",
      kSize,
      pool()->currentBytes());

  vectorPool.clear();
  VELOX_CHECK(pool()->currentBytes() == 0, "Memory pool should be empty.");
  options.stringLength = 100;
  fuzzer.setOptions(options);
  for (auto i = 0; i < kSize; ++i) {
    auto vector = fuzzer.fuzzFlat(VARCHAR());
    vectorPool.release(vector);
  }
  LOG(INFO) << fmt::format(
      "Memory used after caching {} varchar vectors of 100-character strings of size 1000: {}",
      kSize,
      pool()->currentBytes());

  vectorPool.clear();
  VELOX_CHECK(pool()->currentBytes() == 0, "Memory pool should be empty.");
  options.stringLength = 1000;
  fuzzer.setOptions(options);
  for (auto i = 0; i < kSize; ++i) {
    auto vector = fuzzer.fuzzFlat(VARCHAR(), 64 * 1024);
    vectorPool.release(vector);
  }
  LOG(INFO) << fmt::format(
      "Memory used after caching {} varchar vectors of 1000-character strings of size 64 * 1024: {}",
      kSize,
      pool()->currentBytes());
}

Experiment result:

I0918 21:10:50.121701 1863508 VectorPoolTest.cpp:164] Memory used after caching 10 tinyint vectors of size 1000: 15360
I0918 21:10:52.774042 1863508 VectorPoolTest.cpp:175] Memory used after caching 10 bigint vectors of size 1000: 81920
I0918 21:10:56.839509 1863508 VectorPoolTest.cpp:186] Memory used after caching 10 varchar vectors of 4-character strings of size 1000: 163840
I0918 21:11:00.273428 1863508 VectorPoolTest.cpp:199] Memory used after caching 10 varchar vectors of 100-character strings of size 1000: 655360
I0918 21:12:40.775014 1863508 VectorPoolTest.cpp:212] Memory used after caching 10 varchar vectors of 1000-character strings of size 64 * 1024: 16220160

mbasmanova commented 1 year ago

A more general solution could be to introduce a cap on amount of memory an operator can use for caching (VectorPool, and similar). The operator can then decide how to use that budget.

More practical solution might be to introduce a cap per Task, not per operator though.

mbasmanova commented 1 year ago

CC: @xiaoxmeng

kagamiori commented 1 year ago

@xiaoxmeng and I discussed a few possibilities.

Having a memory cap per task for all of its memory usage including VectorPool.
- cons: Non-trivial to implement since it requires arbitration logic. Also, since there is a VectorPool per operator, the logic of arbitration among them can be expensive and diminish the saving from VectorPool.
Having a fixed memory cap per VectorPool. The cap can be determined from the task-level cap and number of operators in the task.
- cons: We cannot get the memory size used by the VectorPool by simply looking at the operator's MemoryPool. The reason is that there may be other things that use the MemoryPool as well and vectors in the operator's VectorPool may not be allocated from this operator's pool. So we need to keep a currentByte in the VectorPool and update it with vector->estimateFlatSize() whenever a vector is added or removed. But estimateFlatSize() itself has an overhead that can diminish the saving from VectorPool.
Disable VectorPool in the streaming engine.
- VectorPool is used during expression evaluation to reduce runtime memory allocation that is slow. We need performance measurement at the engine side to see whether disabling VectorPool affect the performance.

facebookincubator / velox

Capping VectorPool by memory usage #6635

Description