apache / incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.
https://uniffle.apache.org/
Apache License 2.0

[Improvement] Introduce local allocation buffer to store blocks in memory #1727

Open xianjingfeng opened 5 months ago

xianjingfeng commented 5 months ago

What would you like to be improved?

Currently, the shuffle server stores shuffle data in off-heap memory, but I found that it still occupies a lot of heap memory. The following is the output of jmap -histo:

   1:     189601376    16684921088  io.netty.buffer.UnpooledByteBufAllocator$InstrumentedUnpooledUnsafeDirectByteBuf
   2:     189860728    15188858240  java.nio.DirectByteBuffer (java.base@11.0.1)
   3:     189605871    13651622712  jdk.internal.ref.Cleaner (java.base@11.0.1)
   4:     189018520    10585037120  org.apache.uniffle.common.ShufflePartitionedBlock
   5:     189605871     7584234840  java.nio.DirectByteBuffer$Deallocator (java.base@11.0.1)

From the above results, we can see that the main reason for the high heap usage is the sheer number of blocks (about 189 million). There are so many blocks because each block is very small.
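To put numbers on this, the histogram rows can be divided out. The sketch below is only arithmetic on the five rows quoted above. It assumes each block holds roughly one instance of each listed class, which the near-identical instance counts suggest but the issue does not state. The class name `HistoOverhead` is made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class HistoOverhead {
    // {instances, bytes} copied from the jmap -histo rows quoted above.
    static final Map<String, long[]> ROWS = new LinkedHashMap<>();

    static {
        ROWS.put("InstrumentedUnpooledUnsafeDirectByteBuf", new long[] {189601376L, 16684921088L});
        ROWS.put("DirectByteBuffer", new long[] {189860728L, 15188858240L});
        ROWS.put("Cleaner", new long[] {189605871L, 13651622712L});
        ROWS.put("ShufflePartitionedBlock", new long[] {189018520L, 10585037120L});
        ROWS.put("DirectByteBuffer$Deallocator", new long[] {189605871L, 7584234840L});
    }

    /** Shallow heap size of one instance of a class. */
    static long bytesPerInstance(long instances, long bytes) {
        return bytes / instances;
    }

    /** Heap bytes per block, ASSUMING one instance of each class per block. */
    static long perBlockBytes() {
        long sum = 0;
        for (long[] row : ROWS.values()) {
            sum += bytesPerInstance(row[0], row[1]);
        }
        return sum;
    }

    /** Total heap bytes held by the five classes. */
    static long totalBytes() {
        long sum = 0;
        for (long[] row : ROWS.values()) {
            sum += row[1];
        }
        return sum;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, long[]> e : ROWS.entrySet()) {
            long[] row = e.getValue();
            System.out.println(e.getKey() + ": " + bytesPerInstance(row[0], row[1]) + " bytes/instance");
        }
        System.out.println("per block (assumed 1:1): " + perBlockBytes() + " bytes");
        System.out.println("total: " + totalBytes() + " bytes");
    }
}
```

Under that assumption, each block costs about 336 bytes of heap regardless of its payload size, and the five classes together hold about 63.7 GB, so the overhead scales with the block count rather than with the amount of shuffle data.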

How should we improve?

Introduce a local allocation buffer, like MSLAB in HBase. See: https://hbase.apache.org/book.html#gcpause
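A minimal sketch of the idea, not a proposed implementation: many small blocks are copied into a few large direct chunks, and each block keeps only a (chunk, offset, length) triple instead of owning its own direct buffer with its Cleaner and Deallocator. Every name here (`LocalAllocationBuffer`, `Slot`, `append`, `read`) is hypothetical and does not exist in Uniffle. The sketch is not thread-safe and ignores chunk release and pooling.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical MSLAB-style allocator: packs small blocks into large direct chunks. */
final class LocalAllocationBuffer {

    /** Where one block's bytes live: three ints instead of a per-block buffer object graph. */
    static final class Slot {
        final int chunkIndex;
        final int offset;
        final int length;

        Slot(int chunkIndex, int offset, int length) {
            this.chunkIndex = chunkIndex;
            this.offset = offset;
            this.length = length;
        }
    }

    private final int chunkSize;
    private final List<ByteBuffer> chunks = new ArrayList<>();

    LocalAllocationBuffer(int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        this.chunkSize = chunkSize;
    }

    /** Copies data into the current chunk, opening a new chunk when it does not fit. */
    Slot append(byte[] data) {
        if (data.length > chunkSize) {
            // A real design would need a separate path for oversized blocks.
            throw new IllegalArgumentException("block larger than chunk");
        }
        ByteBuffer current = chunks.isEmpty() ? null : chunks.get(chunks.size() - 1);
        if (current == null || current.remaining() < data.length) {
            current = ByteBuffer.allocateDirect(chunkSize);
            chunks.add(current);
        }
        int offset = current.position();
        current.put(data);
        return new Slot(chunks.size() - 1, offset, data.length);
    }

    /** Reads a block back through a duplicate view, leaving the chunk's write position untouched. */
    byte[] read(Slot slot) {
        ByteBuffer view = chunks.get(slot.chunkIndex).duplicate();
        view.position(slot.offset);
        view.limit(slot.offset + slot.length);
        byte[] out = new byte[slot.length];
        view.get(out);
        return out;
    }

    int chunkCount() {
        return chunks.size();
    }
}
```

With this layout the number of direct buffers tracks the number of chunks rather than the number of blocks. In an actual integration the three `Slot` fields would presumably live inside the existing block object rather than in a separate one, and a chunk could only be freed once every block stored in it has been flushed.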

xianjingfeng commented 5 months ago

@jerqi @zuston @advancedxy @rickyma PTAL. I'm quite busy at the moment, so if anyone is interested, feel free to pick this up.

rickyma commented 5 months ago

This issue seems feasible. I'll take a look first. We need this too.

In the meantime, there are a few things we can do to make blocks larger, and therefore fewer:

  1. Set spark.rss.writer.buffer.spill.size to a higher value, e.g. 1g or 2g, to make blocks larger.
  2. Set rss.client.memory.spill.ratio to a value below 0.5, e.g. 0.3, so that larger blocks spill first.
  3. Set spark.rss.writer.buffer.size to a larger value, e.g. 10m (see https://github.com/apache/incubator-uniffle/issues/1594#issuecomment-2081378887).