Closed maobaolong closed 4 days ago
2 956 files ±0 2 956 suites ±0 6h 11m 47s ⏱️ + 1m 25s 1 092 tests ±0 1 090 ✅ ±0 2 💤 ±0 0 ❌ ±0 13 685 runs ±0 13 655 ✅ ±0 30 💤 ±0 0 ❌ ±0
Results for commit 51e09249. ± Comparison against base commit 4eebbd46.
:recycle: This comment has been updated with latest results.
Should it become an applicationl evel config option?
Should it become an application level config option?
@jerqi Sorry for the late reply.
It could be better if it become an app level request option. But It could be a little complex as follow
Could you explain the benefit?
For some little shuflfe, PartionedShuffleBlock
may have worse effect. Like mixed load, there are many little shuffle.
Should it become an application level config option?
@jerqi Sorry for the late reply.
It could be better if it become an app level request option. But It could be a little complex as follow
- Config this option in client conf. (Just list to here, this is easy to do)
- Make ShuffleBlockIdManager as a member of ShuffleTaskInfo which created follow the config from registerShuffleRequest
Could you explain the benefit?
We can have a ShuffleBlockIdManager, but ShuffleTaskInfo has its BlockId strategy.
@jerqi Thanks for your suggestion, how about this commit? https://github.com/apache/incubator-uniffle/pull/2227/commits/74373805c2da970e2bedc7da0d15be72718b9c96
@jerqi Update the name of config key, PTAL
In code, there is the term "manager", while in (user-facing) config there is the term "strategy". I think there should be one term. I'd prefer "manager".
@EnricoMi Thanks for your suggestion, in fact this is my original term
I used for this PR, I'm open to change to strategy
or revert to manager
.
In code, there is the term "manager", while in (user-facing) config there is the term "strategy". I think there should be one term. I'd prefer "manager".
It's better to have one term. My concern about manager
is that every TaskInfo
will have a manager. In my view, we would better not have the same managers in a server. WDYT?
I don't see a problem having different types of manager in the server. The name ShuffleTaskManager
clearly says this manager is about shuffle tasks, whereas the ShuffleBlockIdManager
manages shuffle block ids.
I'd prefer "Manager" over "Strategy".
I don't see a problem having different types of manager in the server. The name
ShuffleTaskManager
clearly says this manager is about shuffle tasks, whereas theShuffleBlockIdManager
manages shuffle block ids.I'd prefer "Manager" over "Strategy".
cc @maobaolong I am ok if both of you prefer manager
.
@jerqi @EnricoMi Thanks for your discussion and suggestion, I updated the PR, PTAL.
@jerqi Thanks for your comment, updated the description and docs, PTAL.
@jerqi Thanks for your review. merge it.
What changes were proposed in this pull request?
Why are the changes needed?
Fix: #2219
This is the maximum partition num with node metrics in our cluster, it is ~410K, it cost heap size 410K * 50KiB = 2GiB if we say a bitmap related a partition avg 10000 blocks(cost 50KiB heap).
I think this is worth to choose this new policy in our production cluster.
Does this PR introduce any user-facing change?
How was this patch tested?
Test Locally, check result by
arthas
[arthas@264]$ vmtool --action getInstances --className org.apache.uniffle.server.ShuffleTaskInfo --express 'instances[0].shuffleBlockIdManager.partitionsToBlockIds.get("application_1729845342052_431031_1730297736664") ' @ConcurrentHashMapForJDK8[ @Integer[0]:@Roaring64NavigableMap[][isEmpty=false;size=10], @Integer[1]:@Roaring64NavigableMap[][isEmpty=false;size=10], @Integer[2]:@Roaring64NavigableMap[][isEmpty=false;size=10], ]
[arthas@264]$ vmtool --action getInstances --className org.apache.uniffle.server.ShuffleTaskInfo --express 'instances[0].shuffleBlockIdManager' @PartitionedShuffleBlockIdManager[ LOG=@Log4jLogger[org.apache.logging.slf4j.Log4jLogger@6aa6fea0], partitionsToBlockIds=@ConcurrentHashMap[isEmpty=false;size=1], ]
[arthas@264]$ vmtool --action getInstances --className org.apache.uniffle.server.ShuffleTaskInfo --express 'instances[0].shuffleBlockIdManager.partitionsToBlockIds' @ConcurrentHashMap[ @String[application_1729845342052_432324_1730299397916]:@ConcurrentHashMapForJDK8[isEmpty=false;size=2], ] [arthas@264]$ vmtool --action getInstances --className org.apache.uniffle.server.ShuffleTaskInfo --express 'instances[0].shuffleBlockIdManager.partitionsToBlockIds.get("application_1729845342052_432324_1730299397916")' @ConcurrentHashMapForJDK8[ @Integer[0]:@ConcurrentHashMap[isEmpty=false;size=2000], @Integer[1]:@ConcurrentHashMap[isEmpty=false;size=2000], ]
[arthas@264]$ vmtool --action getInstances --className org.apache.uniffle.server.ShuffleTaskInfo --express 'instances[0].shuffleBlockIdManager' @DefaultShuffleBlockIdManager[ LOG=@Log4jLogger[org.apache.logging.slf4j.Log4jLogger@ed3b52a], partitionsToBlockIds=@ConcurrentHashMapForJDK8[isEmpty=false;size=1], ] [arthas@264]$ vmtool --action getInstances --className org.apache.uniffle.server.ShuffleTaskInfo --express 'instances[1].shuffleBlockIdManager' @PartitionedShuffleBlockIdManager[ LOG=@Log4jLogger[org.apache.logging.slf4j.Log4jLogger@5fc14628], partitionsToBlockIds=@ConcurrentHashMap[isEmpty=false;size=1], ]
[2024-10-30 23:01:37.585] [Grpc-709] [INFO] ShuffleTaskInfo - application_1729845342052_432387_1730300440568 use app configured ShuffleBlockIdManager to org.apache.uniffle.server.block.DefaultShuffleBlockIdManager@3c6b75a8