Kontinuation opened this pull request 1 month ago
Attention: Patch coverage is 77.27273% with 5 lines in your changes missing coverage. Please review.

Project coverage is 34.27%. Comparing base (591f45a) to head (bd5a0c7). Report is 1 commit behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ain/scala/org/apache/comet/CometExecIterator.scala | 73.68% | 2 Missing and 3 partials :warning: |
Thanks for filing this @Kontinuation. I will close my PRs.
Also, there is a suggestion in https://github.com/apache/datafusion-comet/issues/1017 for always using unified memory.
I agree that using the unified memory manager is a better approach. Vanilla Spark operators and Comet operators are governed by the same memory manager and all use off-heap memory, so vanilla Spark operators can free memory when Comet operators are under pressure. I'll also put more work into improving unified memory management.
I think the native memory management approach may still be relevant when users don't want vanilla Spark to use off-heap memory. We can set the default value of `spark.comet.exec.memoryPool` to `greedy_task_shared` to make Comet memory overhead configuration more intuitive. If users want memory-intensive operators to spill properly, they can try out `fair_spill_task_shared`.
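For illustration, a run that keeps vanilla Spark on-heap while letting Comet's native operators spill under pressure might look like the following sketch. `spark.comet.exec.memoryPool` is the setting proposed in this PR and `spark.memory.offHeap.enabled` is standard Spark configuration; the application class and jar name are placeholders.

```shell
# Sketch: native memory management with the fair-spill task-shared pool.
# spark.memory.offHeap.enabled=false keeps vanilla Spark operators on-heap;
# Comet's native operators then manage their own off-heap budget.
spark-submit \
  --conf spark.memory.offHeap.enabled=false \
  --conf spark.comet.exec.memoryPool=fair_spill_task_shared \
  --class org.example.MyQuery \
  my-app.jar
```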
Which issue does this PR close?
This PR relates to https://github.com/apache/datafusion-comet/issues/996 and https://github.com/apache/datafusion-comet/pull/1004
Rationale for this change
This PR investigates various approaches to simplify memory-related configuration and to reduce the memory required to run large queries. cc @andygrove
What changes are included in this PR?
This PR adds a `spark.comet.exec.memoryPool` configuration for easily running queries using various memory pool setups:

- `greedy`: each operator has its own `GreedyMemoryPool`; this matches the current behavior.
- `fair_spill`: each operator has its own `FairSpillPool`.
- `greedy_task_shared` (default): all operators for the same task attempt share the same `GreedyMemoryPool`.
- `fair_spill_task_shared`: all operators for the same task attempt share the same `FairSpillPool`.
- `greedy_global`: all operators in the same executor instance share the same `GreedyMemoryPool`.
- `fair_spill_global`: all operators in the same executor instance share the same `FairSpillPool`.

How are these changes tested?
TODO: add tests running in native memory management mode.