feat: Add a `spark.comet.exec.memoryPool` configuration for experimenting with various datafusion memory pool setups.

Kontinuation commented 1 month ago

Which issue does this PR close?

This issue relates to https://github.com/apache/datafusion-comet/issues/996 and https://github.com/apache/datafusion-comet/pull/1004

Rationale for this change

This is for investigating various approaches to simplify memory-related configuration and reduce the memory required to run large queries. @andygrove

What changes are included in this PR?

This PR adds a spark.comet.exec.memoryPool configuration for easily running queries using various memory pool setups.

greedy: Each operator has its own GreedyMemoryPool, which is the same as the current situation.
fair_spill: Each operator has its own FairSpillPool
greedy_task_shared (default): All operators for the same task attempt share the same GreedyMemoryPool.
fair_spill_task_shared: All operators for the same task attempt share the same FairSpillPool
greedy_global: All operators in the same executor instance share the same GreedyMemoryPool
fair_spill_global: All operators in the same executor instance share the same FairSpillPool

How are these changes tested?

TODO: add tests running in native memory management mode.

codecov-commenter commented 1 month ago

Codecov Report

Attention: Patch coverage is 77.27273% with 5 lines in your changes missing coverage. Please review.

Project coverage is 34.27%. Comparing base (591f45a) to head (bd5a0c7). Report is 1 commits behind head on main.

Files with missing lines	Patch %	Lines
...ain/scala/org/apache/comet/CometExecIterator.scala	73.68%	2 Missing and 3 partials :warning:

Additional details and impacted files

```diff @@ Coverage Diff @@ ## main #1021 +/- ## ============================================ - Coverage 34.30% 34.27% -0.03% - Complexity 887 889 +2 ============================================ Files 112 112 Lines 43429 43502 +73 Branches 9623 9615 -8 ============================================ + Hits 14897 14912 +15 - Misses 25473 25542 +69 + Partials 3059 3048 -11 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

andygrove commented 1 month ago

Thanks for filing this @Kontinuation. I will close my PRs.

Also, there is a suggestion in https://github.com/apache/datafusion-comet/issues/1017 for always using unified memory.

Kontinuation commented 1 month ago

Thanks for filing this @Kontinuation. I will close my PRs.

Also, there is a suggestion in #1017 for always using unified memory.

I agree that using the unified memory manager is a better approach. Vanilla Spark operators and comet operators are governed by the same memory manager and they are all using offheap memory. Vanilla Spark operators can free some memory when comet operators are under pressure. I'll also put more work into improving the unified memory management.

I think the native memory management approach may still be relevant when users don't want vanilla Spark to use off-heap memory. We can set the default value of spark.comet.exec.memoryPool as greedy_task_shared to make comet memory overhead configuration more intuitive. If users want memory-intensive operators to spill properly, they can try out fair_spill_task_shared.

apache / datafusion-comet