apache / datafusion-comet

Apache DataFusion Comet Spark Accelerator
https://datafusion.apache.org/comet
Apache License 2.0

feat: Implement shared memory pool for case where spark.memory.offHeap.enabled=false #1002

Closed: andygrove closed this 1 month ago

andygrove commented 1 month ago

Which issue does this PR close?

Closes https://github.com/apache/datafusion-comet/issues/996

Rationale for this change

Simplify memory configuration.

What changes are included in this PR?

Allocate one shared pool per executor, rather than one pool per native plan, when spark.memory.offHeap.enabled=false.
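
As a rough illustration of the difference between a pool per native plan and one pool shared by the whole executor, here is a minimal sketch in plain Rust. It is not the code in this PR: `Pool`, `EXECUTOR_POOL`, and `shared_pool` are hypothetical names, and the pool is a simple byte counter rather than a DataFusion memory pool.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, OnceLock};

/// Hypothetical pool (not a Comet or DataFusion type): tracks reserved
/// bytes against a fixed limit.
#[derive(Debug)]
pub struct Pool {
    limit: usize,
    used: AtomicUsize,
}

impl Pool {
    pub fn new(limit: usize) -> Self {
        Pool { limit, used: AtomicUsize::new(0) }
    }

    /// Try to reserve `bytes`; fails if the limit would be exceeded.
    pub fn try_grow(&self, bytes: usize) -> Result<(), String> {
        self.used
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |used| {
                let new = used.checked_add(bytes)?;
                (new <= self.limit).then_some(new)
            })
            .map(|_| ())
            .map_err(|used| {
                format!("cannot reserve {bytes} bytes: {used} of {} in use", self.limit)
            })
    }

    /// Return `bytes` to the pool. The caller must not release more than it reserved.
    pub fn shrink(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::SeqCst);
    }

    pub fn used(&self) -> usize {
        self.used.load(Ordering::SeqCst)
    }
}

/// One pool for the whole process (standing in for "per executor").
static EXECUTOR_POOL: OnceLock<Arc<Pool>> = OnceLock::new();

/// Every native plan asks for the same pool. The limit is only honoured
/// on the first call; later calls return the already-created pool.
pub fn shared_pool(limit: usize) -> Arc<Pool> {
    Arc::clone(EXECUTOR_POOL.get_or_init(|| Arc::new(Pool::new(limit))))
}
```

In this model, two plans that call `shared_pool` compete for the same budget, so a reservation made by one plan reduces what the other can obtain. With one pool per plan, each plan would instead get its own independent limit.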

How are these changes tested?

Kontinuation commented 1 month ago

I'm a bit worried about this approach because we are implementing greedy mode inside CometTaskMemoryManager, and greedy allocation is known to starve consumers frequently. I would prefer using a fair spill pool for the "native memory management" mode. That lets spillable operators work properly without being starved, at the cost of memory pool under-utilization.
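
The trade-off described here can be sketched with a deliberately simplified model. The two functions below are hypothetical and ignore unspillable reservations. They only contrast "first come, first served" with "each spillable consumer is capped at an even share", which is the general idea behind DataFusion's `GreedyMemoryPool` and `FairSpillPool`.

```rust
/// Greedy policy (simplified): a request is granted whenever total usage
/// stays within the pool size, regardless of who is asking.
pub fn greedy_grant(pool_size: usize, total_used: usize, request: usize) -> bool {
    total_used
        .checked_add(request)
        .map_or(false, |new_total| new_total <= pool_size)
}

/// Fair policy (simplified): each of the `num_spillable` consumers may hold
/// at most an even share of the pool, so a denied consumer is expected to
/// spill rather than take memory away from the others.
pub fn fair_grant(
    pool_size: usize,
    num_spillable: usize,
    consumer_used: usize,
    request: usize,
) -> bool {
    let share = pool_size / num_spillable.max(1);
    consumer_used
        .checked_add(request)
        .map_or(false, |new_used| new_used <= share)
}
```

With a pool of 100 bytes and two consumers, the greedy model lets the first consumer take 90 bytes, after which a 20-byte request from the second is denied (starvation). The fair model caps each consumer at 50 bytes, so the second consumer's 20-byte request succeeds. If it never asks for more, the remaining 30 bytes of its share stay idle (under-utilization).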

andygrove commented 1 month ago

Thanks for the feedback. I will work on a separate PR for the fair spill approach. I am moving this PR to draft for now.

andygrove commented 1 month ago

Closing in favor of https://github.com/apache/datafusion-comet/pull/1021