FelixYBW opened 2 months ago
It's triggered by HashBuild, but HashBuild holds all the memory.
Another one:
org.apache.gluten.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::addInput failed for [operator: Aggregation, plan node ID: 5]: Error during calling Java code from native code: org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 8.0 MiB, granted: 0.0 B. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled).
Current config settings:
spark.gluten.memory.offHeap.size.in.bytes=40.0 GiB
spark.gluten.memory.task.offHeap.size.in.bytes=10.0 GiB
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=5.0 GiB
spark.memory.offHeap.enabled=true
spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats:
Task.932: Current used bytes: 10.0 GiB, peak bytes: N/A
\- Gluten.Tree.0: Current used bytes: 10.0 GiB, peak bytes: 10.0 GiB
\- root.0: Current used bytes: 10.0 GiB, peak bytes: 10.0 GiB
+- WholeStageIterator.0: Current used bytes: 10.0 GiB, peak bytes: 10.0 GiB
| \- single: Current used bytes: 10.0 GiB, peak bytes: 10.0 GiB
| +- WholeStageIterator_root: Current used bytes: 277.8 MiB, peak bytes: 9.9 GiB
| | +- task.Gluten_Stage_0_TID_932: Current used bytes: 277.8 MiB, peak bytes: 9.9 GiB
| | | +- node.5: Current used bytes: 256.0 MiB, peak bytes: 9.9 GiB
| | | | \- op.5.0.0.Aggregation: Current used bytes: 256.0 MiB, peak bytes: 6.6 GiB
| | | +- node.0: Current used bytes: 20.9 MiB, peak bytes: 24.0 MiB
| | | | +- op.0.0.0.TableScan: Current used bytes: 20.9 MiB, peak bytes: 22.6 MiB
| | | | \- op.0.0.0.TableScan.test-hive: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | | +- node.1: Current used bytes: 890.6 KiB, peak bytes: 2.0 MiB
| | | | \- op.1.0.0.FilterProject: Current used bytes: 890.6 KiB, peak bytes: 1087.6 KiB
| | | +- node.3: Current used bytes: 25.3 KiB, peak bytes: 1024.0 KiB
| | | | \- op.3.0.0.FilterProject: Current used bytes: 25.3 KiB, peak bytes: 26.1 KiB
| | | +- node.2: Current used bytes: 640.0 B, peak bytes: 1024.0 KiB
| | | | \- op.2.0.0.Expand: Current used bytes: 640.0 B, peak bytes: 640.0 B
| | | +- node.6: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | | | \- op.6.0.0.FilterProject: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | | \- node.4: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | | \- op.4.0.0.Expand: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | \- WholeStageIterator_default_leaf: Current used bytes: 1536.0 B, peak bytes: 1664.0 B
| \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
\- OverAcquire.DummyTarget.1: Current used bytes: 0.0 B, peak bytes: 2.2 GiB
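For context, the quota numbers in the dump are self-consistent. A minimal sketch of the arithmetic, assuming 4 task slots per executor (the slot count is inferred from the 40 GiB / 10 GiB ratio, not stated in the log):

```python
GiB = 1024 ** 3

off_heap_total = 40 * GiB  # spark.gluten.memory.offHeap.size.in.bytes
task_slots = 4             # assumption: inferred from 40 GiB total / 10 GiB per task

# Per-task quota is the executor off-heap pool divided by the task slots.
task_off_heap = off_heap_total // task_slots
print(task_off_heap // GiB)  # 10, matching task.offHeap.size.in.bytes

# In both dumps the "conservative" quota is half the per-task quota.
conservative_off_heap = task_off_heap // 2
print(conservative_off_heap // GiB)  # 5, matching conservative.task.offHeap.size.in.bytes
```

The dump itself confirms the task hit this quota: Task.932 sits at exactly 10.0 GiB used when the 8 MiB acquisition fails with 0 B granted.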
It looks like it's caused by the spill not being triggered.
Query plan:
+- ColumnarExchange (10)
   +- ^ ProjectExecTransformer (8)
      +- ^ RegularHashAggregateExecTransformer (7)
         +- ^ ExpandExecTransformer (6)
            +- ^ ProjectExecTransformer (5)
               +- ^ ExpandExecTransformer (4)
                  +- ^ ProjectExecTransformer (3)
                     +- ^ FilterExecTransformer (2)
                        +- ^ Scan parquet distribution.video_feedview_logs (1)
Usually Velox's hash join stops spilling once it reaches the max spill level, but I am not sure whether that is the case here. Going through the full log may help diagnose. @FelixYBW
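On the max-spill-level point: recursive hash spilling repartitions data by successive groups of hash bits, so the partition fan-out grows geometrically with the level, which is why a depth cap exists at all. A generic sketch of that behavior (the bit width and level cap here are illustrative values, not Velox's actual defaults):

```python
# Illustrative only: each spill level splits rows by the next `bits`
# hash bits, so the cumulative partition count is 2**(bits * level).
def partitions_at_level(level: int, bits: int = 2) -> int:
    return 2 ** (bits * level)

# Once the cap is reached, the operator can no longer spill and must
# keep the remaining rows in memory, matching the "stops spilling at
# max spill level" behavior described above.
def can_spill(level: int, max_spill_level: int = 4) -> bool:
    return level < max_spill_level

print(partitions_at_level(4))  # 256
print(can_spill(4))            # False
```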
The issue is gone after a rebase.
The issue is still there:
Aggregation didn't hold memory, but its root pool still holds the memory.
24/09/19 08:59:37 ERROR [Executor task launch worker for task 270.0 in stage 0.0 (TID 270)] listener.ManagedReservationListener: Error reserving memory from target
org.apache.gluten.memory.memtarget.ThrowOnOomMemoryTarget$OutOfMemoryException: Not enough spark off-heap execution memory. Acquired: 1080.0 MiB, granted: 552.0 MiB. Try tweaking config option spark.memory.offHeap.size to get larger space to run this application (if spark.gluten.memory.dynamic.offHeap.sizing.enabled is not enabled).
Current config settings:
spark.gluten.memory.offHeap.size.in.bytes=50.0 GiB
spark.gluten.memory.task.offHeap.size.in.bytes=12.5 GiB
spark.gluten.memory.conservative.task.offHeap.size.in.bytes=6.3 GiB
spark.memory.offHeap.enabled=true
spark.gluten.memory.dynamic.offHeap.sizing.enabled=false
Memory consumer stats:
Task.270: Current used bytes: 12.0 GiB, peak bytes: N/A
\- Gluten.Tree.3: Current used bytes: 12.0 GiB, peak bytes: 12.5 GiB
\- root.3: Current used bytes: 12.0 GiB, peak bytes: 12.5 GiB
+- WholeStageIterator.0: Current used bytes: 12.0 GiB, peak bytes: 12.5 GiB
| \- single: Current used bytes: 12.0 GiB, peak bytes: 12.0 GiB
| +- root: Current used bytes: 281.2 MiB, peak bytes: 12.0 GiB
| | +- task.Gluten_Stage_0_TID_270_VTID_0: Current used bytes: 281.2 MiB, peak bytes: 12.0 GiB
| | | +- node.5: Current used bytes: 256.0 MiB, peak bytes: 11.9 GiB
| | | | \- op.5.0.0.Aggregation: Current used bytes: 256.0 MiB, peak bytes: 7.9 GiB
| | | +- node.0: Current used bytes: 24.2 MiB, peak bytes: 28.0 MiB
| | | | +- op.0.0.0.TableScan: Current used bytes: 24.2 MiB, peak bytes: 26.3 MiB
| | | | \- op.0.0.0.TableScan.test-hive: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | | +- node.1: Current used bytes: 986.5 KiB, peak bytes: 2.0 MiB
| | | | \- op.1.0.0.FilterProject: Current used bytes: 986.5 KiB, peak bytes: 1183.5 KiB
| | | +- node.3: Current used bytes: 25.3 KiB, peak bytes: 1024.0 KiB
| | | | \- op.3.0.0.FilterProject: Current used bytes: 25.3 KiB, peak bytes: 26.1 KiB
| | | +- node.2: Current used bytes: 512.0 B, peak bytes: 1024.0 KiB
| | | | \- op.2.0.0.Expand: Current used bytes: 512.0 B, peak bytes: 640.0 B
| | | +- node.4: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | | | \- op.4.0.0.Expand: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | | \- node.6: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | | \- op.6.0.0.FilterProject: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | \- default_leaf: Current used bytes: 1536.0 B, peak bytes: 1664.0 B
| \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
+- ShuffleWriter.3.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 0.0 B
+- ShuffleWriter.3: Current used bytes: 0.0 B, peak bytes: 0.0 B
| \- single: Current used bytes: 0.0 B, peak bytes: 0.0 B
| +- root: Current used bytes: 0.0 B, peak bytes: 0.0 B
| | \- default_leaf: Current used bytes: 0.0 B, peak bytes: 0.0 B
| \- gluten::MemoryAllocator: Current used bytes: 0.0 B, peak bytes: 0.0 B
\- WholeStageIterator.0.OverAcquire.0: Current used bytes: 0.0 B, peak bytes: 3.0 GiB
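A quick way to see the inconsistency described above: the root pool reports 12.0 GiB in use, while summing the per-operator "Current used bytes" values from the dump accounts for only about 281 MiB, so the bulk of the reservation is held at the root without being attributed to any operator (all numbers copied from the dump):

```python
MiB = 1024 ** 2
GiB = 1024 ** 3

root_used = 12.0 * GiB  # "root.3: Current used bytes: 12.0 GiB"

# Per-operator "Current used bytes" values from the dump above.
operator_used = [
    256.0 * MiB,   # op.5.0.0.Aggregation
    24.2 * MiB,    # op.0.0.0.TableScan
    986.5 * 1024,  # op.1.0.0.FilterProject
    25.3 * 1024,   # op.3.0.0.FilterProject
    512.0,         # op.2.0.0.Expand
]

attributed = sum(operator_used)
unattributed = root_used - attributed
print(round(attributed / MiB, 1))    # ~281.2 MiB, matching the task pool line
print(round(unattributed / GiB, 1))  # ~11.7 GiB held at the root, unattributed
```

This matches the `task.Gluten_Stage_0_TID_270_VTID_0` line (281.2 MiB) and supports the observation that the memory is retained by the root pool rather than by the Aggregation operator.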
Backend
VL (Velox)
Bug description
@zhztheplayer
Spark version
Spark-3.2.x
Spark configurations
No response
System information
No response
Relevant logs
No response