apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0
1.21k stars 435 forks source link

[VL] Peak memory is high than Velox's stats #5630

Closed Yohahaha closed 6 months ago

Yohahaha commented 6 months ago

Backend

VL (Velox)

Bug description

TPCDS sf100 q67

Spark view of highest peak memory is 1.1G

+----+-------+-----------------+------------+
|Name|Runtime|TaskPeakMem      |StagePeakMem|
+----+-------+-----------------+------------+
|q67 |30.93  |1160.0 MiB (1028)|19.0 GiB (7)|
|sum |30.93  |                 |            |
+----+-------+-----------------+------------+

Velox view of highest peak memory is 500+M

I20240507 10:54:36.669220 1771770 WholeStageResultIterator.cc:437] Native Plan with stats for: [Stage: 7 TID: 1028]
-- 9 Project[expressions: (n9_11:INTEGER, hash_with_seed(42,"n8_10","n8_11","n8_12","n8_13","n8_14","n8_15","n8_16","n8_17","n8_18")), (n9_12:VARCHAR, "n8_10"), (n9_13:VARCHAR, "n8_11"), (n9_14:VARCHAR, "n8_12"), (n9_15:VARCHAR, "n8_13"), (n9_16:INTEGER, "n8_14"), (n9_17:INTEGER, "n8_15"), (n9_18:INTEGER, "n8_16"), (n9_19:VARCHAR, "n8_17"), (n9_20:BIGINT, "n8_18"), (n9_21:DECIMAL(28, 2), "n8_19"), (n9_22:BOOLEAN, "n8_20")] -> n9_11:INTEGER, n9_12:VARCHAR, n9_13:VARCHAR, n9_14:VARCHAR, n9_15:VARCHAR, n9_16:INTEGER, n9_17:INTEGER, n9_18:INTEGER, n9_19:VARCHAR, n9_20:BIGINT, n9_21:DECIMAL(28, 2), n9_22:BOOLEAN
   Output: 3279771 rows (827.15MB, 802 batches), Cpu time: 489.84ms, Blocked wall time: 0ns, Peak memory: 24.00KB, Memory allocations: 802, Threads: 1
      queuedWallNanos              sum: 466.00us, count: 1, min: 466.00us, max: 466.00us
      runningAddInputWallNanos     sum: 197.77us, count: 1, min: 197.77us, max: 197.77us
      runningFinishWallNanos       sum: 0ns, count: 1, min: 0ns, max: 0ns
      runningGetOutputWallNanos    sum: 491.61ms, count: 1, min: 491.61ms, max: 491.61ms
  -- 8 Project[expressions: (n8_10:VARCHAR, "n6_13"), (n8_11:VARCHAR, "n6_14"), (n8_12:VARCHAR, "n6_15"), (n8_13:VARCHAR, "n6_16"), (n8_14:INTEGER, "n6_17"), (n8_15:INTEGER, "n6_18"), (n8_16:INTEGER, "n6_19"), (n8_17:VARCHAR, "n6_20"), (n8_18:BIGINT, "n6_21"), (n8_19:DECIMAL(28, 2), "n7_9"["col_0"]), (n8_20:BOOLEAN, "n7_9"["col_1"])] -> n8_10:VARCHAR, n8_11:VARCHAR, n8_12:VARCHAR, n8_13:VARCHAR, n8_14:INTEGER, n8_15:INTEGER, n8_16:INTEGER, n8_17:VARCHAR, n8_18:BIGINT, n8_19:DECIMAL(28, 2), n8_20:BOOLEAN
     Output: 3279771 rows (808.46MB, 802 batches), Cpu time: 7.61ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
        runningAddInputWallNanos     sum: 537.63us, count: 1, min: 537.63us, max: 537.63us
        runningFinishWallNanos       sum: 142ns, count: 1, min: 142ns, max: 142ns
        runningGetOutputWallNanos    sum: 6.51ms, count: 1, min: 6.51ms, max: 6.51ms
    -- 7 Aggregation[PARTIAL [n6_13, n6_14, n6_15, n6_16, n6_17, n6_18, n6_19, n6_20, n6_21] n7_9 := sum_partial("n6_22")] -> n6_13:VARCHAR, n6_14:VARCHAR, n6_15:VARCHAR, n6_16:VARCHAR, n6_17:INTEGER, n6_18:INTEGER, n6_19:INTEGER, n6_20:VARCHAR, n6_21:BIGINT, n7_9:ROW<col_0:DECIMAL(28, 2),col_1:BOOLEAN>
       Output: 3279771 rows (808.46MB, 802 batches), Cpu time: 7.17s, Blocked wall time: 0ns, Peak memory: 503.52MB, Memory allocations: 6608, Threads: 1
          distinctKey0                                sum: 11, count: 1, min: 11, max: 11
          distinctKey1                                sum: 100, count: 1, min: 100, max: 100
          distinctKey2                                sum: 687, count: 1, min: 687, max: 687
          distinctKey3                                sum: 8065, count: 1, min: 8065, max: 8065
          distinctKey4                                sum: 2, count: 1, min: 2, max: 2
          distinctKey5                                sum: 5, count: 1, min: 5, max: 5
          distinctKey6                                sum: 6, count: 1, min: 6, max: 6
          distinctKey7                                sum: 202, count: 1, min: 202, max: 202
          distinctKey8                                sum: 10, count: 1, min: 10, max: 10
          flushRowCount                               sum: 1377709, count: 1, min: 1377709, max: 1377709
          flushTimes                                  sum: 1, count: 1, min: 1, max: 1
          hashtable.capacity                          sum: 4194304, count: 1, min: 4194304, max: 4194304
          hashtable.numDistinct                       sum: 1902062, count: 1, min: 1902062, max: 1902062
          hashtable.numRehashes                       sum: 21, count: 1, min: 21, max: 21
          hashtable.numTombstones                     sum: 0, count: 1, min: 0, max: 0
          maxExtendedPartialAggregationMemoryUsage    sum: 537.60MB, count: 1, min: 537.60MB, max: 537.60MB
          partialAggregationPct                       sum: 11, count: 1, min: 11, max: 11
          rangeKey4                                   sum: 510, count: 1, min: 510, max: 510
          rangeKey5                                   sum: 767, count: 1, min: 767, max: 767
          rangeKey6                                   sum: 1790, count: 1, min: 1790, max: 1790
          rangeKey8                                   sum: 3775, count: 1, min: 3775, max: 3775
          runningAddInputWallNanos                    sum: 6.46s, count: 1, min: 6.46s, max: 6.46s
          runningFinishWallNanos                      sum: 851ns, count: 1, min: 851ns, max: 851ns
          runningGetOutputWallNanos                   sum: 737.58ms, count: 1, min: 737.58ms, max: 737.58ms
      -- 6 Project[expressions: (n6_11:INTEGER, "n5_0"), (n6_12:DECIMAL(7, 2), "n5_1"), (n6_13:VARCHAR, "n5_2"), (n6_14:VARCHAR, "n5_3"), (n6_15:VARCHAR, "n5_4"), (n6_16:VARCHAR, "n5_5"), (n6_17:INTEGER, "n5_6"), (n6_18:INTEGER, "n5_7"), (n6_19:INTEGER, "n5_8"), (n6_20:VARCHAR, "n5_9"), (n6_21:BIGINT, "n5_10"), (n6_22:DECIMAL(18, 2), coalesce(multiply("n5_1",try_cast "n5_0" as DECIMAL(10, 0)),0.00))] -> n6_11:INTEGER, n6_12:DECIMAL(7, 2), n6_13:VARCHAR, n6_14:VARCHAR, n6_15:VARCHAR, n6_16:VARCHAR, n6_17:INTEGER, n6_18:INTEGER, n6_19:INTEGER, n6_20:VARCHAR, n6_21:BIGINT, n6_22:DECIMAL(18, 2)
         Output: 29551140 rows (4.20GB, 8190 batches), Cpu time: 1.45s, Blocked wall time: 0ns, Peak memory: 112.75KB, Memory allocations: 40941, Threads: 1
            runningAddInputWallNanos     sum: 2.38ms, count: 1, min: 2.38ms, max: 2.38ms
            runningFinishWallNanos       sum: 3.31us, count: 1, min: 3.31us, max: 3.31us
            runningGetOutputWallNanos    sum: 1.61s, count: 1, min: 1.61s, max: 1.61s
        -- 5 Expand[[n4_12, n4_13, n4_14, n4_15, n4_16, n4_17, n4_18, n4_19, n4_20, n4_21, 0], [n4_12, n4_13, n4_14, n4_15, n4_16, n4_17, n4_18, n4_19, n4_20, null, 1], [n4_12, n4_13, n4_14, n4_15, n4_16, n4_17, n4_18, n4_19, null, null, 3], [n4_12, n4_13, n4_14, n4_15, n4_16, n4_17, n4_18, null, null, null, 7], [n4_12, n4_13, n4_14, n4_15, n4_16, n4_17, null, null, null, null, 15], [n4_12, n4_13, n4_14, n4_15, n4_16, null, null, null, null, null, 31], [n4_12, n4_13, n4_14, n4_15, null, null, null, null, null, null, 63], [n4_12, n4_13, n4_14, null, null, null, null, null, null, null, 127], [n4_12, n4_13, null, null, null, null, null, null, null, null, 255]] -> n5_0:INTEGER, n5_1:DECIMAL(7, 2), n5_2:VARCHAR, n5_3:VARCHAR, n5_4:VARCHAR, n5_5:VARCHAR, n5_6:INTEGER, n5_7:INTEGER, n5_8:INTEGER, n5_9:VARCHAR, n5_10:BIGINT
           Output: 29551140 rows (3.91GB, 8190 batches), Cpu time: 474.34ms, Blocked wall time: 0ns, Peak memory: 1.00KB, Memory allocations: 32760, Threads: 1
              runningAddInputWallNanos     sum: 556.97us, count: 1, min: 556.97us, max: 556.97us
              runningFinishWallNanos       sum: 161ns, count: 1, min: 161ns, max: 161ns
              runningGetOutputWallNanos    sum: 580.35ms, count: 1, min: 580.35ms, max: 580.35ms
          -- 4 Project[expressions: (n4_12:INTEGER, "n3_13"), (n4_13:DECIMAL(7, 2), "n3_14"), (n4_14:VARCHAR, "n3_22"), (n4_15:VARCHAR, "n3_21"), (n4_16:VARCHAR, "n3_20"), (n4_17:VARCHAR, "n3_23"), (n4_18:INTEGER, "n3_15"), (n4_19:INTEGER, "n3_17"), (n4_20:INTEGER, "n3_16"), (n4_21:VARCHAR, "n3_18")] -> n4_12:INTEGER, n4_13:DECIMAL(7, 2), n4_14:VARCHAR, n4_15:VARCHAR, n4_16:VARCHAR, n4_17:VARCHAR, n4_18:INTEGER, n4_19:INTEGER, n4_20:INTEGER, n4_21:VARCHAR
             Output: 3283460 rows (724.46MB, 910 batches), Cpu time: 3.66ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
                runningAddInputWallNanos     sum: 276.29us, count: 1, min: 276.29us, max: 276.29us
                runningFinishWallNanos       sum: 174ns, count: 1, min: 174ns, max: 174ns
                runningGetOutputWallNanos    sum: 2.71ms, count: 1, min: 2.71ms, max: 2.71ms
            -- 3 Project[expressions: (n3_12:INTEGER, "n0_0"), (n3_13:INTEGER, "n0_1"), (n3_14:DECIMAL(7, 2), "n0_2"), (n3_15:INTEGER, "n0_3"), (n3_16:INTEGER, "n0_4"), (n3_17:INTEGER, "n0_5"), (n3_18:VARCHAR, "n0_6"), (n3_19:INTEGER, "n1_0"), (n3_20:VARCHAR, "n1_1"), (n3_21:VARCHAR, "n1_2"), (n3_22:VARCHAR, "n1_3"), (n3_23:VARCHAR, "n1_4")] -> n3_12:INTEGER, n3_13:INTEGER, n3_14:DECIMAL(7, 2), n3_15:INTEGER, n3_16:INTEGER, n3_17:INTEGER, n3_18:VARCHAR, n3_19:INTEGER, n3_20:VARCHAR, n3_21:VARCHAR, n3_22:VARCHAR, n3_23:VARCHAR
               Output: 3283460 rows (758.23MB, 910 batches), Cpu time: 5.54ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
                  runningAddInputWallNanos     sum: 562.53us, count: 1, min: 562.53us, max: 562.53us
                  runningFinishWallNanos       sum: 165ns, count: 1, min: 165ns, max: 165ns
                  runningGetOutputWallNanos    sum: 4.27ms, count: 1, min: 4.27ms, max: 4.27ms
              -- 2 HashJoin[INNER n0_0=n1_0] -> n0_0:INTEGER, n0_1:INTEGER, n0_2:DECIMAL(7, 2), n0_3:INTEGER, n0_4:INTEGER, n0_5:INTEGER, n0_6:VARCHAR, n1_0:INTEGER, n1_1:VARCHAR, n1_2:VARCHAR, n1_3:VARCHAR, n1_4:VARCHAR
                 Output: 3283460 rows (758.23MB, 910 batches), Cpu time: 521.65ms, Blocked wall time: 0ns, Peak memory: 6.72MB, Memory allocations: 2785
                 HashBuild: Input: 12295 rows (1.47MB, 4 batches), Output: 0 rows (0B, 0 batches), Cpu time: 3.88ms, Blocked wall time: 0ns, Peak memory: 6.06MB, Memory allocations: 11, Threads: 1
                    distinctKey0                 sum: 12296, count: 1, min: 12296, max: 12296
                    hashtable.buildWallNanos     sum: 1.24ms, count: 1, min: 1.24ms, max: 1.24ms
                    hashtable.capacity           sum: 203978, count: 1, min: 203978, max: 203978
                    hashtable.numDistinct        sum: 12295, count: 1, min: 12295, max: 12295
                    hashtable.numRehashes        sum: 1, count: 1, min: 1, max: 1
                    queuedWallNanos              sum: 0ns, count: 1, min: 0ns, max: 0ns
                    rangeKey0                    sum: 203978, count: 1, min: 203978, max: 203978
                    runningAddInputWallNanos     sum: 3.93ms, count: 1, min: 3.93ms, max: 3.93ms
                    runningFinishWallNanos       sum: 0ns, count: 1, min: 0ns, max: 0ns
                    runningGetOutputWallNanos    sum: 1.36us, count: 1, min: 1.36us, max: 1.36us
                 HashProbe: Input: 3283460 rows (195.08MB, 910 batches), Output: 3283460 rows (758.23MB, 910 batches), Cpu time: 517.78ms, Blocked wall time: 0ns, Peak memory: 675.00KB, Memory allocations: 2774, Threads: 1
                    queuedWallNanos              sum: 1.00us, count: 1, min: 1.00us, max: 1.00us
                    runningAddInputWallNanos     sum: 24.62ms, count: 1, min: 24.62ms, max: 24.62ms
                    runningFinishWallNanos       sum: 546ns, count: 1, min: 546ns, max: 546ns
                    runningGetOutputWallNanos    sum: 502.54ms, count: 1, min: 502.54ms, max: 502.54ms
                -- 0 ValueStream[] -> n0_0:INTEGER, n0_1:INTEGER, n0_2:DECIMAL(7, 2), n0_3:INTEGER, n0_4:INTEGER, n0_5:INTEGER, n0_6:VARCHAR
                   Input: 0 rows (0B, 0 batches), Output: 3283460 rows (195.08MB, 910 batches), Cpu time: 225.96ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
                      runningAddInputWallNanos     sum: 0ns, count: 1, min: 0ns, max: 0ns
                      runningFinishWallNanos       sum: 7.93us, count: 1, min: 7.93us, max: 7.93us
                      runningGetOutputWallNanos    sum: 229.59ms, count: 1, min: 229.59ms, max: 229.59ms
                -- 1 ValueStream[] -> n1_0:INTEGER, n1_1:VARCHAR, n1_2:VARCHAR, n1_3:VARCHAR, n1_4:VARCHAR
                   Input: 0 rows (0B, 0 batches), Output: 12295 rows (1.47MB, 4 batches), Cpu time: 9.33ms, Blocked wall time: 0ns, Peak memory: 0B, Memory allocations: 0, Threads: 1
                      runningAddInputWallNanos     sum: 0ns, count: 1, min: 0ns, max: 0ns
                      runningFinishWallNanos       sum: 1.43ms, count: 1, min: 1.43ms, max: 1.43ms
                      runningGetOutputWallNanos    sum: 13.62ms, count: 1, min: 13.62ms, max: 13.62ms

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

Yohahaha commented 6 months ago
24/05/07 11:38:02 WARN [Executor task launch worker for task 0.0 in stage 7.0 (TID 1024)] NativeMemoryManager: About to release memory manager, usage dump:
    WholeStageIterator:                   Current used bytes: 0.0 B, peak bytes:  577.0 MiB
    +- task.Gluten_Stage_7_TID_1024:      Current used bytes: 0.0 B, peak bytes:  577.0 MiB
    |  +- node.9:                         Current used bytes: 0.0 B, peak bytes: 1024.0 KiB
    |  |  \- op.9.0.0.FilterProject:      Current used bytes: 0.0 B, peak bytes:   24.0 KiB
    |  +- node.1:                         Current used bytes: 0.0 B, peak bytes:      0.0 B
    |  |  \- op.1.1.0.ValueStream:        Current used bytes: 0.0 B, peak bytes:      0.0 B
    |  +- node.4:                         Current used bytes: 0.0 B, peak bytes:      0.0 B
    |  |  \- op.4.0.0.FilterProject:      Current used bytes: 0.0 B, peak bytes:      0.0 B
    |  +- node.5:                         Current used bytes: 0.0 B, peak bytes: 1024.0 KiB
    |  |  \- op.5.0.0.Expand:             Current used bytes: 0.0 B, peak bytes:   1024.0 B
    |  +- node.8:                         Current used bytes: 0.0 B, peak bytes:      0.0 B
    |  |  \- op.8.0.0.FilterProject:      Current used bytes: 0.0 B, peak bytes:      0.0 B
    |  +- node.6:                         Current used bytes: 0.0 B, peak bytes: 1024.0 KiB
    |  |  \- op.6.0.0.FilterProject:      Current used bytes: 0.0 B, peak bytes:  112.8 KiB
    |  +- node.7:                         Current used bytes: 0.0 B, peak bytes:  552.0 MiB
    |  |  \- op.7.0.0.PartialAggregation: Current used bytes: 0.0 B, peak bytes:  463.5 MiB
    |  +- node.3:                         Current used bytes: 0.0 B, peak bytes:      0.0 B
    |  |  \- op.3.0.0.FilterProject:      Current used bytes: 0.0 B, peak bytes:      0.0 B
    |  +- node.2:                         Current used bytes: 0.0 B, peak bytes:   32.0 MiB
    |  |  +- op.2.1.0.HashBuild:          Current used bytes: 0.0 B, peak bytes:    6.1 MiB
    |  |  \- op.2.0.0.HashProbe:          Current used bytes: 0.0 B, peak bytes:  696.0 KiB
    |  \- node.0:                         Current used bytes: 0.0 B, peak bytes:      0.0 B
    |     \- op.0.0.0.ValueStream:        Current used bytes: 0.0 B, peak bytes:      0.0 B
    \- WholeStageIterator_default_leaf:   Current used bytes: 0.0 B, peak bytes:      0.0 B
24/05/07 11:38:02 WARN [Executor task launch worker for task 0.0 in stage 7.0 (TID 1024)] NativeMemoryManager: before release mem manager, 920.0 MiB