apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0

DRILL-8478. HashPartition memory leak when memory allocation fails with OutOfMemoryException (#2874) #2875

Closed shfshihuafeng closed 10 months ago

shfshihuafeng commented 10 months ago

DRILL-8478: HashPartition memory leak when OutOfMemoryException is encountered

Description

https://issues.apache.org/jira/browse/DRILL-8478 When allocating memory for a HashPartition fails with an OutOfMemoryException, memory is leaked. Because the HashPartition object is never successfully created, it cannot be cleaned up in the closing phase.
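
The failure mode described above can be illustrated with a minimal, self-contained sketch. The classes `ToyAllocator` and `ToyPartition` are hypothetical stand-ins, not Drill's real `BufferAllocator` or `HashPartition`, and the cleanup shown is only one common way to avoid this kind of leak, not necessarily the exact change in this PR. The idea is that a constructor which allocates several buffers should release the ones it already obtained before rethrowing, because the caller never receives a reference it could close.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not Drill classes.
class LeakSketch {

  /** Toy allocator that tracks outstanding bytes and enforces a limit. */
  static class ToyAllocator {
    private final long limit;
    private long allocated;

    ToyAllocator(long limit) {
      this.limit = limit;
    }

    long allocate(long size) {
      if (allocated + size > limit) {
        throw new IllegalStateException("Unable to allocate buffer of size " + size);
      }
      allocated += size;
      return size;
    }

    void release(long size) {
      allocated -= size;
    }

    long getAllocated() {
      return allocated;
    }
  }

  /** Toy partition that allocates several buffers in its constructor. */
  static class ToyPartition implements AutoCloseable {
    private final ToyAllocator allocator;
    private final List<Long> buffers = new ArrayList<>();

    ToyPartition(ToyAllocator allocator, int bufferCount, long bufferSize,
                 boolean cleanUpOnFailure) {
      this.allocator = allocator;
      try {
        for (int i = 0; i < bufferCount; i++) {
          buffers.add(allocator.allocate(bufferSize));
        }
      } catch (RuntimeException e) {
        // The caller never sees this object, so release what was
        // already allocated before propagating the failure.
        if (cleanUpOnFailure) {
          close();
        }
        throw e;
      }
    }

    @Override
    public void close() {
      for (long size : buffers) {
        allocator.release(size);
      }
      buffers.clear();
    }
  }

  /** Returns the bytes still outstanding after a constructor failure. */
  static long outstandingAfterFailedConstruction(boolean cleanUpOnFailure) {
    ToyAllocator allocator = new ToyAllocator(10_000);
    try {
      new ToyPartition(allocator, 4, 4096, cleanUpOnFailure);
    } catch (IllegalStateException expected) {
      // Construction failed on the third buffer; no reference was returned.
    }
    return allocator.getAllocated();
  }

  public static void main(String[] args) {
    System.out.println("without cleanup: " + outstandingAfterFailedConstruction(false));
    System.out.println("with cleanup:    " + outstandingAfterFailedConstruction(true));
  }
}
```

In this sketch, the run without cleanup leaves the first two buffers outstanding, while the run with cleanup returns the allocator to zero.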

Documentation

(Please describe user-visible changes similar to what should appear in the Drill documentation.)

Testing


1. TPC-H test conditions

(1) Script: I ran sql8 (the full SQL is given in (3) below) using 20 concurrent instances of this TPC-H test script.

fileName=/data/drill/tpch_sql/1s/shf.txt

random_sql(){
  # Keep running sql8 until the stop file named by $fileName appears.
  #for i in `seq 1 3`
  while true
  do
    num=$((RANDOM%22+1))
    if [ -f "$fileName" ]; then
      echo "$fileName exists"
      exit 0
    else
      $jupiter_home/sqlline -u \"jdbc:drill:zk=ip:2181/drill/jupiterbits1_performance_test_shf\" -f tpch_sql8.sql >> log_wubq/tpch1s_sql${num}.log 2>&1
    fi
  done
}

(2) parameter

(3) sql8

select
  o_year,
  sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as mkt_share
from (
  select
    extract(year from o_orderdate) as o_year,
    l_extendedprice * 1.0 as volume,
    n2.n_name as nation
  from
    hive.tpch1s.part,
    hive.tpch1s.supplier,
    hive.tpch1s.lineitem,
    hive.tpch1s.orders,
    hive.tpch1s.customer,
    hive.tpch1s.nation n1,
    hive.tpch1s.nation n2,
    hive.tpch1s.region
  where
    p_partkey = l_partkey
    and s_suppkey = l_suppkey
    and l_orderkey = o_orderkey
    and o_custkey = c_custkey
    and c_nationkey = n1.n_nationkey
    and n1.n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and s_nationkey = n2.n_nationkey
    and o_orderdate between date '1995-01-01' and date '1996-12-31'
    and p_type = 'LARGE BRUSHED BRASS'
) as all_nations
group by o_year
order by o_year

2. Test steps

(1) Run the script.

(2) When the log contains the following exception, stop the script:

Caused by: org.apache.drill.exec.exception.OutOfMemoryException: (op:5:1:1:HashJoinPOP) Unable to allocate buffer of size 4096 due to memory limit (41943040). Current allocation: 6166528

(3) No SQL is running, but the allocated memory is not 0.


(4) Leak information from the log:

2024-01-16 17:28:18,760 [1a59b3e9-6936-1560-e8e5-3ad6a66925b1:frag:5:1] WARN  o.a.drill.exec.memory.BaseAllocator - Closed child allocator[op:5:1:1:HashJoinPOP] on parent allocator[frag:5:1]'s child list.
Allocator(frag:5:1) 5000000/1000000/31035392/40041943040 (res/actual/peak/limit)
  child allocators: 1
    Allocator(op:5:1:1:HashJoinPOP) 1000000/98304/22822912/41943040 (res/actual/peak/limit)
      child allocators: 0
      ledgers: 11
        ledger[193578] allocator: op:5:1:1:HashJoinPOP), isOwning: true, size: 8192, references: 2, life: 4048793488977884..0, allocatorManager: [168316, life: 4048793488974969..0] holds 4 buffers.
            DrillBuf[198176], udle: [168315 959..8192]
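
To read the dump above, it can help to pull the four counters out of an `Allocator(...)` line. The following is a rough sketch under assumptions: the class `AllocatorLineSketch` and its regular expression are hypothetical helpers written for this log excerpt, not part of Drill, and the field order is taken from the `(res/actual/peak/limit)` label printed in the log itself. A non-zero `actual` value on an allocator that is being closed suggests memory that was never released.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for reading the log excerpt above; not a Drill class.
class AllocatorLineSketch {

  // Matches e.g. "Allocator(name) 1/2/3/4 (res/actual/peak/limit)".
  private static final Pattern LINE = Pattern.compile(
      "Allocator\\(([^)]+)\\)\\s+(\\d+)/(\\d+)/(\\d+)/(\\d+)\\s+\\(res/actual/peak/limit\\)");

  /** Returns the "actual" counter from an allocator line, or -1 if the line does not match. */
  static long actualBytes(String line) {
    Matcher m = LINE.matcher(line);
    return m.find() ? Long.parseLong(m.group(3)) : -1L;
  }

  public static void main(String[] args) {
    String line =
        "    Allocator(op:5:1:1:HashJoinPOP) 1000000/98304/22822912/41943040 (res/actual/peak/limit)";
    System.out.println("actual bytes still held: " + actualBytes(line));
  }
}
```

For the `HashJoinPOP` line in this log, the second counter (`98304`) is the value this sketch extracts.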

3. After the fix

I ran the same test with the fixed code. With no SQL running, direct memory is 0, and I can no longer find a leak log like the one in (4).

jnturton commented 10 months ago

An unused import crept in, could you remove it please?

shfshihuafeng commented 10 months ago

An unused import crept in, could you remove it please?

Removed it.