apache / drill

Apache Drill is a distributed MPP query layer for self describing data
https://drill.apache.org/
Apache License 2.0

DRILL-8483: SpilledRecordBatch memory leak when the program threw an … #2888

Closed shfshihuafeng closed 3 months ago

shfshihuafeng commented 4 months ago

…exception during the process of building a hash table (#2887)

DRILL-8483: SpilledRecordBatch memory leak when the program threw an exception during the process of building a hash table

Description

If an exception is thrown while data is being read from disk to build hash tables in memory, the SpilledRecordBatch memory is leaked.

[image: no SQL is running]

[image: memory]

leak info


Allocator(frag:5:1) 5000000/1000000/31035392/40041943040 (res/actual/peak/limit)
    Allocator(op:5:1:1:HashJoinPOP) 1000000/98304/22822912/41943040 (res/actual/peak/limit)
Allocator(frag:5:1) 5000000/1000000/31035392/40041943040 (res/actual/peak/limit)
    Allocator(op:5:1:1:HashJoinPOP) 1000000/69632/22822912/41943040 (res/actual/peak/limit)
Allocator(frag:5:1) 5000000/1000000/30642176/40041943040 (res/actual/peak/limit)
    Allocator(op:5:1:1:HashJoinPOP) 1000000/36864/22822912/41943040 (res/actual/peak/limit)
Allocator(frag:5:1) 5000000/1000000/31035392/40041943040 (res/actual/peak/limit)
    Allocator(op:5:1:1:HashJoinPOP) 1000000/8192/22822912/41943040 (res/actual/peak/limit)
Allocator(frag:5:1) 5000000/1000000/31035392/40041943040 (res/actual/peak/limit)
    Allocator(op:5:1:1:HashJoinPOP) 1000000/98304/22822912/41943040 (res/actual/peak/limit)
Allocator(frag:5:1) 5000000/1000000/31035392/40041943040 (res/actual/peak/limit)
    Allocator(op:5:1:1:HashJoinPOP) 1000000/36864/22822912/41943040 (res/actual/peak/limit)
Allocator(frag:5:1) 5000000/1000000/31035392/40041943040 (res/actual/peak/limit)
    Allocator(op:5:1:1:HashJoinPOP) 1000000/98304/22822912/41943040 (res/actual/peak/limit)
Allocator(frag:5:0) 5000000/1000000/31067136/40041943040 (res/actual/peak/limit)
    Allocator(op:5:0:1:HashJoinPOP) 1000000/98304/22822912/41943040 (res/actual/peak/limit)
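
To put a number on a dump like the one above, the `actual` field of each operator-level allocator line can be summed. This is only a sketch: `sum_op_actual` is a hypothetical helper name, and it assumes the dump keeps the `name res/actual/peak/limit (...)` layout shown here.

```shell
# Hypothetical helper: read an allocator dump on stdin and sum the
# "actual" bytes of every operator-level allocator (lines with "Allocator(op:").
sum_op_actual() {
  awk '
    /Allocator\(op:/ {
      # Field 2 holds res/actual/peak/limit; "actual" is the second part.
      split($2, parts, "/")
      total += parts[2]
      count++
    }
    END { printf "%d allocators, %d bytes\n", count, total }
  '
}
```

For example, `sum_op_actual < leak_info.txt`, where `leak_info.txt` is an assumed file holding the pasted dump.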

Documentation

(Please describe user-visible changes similar to what should appear in the Drill documentation.)

Testing

Prepare the TPC-H data (1s).

  1. Run TPC-H sql8 with 30 concurrent sessions.
  2. Set direct memory to 5g.
  3. When an OutOfMemoryException occurs, stop all SQL.
  4. Check for the memory leak.
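
Step 3 could be automated roughly as follows. This sketch assumes that the `$fileName` checked by the test script is a stop-flag file and that the workers log to `sql8.log`. `stop_on_oom` is a hypothetical helper, not part of Drill.

```shell
# Hypothetical helper: create the stop-flag file once the log mentions
# an OutOfMemoryException. Returns 0 if the flag was created, 1 otherwise.
stop_on_oom() {
  log_file=$1
  stop_file=$2
  if grep -q "OutOfMemoryException" "$log_file" 2>/dev/null; then
    touch "$stop_file"
    return 0
  fi
  return 1
}
```

It can be polled while the workers run, for example `until stop_on_oom sql8.log "$fileName"; do sleep 10; done`.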

test script

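# drill_home and fileName must be set before this script runs.
# fileName acts as a stop flag: each worker exits once that file exists.
random_sql(){
while true
do
  if [ -f "$fileName" ]; then
    echo "$fileName exists, exiting"
    exit 0
  else
    # Run TPC-H sql8 and append the output to the shared log.
    "$drill_home"/sqlline -u "jdbc:drill:zk=ip:2181/drillbits_shf" -f tpch_sql8.sql >> sql8.log 2>&1
  fi
done
}

main(){
# Start 30 concurrent workers in the background.
for i in `seq 1 30`
do
  random_sql &
done
}

main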