
DRILL-8490: Sender operator fake memory leak result to sql failed and memory statistics error when ChannelClosedException #2917

Closed shfshihuafeng closed 4 months ago

shfshihuafeng commented 5 months ago

Description

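When a ChannelClosedException occurs, ReconnectingConnection#CloseHandler releases the sendingAccountor reference counter before Netty releases the buffer, so the operator is closed before Netty has released its memory. The operator then reports a memory leak that is not real, which fails the SQL query and corrupts the memory statistics.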
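
The ordering problem described above can be illustrated with a minimal, single-threaded sketch. It does not use Drill or Netty code: `Allocator`, `SendingAccountor`, and `simulate` are hypothetical stand-ins invented for this illustration, and the 4096-byte batch size is arbitrary. It only shows why the operator sees outstanding memory when the accountor is released before the buffer, and why it does not when the order is reversed. The reversed order is one way to avoid the symptom, not necessarily the change this PR makes.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

public class FakeLeakSketch {

    /** Hypothetical stand-in for an operator's allocator: tracks outstanding bytes. */
    static final class Allocator {
        private final AtomicLong allocated = new AtomicLong();
        void allocate(long bytes) { allocated.addAndGet(bytes); }
        void release(long bytes) { allocated.addAndGet(-bytes); }
        /** Bytes still outstanding at close time; a non-zero value would be reported as a leak. */
        long close() { return allocated.get(); }
    }

    /** Hypothetical stand-in for the sending accountor: counts batches still in flight. */
    static final class SendingAccountor {
        private final AtomicInteger pending = new AtomicInteger();
        void increment() { pending.incrementAndGet(); }
        void decrement() { pending.decrementAndGet(); }
        boolean allSendsDone() { return pending.get() == 0; }
    }

    /**
     * Simulates one batch in flight when the channel closes.
     * Returns the number of bytes the operator sees as outstanding when it closes.
     */
    static long simulate(boolean accountorReleasedFirst) {
        Allocator allocator = new Allocator();
        SendingAccountor accountor = new SendingAccountor();
        long batchBytes = 4096; // arbitrary size for the illustration

        allocator.allocate(batchBytes);
        accountor.increment(); // batch handed to the transport

        long seenAtClose = -1;
        if (accountorReleasedFirst) {
            // Close handler releases the accountor before the buffer is freed.
            accountor.decrement();
            if (accountor.allSendsDone()) {
                seenAtClose = allocator.close(); // operator closes too early
            }
            allocator.release(batchBytes); // transport frees the buffer afterwards
        } else {
            // Buffer is freed first, then the accountor is released.
            allocator.release(batchBytes);
            accountor.decrement();
            if (accountor.allSendsDone()) {
                seenAtClose = allocator.close();
            }
        }
        return seenAtClose;
    }

    public static void main(String[] args) {
        System.out.println("accountor first: " + simulate(true) + " bytes outstanding at close");
        System.out.println("buffer first:    " + simulate(false) + " bytes outstanding at close");
    }
}
```

In the first ordering the memory is in fact released a moment later, which is why the leak is "fake": only the check at close time observes it.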

Documentation

later

Testing

  1. TPCH test conditions
    (1) Script
    I ran sql8 (SQL details under Additional context) with a concurrency of 20, using the following TPCH test script:

fileName=/data/drill/tpch_sql/1s/shf.txt

random_sql(){
for i in $(seq 1 30)
while true
do
  num=$((RANDOM%22+1))
  if [ -f $fileName ]; then
    echo "$fileName" " is exit"
    exit 0
  else
    $drill_home/sqlline -u \"jdbc:drill:zk=ip:2181/drill/drillbits1_performance_test_shf\" -f tpch_sql8.sql >>/tpch1s_sql${num}.log 2>&1
  fi
done
}


(2) Parameters

{DRILL_MAX_DIRECT_MEMORY:-"5G"}
Enable allocator debugging to check for memory leaks: set drill.memory.debug.allocator=true
(3) sql8

`select   o_year,   sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as mkt_share   from (  select   extract(year from o_orderdate) as o_year,   l_extendedprice * 1.0 as volume,   n2.n_name as nation   from hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, hive.tpch1s.nation n2, hive.tpch1s.region  where   p_partkey = l_partkey   and s_suppkey = l_suppkey   and l_orderkey = o_orderkey   and o_custkey = c_custkey   and c_nationkey = n1.n_nationkey   and n1.n_regionkey = r_regionkey   and r_name = 'ASIA'   and s_nationkey = n2.n_nationkey   and o_orderdate between date '1995-01-01'   and date '1996-12-31'   and p_type = 'LARGE BRUSHED BRASS') as all_nations   group by o_year   order by o_year`

2. Test steps
(1) Run the script.
(2) When the log contains exception information, stop the script.