Tencent / TBase

TBase is an enterprise-level distributed HTAP database. Through a single database cluster to provide users with highly consistent distributed database services and high-performance data warehouse services, a set of integrated enterprise-level solutions is formed.
Other
1.38k stars 262 forks source link

Query crashing the server, -Nan..NaN cost. Works on PG-XL. #37

Open yazun opened 4 years ago

yazun commented 4 years ago

The query works well on XL, but on the TBase it crashes the server:

run is replicated, rundataspec is distributed. Statistics are up to date.

 select
r.runid,
r.runname,
r.runmetadata->>'runRequesterUserName'::text username,
r.size,
r.state::text,
sum(stat.processed)::bigint processed,
sum(stat.total)::bigint total,
r.creationdate,
--max(stat.lastupdate)::timestamp with time zone lastupdate,
r.lastupdatedate
from run r
right join (select status,  count(*) total,
count(*) filter ( where status = 4 ) processed, rd.runid
from rundataspec rd where  rd.runid in (select rr.runid
from run rr  where rr.creationdate >= now() - interval '1 days'
and ( rr.lastupdatedate is null or rr.lastupdatedate >= now() - interval '1 days'))
group by status,rd.runid) stat on (stat.runid = r.runid)
group by r.runid,r.size,r.state::text,r.creationdate
order by r.creationdate desc;

and the plan

Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=-nan..-nan rows=726 width=136)
   ->  Sort  (cost=-nan..-nan rows=726 width=136)
         Sort Key: r.creationdate DESC
         ->  Finalize GroupAggregate  (cost=-nan..-nan rows=726 width=136)
               Group Key: r.runid, ((r.state)::text)
               ->  Sort  (cost=-nan..-nan rows=726 width=0)
                     Sort Key: r.runid, ((r.state)::text)
                     ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=22768.39..-nan rows=726 width=0)
                           Distribute results by S: runid
                           ->  Partial GroupAggregate  (cost=22668.39..-nan rows=726 width=325)
                                 Group Key: r.runid, ((r.state)::text)
                                 ->  Sort  (cost=22668.39..22670.67 rows=1525 width=277)
                                       Sort Key: r.runid, ((r.state)::text)
                                       ->  Hash Left Join  (cost=22388.95..22620.01 rows=1525 width=277)
                                             Hash Cond: (rd.runid = r.runid)
                                             ->  Finalize HashAggregate  (cost=22195.26..22271.51 rows=1525 width=22)
                                                   Group Key: rd.status, rd.runid
                                                   ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=22109.86..22190.69 rows=1525 width=0)
                                                         Distribute results by S: runid
                                                         ->  Partial HashAggregate  (cost=22009.86..22086.11 rows=1525 width=22)
                                                               Group Key: rd.status, rd.runid
                                                               ->  Nested Loop  (cost=0.07..21977.67 rows=10733 width=6)
                                                                     ->  Seq Scan on run rr  (cost=0.00..162.83 rows=1 width=4)
                                                                           Filter: ((creationdate >= (now() - '1 day'::interval)) AND ((lastupdatedate IS NULL) OR (lastupdatedate >= (now() - '1 day'::interval))))
                                                                     ->  Index Scan using rundataspec_pkey on rundataspec rd  (cost=0.07..20537.43 rows=25548 width=6)
                                                                           Index Cond: (runid = rr.runid)
                                             ->  Hash  (cost=156.30..156.30 rows=726 width=237)
                                                   ->  Seq Scan on run r  (cost=0.00..156.30 rows=726 width=237)

which finishes with grave error:

WARNING:  pgxc_abort_connections dn node:datanode1 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode5 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode1 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode5 invalid socket 4294967295!
ERROR:  Failed to receive more data from data node 16398
Time: 159019.614 ms (02:39.020)

on the other hand on XL it returns after few ms, the plan is much simpler:

Sort  (cost=23507.41..23507.85 rows=289 width=135)
   Sort Key: r.creationdate DESC
   ->  GroupAggregate  (cost=23481.54..23500.32 rows=289 width=135)
         Group Key: r.runid, ((r.state)::text)
         ->  Sort  (cost=23481.54..23481.97 rows=289 width=135)
               Sort Key: r.runid, ((r.state)::text)
               ->  Nested Loop Left Join  (cost=23036.82..23474.45 rows=289 width=135)
                     ->  GroupAggregate  (cost=23036.73..23054.64 rows=289 width=22)
                           Group Key: rd.status, rd.runid
                           ->  Remote Subquery Scan on all (datanode1,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8)  (cost=23036.73..23038.03 rows=289 width=6)
                                 ->  Sort  (cost=22936.73..22937.16 rows=289 width=6)
                                       Sort Key: rd.status, rd.runid
                                       ->  Nested Loop  (cost=0.11..22929.64 rows=289 width=6)
                                             ->  Seq Scan on run rr  (cost=0.00..4423.48 rows=1 width=4)
                                                   Filter: ((creationdate >= (now() - '1 day'::interval)) AND ((lastupdatedate IS NULL) OR (lastupdatedate >= (now() - '1 day'::interval))))
                                             ->  Index Scan using rundataspec_pkey on rundataspec rd  (cost=0.11..17501.46 rows=20094 width=6)
                                                   Index Cond: (runid = rr.runid)
                     ->  Materialize  (cost=100.10..101.45 rows=1 width=95)
                           ->  Remote Subquery Scan on all (datanode2)  (cost=100.10..101.45 rows=1 width=95)
                                 ->  Index Scan using run_pkey on run r  (cost=0.10..1.35 rows=1 width=95)
                                       Index Cond: (rd.runid = runid)
yazun commented 4 years ago
set prefer_olap = false; 

brings back a proper plan and execution within ms. Still, this looks like a serious bug, recurrent distributed query where tables are small and driving one is replicated.

                                                                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=23052.40..23053.49 rows=726 width=136)
   Sort Key: r.creationdate DESC
   ->  GroupAggregate  (cost=22978.52..23031.70 rows=726 width=136)
         Group Key: r.runid, ((r.state)::text)
         ->  Sort  (cost=22978.52..22980.81 rows=1525 width=277)
               Sort Key: r.runid, ((r.state)::text)
               ->  Hash Left Join  (cost=22602.49..22930.14 rows=1525 width=277)
                     Hash Cond: (rd.runid = r.runid)
                     ->  GroupAggregate  (cost=22408.80..22581.65 rows=1525 width=22)
                           Group Key: rd.status, rd.runid
                           ->  Sort  (cost=22408.80..22424.90 rows=10733 width=6)
                                 Sort Key: rd.status, rd.runid
                                 ->  Nested Loop  (cost=0.07..21977.67 rows=10733 width=6)
                                       ->  Remote Subquery Scan on all (datanode1)  (cost=100.00..262.84 rows=1 width=4)
                                             ->  Seq Scan on run rr  (cost=0.00..162.83 rows=1 width=4)
                                                   Filter: ((creationdate >= (now() - '1 day'::interval)) AND ((lastupdatedate IS NULL) OR (lastupdatedate >= (now() - '1 day'::interval))))
                                       ->  Materialize  (cost=100.08..20752.41 rows=25548 width=6)
                                             ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=100.08..20714.09 rows=25548 width=6)
                                                   ->  Index Scan using rundataspec_pkey on rundataspec rd  (cost=0.07..20537.43 rows=25548 width=6)
                                                         Index Cond: (runid = rr.runid)
                     ->  Hash  (cost=258.50..258.50 rows=726 width=237)
                           ->  Remote Subquery Scan on all (datanode1)  (cost=100.00..258.50 rows=726 width=237)
                                 ->  Seq Scan on run r  (cost=0.00..156.30 rows=726 width=237)
(23 rows)
SuCuicui commented 4 years ago

Is there a core dump file? If possible, please provide the call stack to locate the problem quickly. Thank you!