Tencent / TBase

TBase is an enterprise-level distributed HTAP database. It provides highly consistent distributed database services and high-performance data warehouse services from a single database cluster, forming an integrated enterprise-level solution.

Parallel self-join stalls forever #108

Open yazun opened 2 years ago

yazun commented 2 years ago

We saw no errors this time, but a query that takes about a second with parallelism disabled stalls forever when parallel workers are used.

explain select distinct
     x.varitype i,
     y.varitype j,
     count(distinct x.sourceid)::numeric cnt
 from
     dr3_ops_cs36_mv.dr3_common_export x
     join dr3_ops_cs36_mv.dr3_common_export y using (sourceid)
 where
     x.varitype <> y.varitype
 group by     x.varitype,    y.varitype
 order by 1,2
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=227720.65..227720.66 rows=1 width=52)
   ->  Unique  (cost=227720.65..227720.66 rows=1 width=52)
         ->  Sort  (cost=227720.65..227720.66 rows=1 width=52)
               Sort Key: x.varitype, y.varitype, ((count(DISTINCT x.sourceid))::numeric)
               ->  Parallel GroupAggregate  (cost=222663.63..227720.65 rows=1 width=52)
                     Group Key: x.varitype, y.varitype
                     ->  Parallel Sort  (cost=222663.63..223927.88 rows=842827 width=28)
                           Sort Key: x.varitype, y.varitype
                           ->  Parallel Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=1100.11..153092.84 rows=842827 width=28)
                                 Distribute results by S: varitype
                                 ->  Gather  (cost=1000.11..126865.20 rows=842827 width=28)
                                       Workers Planned: 6
                                       ->  Parallel Nested Loop  (cost=0.11..41582.50 rows=140471 width=28)
                                             ->  Parallel Seq Scan on dr3_common_export x  (cost=0.00..20614.94 rows=88635 width=18)
                                             ->  Index Scan using idx_id_dr3_common_export on dr3_common_export y  (cost=0.11..0.19 rows=1 width=18)
                                                   Index Cond: (sourceid = x.sourceid)
                                                   Filter: (x.varitype <> varitype)
(17 rows)

This plan stalls for hours:

Time: 62.629 ms
(dr3_ops_cs36@gaiadb10i:55431) [surveys] > explain analyze select distinct
     x.varitype i,
     y.varitype j,
     count(distinct x.sourceid)::numeric cnt
 from
     dr3_ops_cs36_mv.dr3_common_export x
     join dr3_ops_cs36_mv.dr3_common_export y using (sourceid)
 where
     x.varitype <> y.varitype
 group by     x.varitype,    y.varitype
 order by 1,2
;
^CCancel request sent
ERROR:  canceling statement due to user request
Time: 21936.133 ms (00:21.936)

Disabling parallelism:

(dr3_ops_cs36@gaiadb10i:55431) [surveys] > set max_parallel_workers_per_gather to 0;
SET
Time: 0.165 ms
(dr3_ops_cs36@gaiadb10i:55431) [surveys] > explain analyze select distinct
     x.varitype i,
     y.varitype j,
     count(distinct x.sourceid)::numeric cnt
 from
     dr3_ops_cs36_mv.dr3_common_export x
     join dr3_ops_cs36_mv.dr3_common_export y using (sourceid)
 where
     x.varitype <> y.varitype
 group by     x.varitype,    y.varitype
 order by 1,2
;
                                                                                                                       QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=239654.40..239654.40 rows=1 width=52) (actual time=1318.199..1318.265 rows=76 loops=1)
   ->  Unique  (cost=239654.40..239654.40 rows=1 width=52)
         DN (actual startup time=872.718..1316.922 total time=872.718..1316.933 rows=0..25 loops=1..1)
         ->  Sort  (cost=239654.40..239654.40 rows=1 width=52)
               DN (actual startup time=872.717..1316.921 total time=872.717..1316.922 rows=0..25 loops=1..1)
               Sort Key: x.varitype, y.varitype, ((count(DISTINCT x.sourceid))::numeric)
               Sort Method: quicksort  Memory: 26kB
               ->  Parallel GroupAggregate  (cost=234597.38..239654.39 rows=1 width=52)
                     DN (actual startup time=872.408..1000.377 total time=872.408..1316.810 rows=0..25 loops=1..1)
                     Group Key: x.varitype, y.varitype
                     ->  Parallel Sort  (cost=234597.38..235861.62 rows=842827 width=28)
                           DN (actual startup time=872.403..1000.262 total time=872.403..1108.219 rows=0..465827 loops=1..1)
                           Sort Key: x.varitype, y.varitype
                           Sort Method: quicksort  Disk: 22960kB
                           ->  Parallel Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=73067.41..165026.58 rows=842827 width=28)
                                 DN (actual startup time=404.668..873.097 total time=556.962..873.097 rows=0..465827 loops=1..1)
                                 Distribute results by S: varitype
                                 ->  Hash Join  (cost=72967.41..138798.94 rows=842827 width=28)
                                       DN (actual startup time=198.194..235.046 total time=546.384..652.476 rows=63234..64602 loops=1..1)
                                       Hash Cond: (x.sourceid = y.sourceid)
                                       Join Filter: (x.varitype <> y.varitype)
                                       ->  Seq Scan on dr3_common_export x  (cost=0.00..42773.65 rows=531809 width=18)
                                             DN (actual startup time=6.189..15.291 total time=84.075..98.950 rows=530581..533699 loops=1..1)
                                       ->  Hash  (cost=42773.65..42773.65 rows=531809 width=18)
                                             DN (actual startup time=170.855..203.152 total time=170.855..203.152 rows=530581..533699 loops=1..1)
                                             Buckets: 32768  Batches: 32  Memory Usage: 1126kB
                                             ->  Seq Scan on dr3_common_export y  (cost=0.00..42773.65 rows=531809 width=18)
                                                   DN (actual startup time=4.139..14.630 total time=82.061..96.865 rows=530581..533699 loops=1..1)
 Planning time: 57.849 ms
 Execution time: 1336.961 ms
(30 rows)

JennyJennyChen commented 2 years ago

Thank you for your feedback. We have also found problems with the current parallel execution mechanism, such as hangs or core dumps while SQL is executing. Because many modules are involved, we will analyze this in detail as a follow-up, and we expect improvements in the next version.

yazun commented 2 years ago

Thanks for the update! Roughly when do you plan to publish the next version? Can we assume https://github.com/Tencent/TBase/issues/106 falls into the same bucket?

JennyJennyChen commented 2 years ago

> Thanks for the update! Roughly when do you plan to publish the next version? Can we assume #106 falls into the same bucket?

The next version is expected to be released in Q1 of 2022. Yes, #106 will also be considered and resolved.

beth-database commented 2 years ago

Hi, I tried to solve the problem, but I can't reproduce it. Can you give the steps to reproduce it? Maybe the problem is related to the definition of the tables used in the query, to a special set of GUC parameters, or to the amount of data? Do you still hit the problem with less data?

yazun commented 2 years ago

Thanks for checking! Should we update to the latest version you published a few days ago and recheck? This is quite likely related to this specific case of Parallel NL vs. Hash Join.

yazun commented 2 years ago

I.e., I just reran it; this is the bt from the coordinator when interrupted from within gdb:

Program stopped.
0x00007f761cf36e93 in __epoll_wait_nocancel () from /lib64/libc.so.6
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x00007f761cf2bc20 in __poll_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f761cf2bc20 in __poll_nocancel () from /lib64/libc.so.6
#1  0x0000000000523d10 in pgxc_node_receive.constprop.0 (connections=0x7ffd98cbc238, timeout=<optimized out>, conn_count=1) at pgxcnode.c:844
#2  0x0000000000977187 in FetchTuple (combiner=0x1f55558) at execRemote.c:2347
#3  0x000000000058e652 in getlen_datanode (state=<optimized out>, tapenum=<optimized out>, eofOK=<optimized out>) at tuplesort.c:3988
#4  0x000000000058690f in mergereadnext (stup=0x7ffd98cbc2c0, srcTape=0, state=0x1db55d8) at tuplesort.c:3063
#5  beginmerge (state=state@entry=0x1db55d8) at tuplesort.c:3040
#6  0x000000000059177f in tuplesort_begin_merge (tupDesc=<optimized out>, nkeys=<optimized out>, attNums=0x2162a18, sortOperators=0x2162be0, sortCollations=0x2162bf8, nullsFirstFlags=0x2162c10 "", combiner=0x1f55558, workMem=2048) at tuplesort.c:2687
#7  0x000000000096533f in ExecRemoteSubplan (pstate=<optimized out>) at execRemote.c:11117
#8  0x00000000009f080b in ExecProcNodeInstr (node=0x1f55558) at execProcnode.c:553
#9  0x00000000009fdcc6 in ExecProcNode (node=0x1f55558) at ../../../src/include/executor/executor.h:275
#10 ExecutePlan (execute_once=<optimized out>, dest=0xf21460 <donothingDR.lto_priv.0>, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1f55558, estate=<optimized out>) at execMain.c:2061
#11 standard_ExecutorRun (queryDesc=<optimized out>, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:471
#12 0x0000000000a7ebb5 in ExecutorRun (execute_once=1 '\001', count=0, direction=<optimized out>, queryDesc=0x1f539d8) at execMain.c:414
#13 ExplainOnePlan (plannedstmt=plannedstmt@entry=0x1f51eb8, into=into@entry=0x0, es=es@entry=0x1d68320,
    queryString=queryString@entry=0x1bce578 "explain analyze select distinct\n     x.varitype i,\n     y.varitype j,\n     count(distinct x.sourceid)::numeric cnt\n from\n     dr3_ops_cs36_mv.dr3_common_export x\n     join dr3_ops_cs36_mv.dr3_common_export y using (sourceid)\n where\n     x.varitype <> y.varitype\n group by     x.varitype,    y.varitype\n order by 1,2;", params=params@entry=0x0, queryEnv=<optimized out>, planduration=0x7ffd98cbc700) at explain.c:581
#14 0x0000000000a7ee93 in ExplainOneQuery (query=<optimized out>, cursorOptions=256, into=0x0, es=0x1d68320,
    queryString=0x1bce578 "explain analyze select distinct\n     x.varitype i,\n     y.varitype j,\n     count(distinct x.sourceid)::numeric cnt\n from\n     dr3_ops_cs36_mv.dr3_common_export x\n     join dr3_ops_cs36_mv.dr3_common_export y using (sourceid)\n where\n     x.varitype <> y.varitype\n group by     x.varitype,    y.varitype\n order by 1,2;",
    params=0x0, queryEnv=0x0) at explain.c:415
#15 0x0000000000a8248d in ExplainQuery (pstate=0x1d68210, stmt=0x1bd0518, queryString=<optimized out>, params=0x0, queryEnv=0x0, dest=0x1d68180) at explain.c:281
#16 0x000000000075c327 in standard_ProcessUtility (pstmt=0x1f4b438,
    queryString=0x1bce578 "explain analyze select distinct\n     x.varitype i,\n     y.varitype j,\n     count(distinct x.sourceid)::numeric cnt\n from\n     dr3_ops_cs36_mv.dr3_common_export x\n     join dr3_ops_cs36_mv.dr3_common_export y using (sourceid)\n where\n     x.varitype <> y.varitype\n group by     x.varitype,    y.varitype\n order by 1,2;",
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x1d68180, sentToRemote=0 '\000', completionTag=0x7ffd98cbd920 "") at utility.c:2182
#17 0x000000000075e74c in ProcessUtility (completionTag=0x7ffd98cbd920 "", sentToRemote=0 '\000', dest=0x1d68180, queryEnv=<optimized out>, params=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, queryString=<optimized out>, pstmt=0x1f4b438) at xact.c:7701
#18 PortalRunUtility (portal=0x1d01888, pstmt=0x1f4b438, isTopLevel=<optimized out>, setHoldSnapshot=<optimized out>, dest=0x1d68180, completionTag=0x7ffd98cbd920 "") at pquery.c:1993
#19 0x0000000000761e19 in FillPortalStore (portal=0x1d01888, isTopLevel=<optimized out>) at ../../../src/include/nodes/pg_list.h:79
#20 0x00000000007631f1 in PortalRun (portal=0x1d01888, count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x1c6c6a0, altdest=0x1c6c6a0, completionTag=0x7ffd98cbdb60 "") at pquery.c:1351
#21 0x000000000076bf03 in exec_simple_query (query_string=<optimized out>) at postgres.c:1511
#22 0x0000000000765cc5 in PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:5456
#23 0x0000000000829f54 in BackendRun (port=0x1b13c30) at postmaster.c:4982
#24 BackendStartup (port=0x1b13c30) at postmaster.c:4654
#25 ServerLoop () at postmaster.c:1959
#26 0x000000000082b999 in PostmasterMain (argc=<optimized out>, argv=0x1ae7880) at postmaster.c:1567
#27 0x00000000004f4a1d in main (argc=5, argv=0x1ae7880) at main.c:233

beth-database commented 2 years ago

@yazun Thanks for your reply! If you cannot reproduce the problem in the new version but can reproduce it easily in the current version, that suggests the problem has been solved in the new version. However, if the new version has the problem too, you can follow these steps to gather more information:

1. Execute 'select * from pg_backend_pid();' to get the PID of the coordinator backend that will execute the hanging SQL.
2. Execute the hanging SQL.
3. Attach gdb to the PID from step 1 and get the bt. If the bt looks like the one you pasted, run 'p connections' to check the datanode name and its PID.
4. Attach gdb to the datanode PID and get the datanode bt. That may tell us what the datanode is doing when it hangs.

I tried to construct test cases, and the execution plan is consistent with the one you pasted. I also used hash join, but the problem does not reproduce. If it's convenient, please tell me the reproduction steps in detail so that we can solve this problem as soon as possible. Looking forward to your reply.
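The hop-by-hop diagnosis in the steps above can be sketched as a pair of shell helpers. This is only a rough sketch: the function names `dump_backtrace` and `next_hop` are made up for illustration, and `next_hop` assumes that gdb prints `nodename` and `backend_pid` on the same line of the connection handle, as it does in the dumps pasted in this thread.

```shell
#!/bin/sh
# Hypothetical helpers; not part of TBase.

# Print a backtrace of one process without an interactive gdb session.
# Uses gdb's -p (attach to PID), -batch and -ex (run a command) options.
dump_backtrace() {
    gdb -p "$1" -batch -ex "bt"
}

# Read the text printed by gdb for a connection handle on stdin and
# print "<nodename> <backend_pid>", i.e. the next process to attach to.
# Assumption: both fields appear on the same output line.
next_hop() {
    sed -n 's/.*nodename = "\([^"]*\)".*backend_pid = \([0-9][0-9]*\).*/\1 \2/p'
}
```

For example, saving the gdb output of the connection handle to a file and running `next_hop < handle.txt` prints the datanode name and the PID to pass to `dump_backtrace` on that datanode's host.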

yazun commented 2 years ago

Thanks for the hints @beth-database! Will deploy the new version in coming days and will get back to you.

yazun commented 2 years ago

@beth-database, this is on the master of our fork, which has just been merged with TBase 2.2/master:

p **connections
$3 = {nodeoid = 16402, nodeid = 1797586929, nodename = "datanode5", '\000' <repeats 54 times>, nodehost = "gaiadb05i", '\000' <repeats 54 times>, nodeport = 55436, sock = 557, backend_pid = 1484, transaction_status = 84 'T', state = DN_CONNECTION_STATE_QUERY, read_only = 1 '\001', combiner = 0x3a6b238, sendGxidVersion = 1, error = '\000' <repeats 88 times>, " actually, defaults to database\n", ' ' <repeats 40 times>, "# encoding\n\n# These settings are initial"..., outBuffer = 0x39de8b8 "M",
  outSize = 32768, outEnd = 0, inBuffer = 0x30f2118 "S", inSize = 16384, inStart = 40, inEnd = 40, inCursor = 40, ck_resp_rollback = 0 '\000', in_extended_query = 1 '\001', needSync = 0 '\000', sock_fatal_occurred = 0 '\000', last_command = 97 'a', recv_datarows = 0, plpgsql_need_begin_sub_txn = 0 '\000', plpgsql_need_begin_txn = 0 '\000'}

attaching to DN:

[pgxzDR3@gaiadb05 ~]$ gdb attach -p 1484
...
Program received signal SIGINT, Interrupt.
0x00007f0a88d69c20 in __poll_nocancel () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f0a88d69c20 in __poll_nocancel () from /lib64/libc.so.6
#1  0x0000000000be2a50 in pgxc_node_receive.constprop.0 (connections=connections@entry=0x7ffc6dbc8b78, timeout=timeout@entry=0x7ffc6dbc8b80, conn_count=1) at pgxc/pool/pgxcnode.c:844
#2  0x00000000007a3a87 in FetchTuple () at pgxc/pool/execRemote.c:2347
#3  0x00000000007b1cc8 in ExecRemoteSubplan () at pgxc/pool/execRemote.c:11153
#4  0x00000000007233ab in ExecProcNodeInstr (node=0x1d8eac0) at executor/execProcnode.c:553
#5  0x000000000075ecdb in ExecProcNode (node=0x1d8eac0) at executor/../../../src/include/executor/executor.h:275
#6  ExecSort (pstate=0x1d8e650) at executor/nodeSort.c:154
#7  0x00000000007233ab in ExecProcNodeInstr (node=0x1d8e650) at executor/execProcnode.c:553
#8  0x000000000073225c in ExecProcNode (node=0x1d8e650) at executor/../../../src/include/executor/executor.h:275
#9  fetch_input_tuple (aggstate=0x1d8df90) at executor/nodeAgg.c:739
#10 0x000000000073d6a5 in agg_retrieve_direct (aggstate=<optimized out>) at executor/nodeAgg.c:3225
#11 ExecAgg (pstate=<optimized out>) at executor/nodeAgg.c:3036
#12 0x00000000007233ab in ExecProcNodeInstr (node=0x1d8df90) at executor/execProcnode.c:553
#13 0x000000000075ec1f in ExecProcNode (node=0x1d8df90) at executor/../../../src/include/executor/executor.h:275
#14 ExecSort (pstate=0x1d8db20) at executor/nodeSort.c:154
#15 0x00000000007233ab in ExecProcNodeInstr (node=0x1d8db20) at executor/execProcnode.c:553
#16 0x000000000075df7e in ExecProcNode (node=0x1d8db20) at executor/../../../src/include/executor/executor.h:275
#17 ExecUnique (pstate=0x1d8d778) at executor/nodeUnique.c:73
#18 0x00000000007233ab in ExecProcNodeInstr (node=0x1d8d778) at executor/execProcnode.c:553
#19 0x0000000000718538 in ExecProcNode (node=0x1d8d778) at executor/../../../src/include/executor/executor.h:275
#20 ExecutePlan (execute_once=<optimized out>, dest=0x1c6b938, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=<optimized out>, estate=<optimized out>) at executor/execMain.c:2061
#21 standard_ExecutorRun (queryDesc=0x1d8d108, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:471
#22 0x000000000099d2cc in ExecutorRun (execute_once=<optimized out>, count=0, direction=ForwardScanDirection, queryDesc=0x1d8d108) at executor/execMain.c:414
#23 PortalRunSelect () at tcop/pquery.c:1715
#24 0x00000000009a1531 in PortalRun (portal=portal@entry=0x1d76758, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x1c6b938, altdest=altdest@entry=0x1c6b938, completionTag=0x7ffc6dbc9600 "") at tcop/pquery.c:1356
#25 0x00000000009a3e3b in exec_execute_message (max_rows=9223372036854775807, portal_name=0x1c6b528 "p_4_7e7_6_60c2ecd9") at tcop/postgres.c:2995
#26 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at tcop/postgres.c:5571
#27 0x00000000008e3368 in BackendRun (port=0x1b99b00) at postmaster/postmaster.c:4982
#28 BackendStartup (port=0x1b99b00) at postmaster/postmaster.c:4654
#29 ServerLoop () at postmaster/postmaster.c:1959
#30 0x00000000008e425c in PostmasterMain () at postmaster/postmaster.c:1567
#31 0x00000000004f77ad in main (argc=5, argv=0x1b4c710) at main/main.c:233

Following the lead:

(gdb) up
#1  0x0000000000be2a50 in pgxc_node_receive.constprop.0 (connections=connections@entry=0x7ffc6dbc8b78, timeout=timeout@entry=0x7ffc6dbc8b80, conn_count=1) at pgxc/pool/pgxcnode.c:844
844     pgxc/pool/pgxcnode.c: No such file or directory.
(gdb) p **connections
$1 = {nodeoid = 16403, nodeid = 587455710, nodename = "datanode6", '\000' <repeats 54 times>, nodehost = "gaiadb06i", '\000' <repeats 54 times>, nodeport = 55436, sock = 21, backend_pid = 33780, transaction_status = 84 'T', state = DN_CONNECTION_STATE_QUERY, read_only = 1 '\001', combiner = 0x1d8eac0, sendGxidVersion = 1,
  error = "\000d12\000\000\000\000\020\000\000\000\000\000\000\000h\267\302\001\000\000\000\000\"pgxc:coord12\"\000\000\000\b\000\000\000\000\000\000h\267\302\001\000\000\000\000v", '\000' <repeats 223 times>..., outBuffer = 0x1cf5f58 "M", outSize = 16384, outEnd = 0, inBuffer = 0x1cf9f98 "S", inSize = 16384, inStart = 41, inEnd = 41, inCursor = 41, ck_resp_rollback = 0 '\000', in_extended_query = 1 '\001', needSync = 0 '\000', sock_fatal_occurred = 0 '\000', last_command = 97 'a', recv_datarows = 0,
  plpgsql_need_begin_sub_txn = 0 '\000', plpgsql_need_begin_txn = 0 '\000'}

attaching to dn6 process:

 [pgxzDR3@gaiadb06 ~]$ gdb attach -p 33780
 ...
 0x00007f345cc939a3 in select () from /lib64/libc.so.6
(gdb) bt
#0  0x00007f345cc939a3 in select () from /lib64/libc.so.6
#1  0x00000000007e76bc in pg_usleep (microsec=1000) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=1000) at ../port/pgsleep.c:47
#3  WaitForParallelWorkerDone () at pgxc/squeue/squeue.c:8883
#4  0x0000000000742482 in ExecGather (pstate=0x1e2ea88) at executor/nodeGather.c:273
#5  0x00000000007233ab in ExecProcNodeInstr (node=0x1e2ea88) at executor/execProcnode.c:553
#6  0x0000000000718538 in ExecProcNode (node=0x1e2ea88) at executor/../../../src/include/executor/executor.h:275
#7  ExecutePlan (execute_once=<optimized out>, dest=0x1e4f2b8, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=<optimized out>, estate=<optimized out>) at executor/execMain.c:2061
#8  standard_ExecutorRun (queryDesc=0x1e2c038, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:471
#9  0x000000000099e21a in AdvanceProducingPortal (portal=portal@entry=0x1e2c7c8, can_wait=can_wait@entry=0 '\000') at tcop/pquery.c:2668
#10 0x00000000009a19ef in PortalRun (portal=portal@entry=0x1e2c7c8, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x1d1d998, altdest=altdest@entry=0x1d1d998, completionTag=0x7ffe02273fa0 "") at tcop/pquery.c:1406
#11 0x00000000009a3e3b in exec_execute_message (max_rows=9223372036854775807, portal_name=0x1d1d588 "p_4_7e7_5_60c2eccf") at tcop/postgres.c:2995
#12 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at tcop/postgres.c:5571
#13 0x00000000008e3368 in BackendRun (port=0x1c4bb60) at postmaster/postmaster.c:4982
#14 BackendStartup (port=0x1c4bb60) at postmaster/postmaster.c:4654
#15 ServerLoop () at postmaster/postmaster.c:1959
#16 0x00000000008e425c in PostmasterMain () at postmaster/postmaster.c:1567
#17 0x00000000004f77ad in main (argc=5, argv=0x1bfe710) at main/main.c:233

hope this helps..

beth-database commented 2 years ago

Hello @yazun. Maybe some parallel processes for dn6 are still doing something, or exited abnormally. You can run 'ps ux|grep dn6_pid'; if parallel processes exist, they may be labelled 'parallel worker for PID ', and you can gstack those processes to see their stacks, which may show what happens when dn6 hangs. If there are no parallel workers, then an error may have caused a parallel worker to exit abnormally, and you can check dn6's log file for errors. In addition, please give me the definition of dr3_ops_cs36_mv.dr3_common_export and any GUC parameters you have set, so that I can try to reproduce the problem with your table definition and GUCs. Looking forward to your reply.
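The check suggested above can be narrowed so that it returns only the parallel workers, not the leader backend or the grep process itself. The sketch below makes two assumptions: the PID is the second column, as in `ps ux` output, and the worker process title ends with 'parallel worker for PID N'. The function name `parallel_workers_of` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; not part of TBase.

# Read `ps ux`-style lines on stdin and print the PIDs of the parallel
# workers whose leader is the PID given as $1.
# Anchoring the match at the end of the line keeps leader 1334 from
# matching a worker that belongs to leader 13347.
parallel_workers_of() {
    awk -v leader="$1" '
        $0 ~ ("parallel worker for PID " leader "[ \t]*$") { print $2 }
    '
}
```

A possible use is `ps ux | parallel_workers_of 13347 | xargs -n1 gstack`, which prints one stack per worker. If the function prints nothing, no workers are alive and the datanode log is the next place to look.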

yazun commented 2 years ago

Happily, the table definition is quite simple:

 \d+ dr3_ops_cs36_mv.dr3_common_export
                                    Table "dr3_ops_cs36_mv.dr3_common_export"
  Column  |            Type             | Collation | Nullable | Default | Storage  | Stats target | Description
----------+-----------------------------+-----------+----------+---------+----------+--------------+-------------
 sourceid | bigint                      |           | not null |         | plain    |              |
 varitype | text                        |           | not null |         | extended |              |
 created  | timestamp without time zone |           |          | now()   | plain    |              |
Indexes:
    "dr3_common_export_pkey" PRIMARY KEY, btree (varitype, sourceid), tablespace "output_tablespace"
    "idx_id_dr3_common_export" btree (sourceid), tablespace "final_run_validation_tablespace"
Tablespace: "final_run_validation_tablespace"
Distribute By: SHARD(sourceid)
Location Nodes: ALL DATANODES
Options: autovacuum_analyze_threshold=1000

I am traveling today; I will try gstack and take a look in 3-4 hours. Thanks for checking this!

beth-database commented 2 years ago

Hello @yazun, do you have any gstack information?

yazun commented 2 years ago

Hello @beth-database, sorry for the late response, I have been traveling:

ps ux|grep  13347
pgxzDR3  13347  9.0  0.0 62575336 8068 ?       Ssl  14:39   0:28 postgres: dr3_ops_cs36 surveys 192.168.168.149(10162) REMOTE SUBPLAN (coord12:39235) (D:datanode5:13937)
pgxzDR3  14005  2.7  0.0 62501788 7348 ?       Rs   14:40   0:06 postgres: bgworker: parallel worker for PID 13347
pgxzDR3  14006  2.7  0.0 62501788 7368 ?       Ss   14:40   0:06 postgres: bgworker: parallel worker for PID 13347
pgxzDR3  14007  2.8  0.0 62501788 7344 ?       Ss   14:40   0:06 postgres: bgworker: parallel worker for PID 13347
pgxzDR3  14008  2.7  0.0 62501788 7364 ?       Ss   14:40   0:06 postgres: bgworker: parallel worker for PID 13347
pgxzDR3  14009  2.7  0.0 62501788 7356 ?       Ss   14:40   0:06 postgres: bgworker: parallel worker for PID 13347
pgxzDR3  14010  2.7  0.0 62501788 7352 ?       Ss   14:40   0:06 postgres: bgworker: parallel worker for PID 13347

then gstacks

ps ux|grep  13347 | awk -- '{print $2}' | xargs -n1 gstack
Thread 10 (Thread 0x7f3a7795f700 (LWP 13994)):
#0  0x00007f3a83db59a3 in select () from /lib64/libc.so.6
#1  0x00000000007de43e in pg_usleep (microsec=100) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=100) at ../port/pgsleep.c:47
#3  ParallelSenderThreadMain (arg=0x1a074d8) at pgxc/squeue/squeue.c:6882
#4  0x00007f3a8564dea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3a83dbe8dd in clone () from /lib64/libc.so.6
Thread 9 (Thread 0x7f3a7715e700 (LWP 13996)):
#0  0x00007f3a83db59a3 in select () from /lib64/libc.so.6
#1  0x00000000007de43e in pg_usleep (microsec=100) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=100) at ../port/pgsleep.c:47
#3  ParallelSenderThreadMain (arg=0x1a07560) at pgxc/squeue/squeue.c:6882
#4  0x00007f3a8564dea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3a83dbe8dd in clone () from /lib64/libc.so.6
Thread 8 (Thread 0x7f3a7695d700 (LWP 13997)):
#0  0x00007f3a83db59a3 in select () from /lib64/libc.so.6
#1  0x00000000007de43e in pg_usleep (microsec=100) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=100) at ../port/pgsleep.c:47
#3  ParallelSenderThreadMain (arg=0x1a075e8) at pgxc/squeue/squeue.c:6882
#4  0x00007f3a8564dea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3a83dbe8dd in clone () from /lib64/libc.so.6
Thread 7 (Thread 0x7f3a7615c700 (LWP 13998)):
#0  0x00007f3a83db59a3 in select () from /lib64/libc.so.6
#1  0x00000000007de43e in pg_usleep (microsec=100) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=100) at ../port/pgsleep.c:47
#3  ParallelSenderThreadMain (arg=0x1a07670) at pgxc/squeue/squeue.c:6882
#4  0x00007f3a8564dea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3a83dbe8dd in clone () from /lib64/libc.so.6
Thread 6 (Thread 0x7f3a7595b700 (LWP 13999)):
#0  0x00007f3a83db59a3 in select () from /lib64/libc.so.6
#1  0x00000000007de43e in pg_usleep (microsec=100) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=100) at ../port/pgsleep.c:47
#3  ParallelSenderThreadMain (arg=0x1a076f8) at pgxc/squeue/squeue.c:6882
#4  0x00007f3a8564dea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3a83dbe8dd in clone () from /lib64/libc.so.6
Thread 5 (Thread 0x7f3a7515a700 (LWP 14000)):
#0  0x00007f3a83db59a3 in select () from /lib64/libc.so.6
#1  0x00000000007de43e in pg_usleep (microsec=100) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=100) at ../port/pgsleep.c:47
#3  ParallelSenderThreadMain (arg=0x1a07780) at pgxc/squeue/squeue.c:6882
#4  0x00007f3a8564dea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3a83dbe8dd in clone () from /lib64/libc.so.6
Thread 4 (Thread 0x7f3a74959700 (LWP 14001)):
#0  0x00007f3a83db59a3 in select () from /lib64/libc.so.6
#1  0x00000000007de43e in pg_usleep (microsec=100) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=100) at ../port/pgsleep.c:47
#3  ParallelSenderThreadMain (arg=0x1a07808) at pgxc/squeue/squeue.c:6882
#4  0x00007f3a8564dea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3a83dbe8dd in clone () from /lib64/libc.so.6
Thread 3 (Thread 0x7f3a74158700 (LWP 14002)):
#0  0x00007f3a83db59a3 in select () from /lib64/libc.so.6
#1  0x00000000007de43e in pg_usleep (microsec=100) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=100) at ../port/pgsleep.c:47
#3  ParallelSenderThreadMain (arg=0x1a07890) at pgxc/squeue/squeue.c:6882
#4  0x00007f3a8564dea5 in start_thread () from /lib64/libpthread.so.0
#5  0x00007f3a83dbe8dd in clone () from /lib64/libc.so.6
Thread 2 (Thread 0x7f3a73957700 (LWP 14003)):
#0  0x00007f3a856549dd in accept () from /lib64/libpthread.so.0
#1  0x00000000007e013f in ParallelConvertThreadMain (arg=0x1a06a88) at pgxc/squeue/squeue.c:6793
#2  0x00007f3a8564dea5 in start_thread () from /lib64/libpthread.so.0
#3  0x00007f3a83dbe8dd in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x7f3a85a5d840 (LWP 13347)):
#0  0x00007f3a83db59a3 in select () from /lib64/libc.so.6
#1  0x00000000007e76bc in pg_usleep (microsec=1000) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=1000) at ../port/pgsleep.c:47
#3  WaitForParallelWorkerDone () at pgxc/squeue/squeue.c:8883
#4  0x0000000000742482 in ExecGather (pstate=0x1a087e8) at executor/nodeGather.c:273
#5  0x00000000007233ab in ExecProcNodeInstr (node=0x1a087e8) at executor/execProcnode.c:553
#6  0x0000000000718538 in ExecProcNode (node=0x1a087e8) at executor/../../../src/include/executor/executor.h:275
#7  ExecutePlan (execute_once=<optimized out>, dest=0x1a202e8, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=<optimized out>, estate=<optimized out>) at executor/execMain.c:2061
#8  standard_ExecutorRun (queryDesc=0x1a06908, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:471
#9  0x000000000099e21a in AdvanceProducingPortal (portal=portal@entry=0x1a048a8, can_wait=can_wait@entry=0 '\000') at tcop/pquery.c:2668
#10 0x00000000009a19ef in PortalRun (portal=portal@entry=0x1a048a8, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x18f1a08, altdest=altdest@entry=0x18f1a08, completionTag=0x7ffd2025a340 "") at tcop/pquery.c:1406
#11 0x00000000009a3e3b in exec_execute_message (max_rows=9223372036854775807, portal_name=0x18f15f8 "p_4_9943_3_5dc9e2be") at tcop/postgres.c:2995
#12 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at tcop/postgres.c:5571
#13 0x00000000008e3368 in BackendRun (port=0x180dd60) at postmaster/postmaster.c:4982
#14 BackendStartup (port=0x180dd60) at postmaster/postmaster.c:4654
#15 ServerLoop () at postmaster/postmaster.c:1959
#16 0x00000000008e425c in PostmasterMain () at postmaster/postmaster.c:1567
#17 0x00000000004f77ad in main (argc=5, argv=0x17d6710) at main/main.c:233
#0  0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000bcd93d in pg_usleep (microsec=50) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=50) at ../port/pgsleep.c:47
#3  ParallelSendDataRow.constprop.0 (buf=buf@entry=0x7f3a77d21398, data=0x1a589a0 "", len=54, consumerIdx=<optimized out>, control=<optimized out>) at pgxc/squeue/squeue.c:7888
#4  0x00000000007e6869 in PumpTupleStoreToBuffer (control=0x18d8988, consumerIdx=4, tmpcxt=<optimized out>, tuplestore=0x1912a48, tmpslot=0x1a779e8, buf=0x7f3a77d21398) at pgxc/squeue/squeue.c:8700
#5  ParallelSendShutdownReceiver (self=<optimized out>) at pgxc/squeue/squeue.c:7700
#6  0x0000000000718827 in standard_ExecutorRun (queryDesc=0x1911e68, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:486
#7  0x0000000000722c7b in ExecutorRun (direction=ForwardScanDirection, execute_once=1 '\001', count=0, queryDesc=0x1911e68) at executor/execMain.c:514
#8  ParallelQueryMain () at executor/execParallel.c:1619
#9  0x000000000058f288 in ParallelWorkerMain () at access/transam/parallel.c:1182
#10 0x00000000008d99d8 in StartBackgroundWorker () at postmaster/bgworker.c:919
#11 0x00000000008f2114 in do_start_bgworker (rw=<optimized out>) at postmaster/postmaster.c:6375
#12 maybe_start_bgworkers () at postmaster/postmaster.c:6589
#13 0x00000000008f22b5 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster/postmaster.c:5719
#14 <signal handler called>
#15 0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#16 0x00000000008e22c6 in ServerLoop () at postmaster/postmaster.c:1923
#17 0x00000000008e425c in PostmasterMain () at postmaster/postmaster.c:1567
#18 0x00000000004f77ad in main (argc=5, argv=0x17d6710) at main/main.c:233
#0  0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000bcd93d in pg_usleep (microsec=50) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=50) at ../port/pgsleep.c:47
#3  ParallelSendDataRow.constprop.0 (buf=buf@entry=0x7f3a77ca0d38, data=0x1a7d430 "", len=62, consumerIdx=<optimized out>, control=<optimized out>) at pgxc/squeue/squeue.c:7888
#4  0x00000000007e6869 in PumpTupleStoreToBuffer (control=0x18d8988, consumerIdx=8, tmpcxt=<optimized out>, tuplestore=0x1a77028, tmpslot=0x1a779b8, buf=0x7f3a77ca0d38) at pgxc/squeue/squeue.c:8700
#5  ParallelSendShutdownReceiver (self=<optimized out>) at pgxc/squeue/squeue.c:7700
#6  0x0000000000718827 in standard_ExecutorRun (queryDesc=0x1911e68, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:486
#7  0x0000000000722c7b in ExecutorRun (direction=ForwardScanDirection, execute_once=1 '\001', count=0, queryDesc=0x1911e68) at executor/execMain.c:514
#8  ParallelQueryMain () at executor/execParallel.c:1619
#9  0x000000000058f288 in ParallelWorkerMain () at access/transam/parallel.c:1182
#10 0x00000000008d99d8 in StartBackgroundWorker () at postmaster/bgworker.c:919
#11 0x00000000008f2114 in do_start_bgworker (rw=<optimized out>) at postmaster/postmaster.c:6375
#12 maybe_start_bgworkers () at postmaster/postmaster.c:6589
#13 0x00000000008f22b5 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster/postmaster.c:5719
#14 <signal handler called>
#15 0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#16 0x00000000008e22c6 in ServerLoop () at postmaster/postmaster.c:1923
#17 0x00000000008e425c in PostmasterMain () at postmaster/postmaster.c:1567
#18 0x00000000004f77ad in main (argc=5, argv=0x17d6710) at main/main.c:233
#0  0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000bcd93d in pg_usleep (microsec=50) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=50) at ../port/pgsleep.c:47
#3  ParallelSendDataRow.constprop.0 (buf=buf@entry=0x7f3a77ba0078, data=0x1a79688 "", len=62, consumerIdx=<optimized out>, control=<optimized out>) at pgxc/squeue/squeue.c:7888
#4  0x00000000007e6869 in PumpTupleStoreToBuffer (control=0x18d8988, consumerIdx=4, tmpcxt=<optimized out>, tuplestore=0x1912a48, tmpslot=0x1a77a08, buf=0x7f3a77ba0078) at pgxc/squeue/squeue.c:8700
#5  ParallelSendShutdownReceiver (self=<optimized out>) at pgxc/squeue/squeue.c:7700
#6  0x0000000000718827 in standard_ExecutorRun (queryDesc=0x1911e68, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:486
#7  0x0000000000722c7b in ExecutorRun (direction=ForwardScanDirection, execute_once=1 '\001', count=0, queryDesc=0x1911e68) at executor/execMain.c:514
#8  ParallelQueryMain () at executor/execParallel.c:1619
#9  0x000000000058f288 in ParallelWorkerMain () at access/transam/parallel.c:1182
#10 0x00000000008d99d8 in StartBackgroundWorker () at postmaster/bgworker.c:919
#11 0x00000000008f2114 in do_start_bgworker (rw=<optimized out>) at postmaster/postmaster.c:6375
#12 maybe_start_bgworkers () at postmaster/postmaster.c:6589
#13 0x00000000008f22b5 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster/postmaster.c:5719
#14 <signal handler called>
#15 0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#16 0x00000000008e22c6 in ServerLoop () at postmaster/postmaster.c:1923
#17 0x00000000008e425c in PostmasterMain () at postmaster/postmaster.c:1567
#18 0x00000000004f77ad in main (argc=5, argv=0x17d6710) at main/main.c:233
#0  0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000bcd93d in pg_usleep (microsec=50) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=50) at ../port/pgsleep.c:47
#3  ParallelSendDataRow.constprop.0 (buf=buf@entry=0x7f3a77b1fa18, data=0x1a7b580 "", len=62, consumerIdx=<optimized out>, control=<optimized out>) at pgxc/squeue/squeue.c:7888
#4  0x00000000007e6869 in PumpTupleStoreToBuffer (control=0x18d8988, consumerIdx=8, tmpcxt=<optimized out>, tuplestore=0x1a77098, tmpslot=0x1a77a28, buf=0x7f3a77b1fa18) at pgxc/squeue/squeue.c:8700
#5  ParallelSendShutdownReceiver (self=<optimized out>) at pgxc/squeue/squeue.c:7700
#6  0x0000000000718827 in standard_ExecutorRun (queryDesc=0x1911e68, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:486
#7  0x0000000000722c7b in ExecutorRun (direction=ForwardScanDirection, execute_once=1 '\001', count=0, queryDesc=0x1911e68) at executor/execMain.c:514
#8  ParallelQueryMain () at executor/execParallel.c:1619
#9  0x000000000058f288 in ParallelWorkerMain () at access/transam/parallel.c:1182
#10 0x00000000008d99d8 in StartBackgroundWorker () at postmaster/bgworker.c:919
#11 0x00000000008f2114 in do_start_bgworker (rw=<optimized out>) at postmaster/postmaster.c:6375
#12 maybe_start_bgworkers () at postmaster/postmaster.c:6589
#13 0x00000000008f22b5 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster/postmaster.c:5719
#14 <signal handler called>
#15 0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#16 0x00000000008e22c6 in ServerLoop () at postmaster/postmaster.c:1923
#17 0x00000000008e425c in PostmasterMain () at postmaster/postmaster.c:1567
#18 0x00000000004f77ad in main (argc=5, argv=0x17d6710) at main/main.c:233
#0  0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000bcd93d in pg_usleep (microsec=50) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=50) at ../port/pgsleep.c:47
#3  ParallelSendDataRow.constprop.0 (buf=buf@entry=0x7f3a77a1ed58, data=0x1a64af0 "", len=54, consumerIdx=<optimized out>, control=<optimized out>) at pgxc/squeue/squeue.c:7888
#4  0x00000000007e6869 in PumpTupleStoreToBuffer (control=0x18d8988, consumerIdx=4, tmpcxt=<optimized out>, tuplestore=0x1912a48, tmpslot=0x1a77a18, buf=0x7f3a77a1ed58) at pgxc/squeue/squeue.c:8700
#5  ParallelSendShutdownReceiver (self=<optimized out>) at pgxc/squeue/squeue.c:7700
#6  0x0000000000718827 in standard_ExecutorRun (queryDesc=0x1911e68, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:486
#7  0x0000000000722c7b in ExecutorRun (direction=ForwardScanDirection, execute_once=1 '\001', count=0, queryDesc=0x1911e68) at executor/execMain.c:514
#8  ParallelQueryMain () at executor/execParallel.c:1619
#9  0x000000000058f288 in ParallelWorkerMain () at access/transam/parallel.c:1182
#10 0x00000000008d99d8 in StartBackgroundWorker () at postmaster/bgworker.c:919
#11 0x00000000008f2114 in do_start_bgworker (rw=<optimized out>) at postmaster/postmaster.c:6375
#12 maybe_start_bgworkers () at postmaster/postmaster.c:6589
#13 0x00000000008f22b5 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster/postmaster.c:5719
#14 <signal handler called>
#15 0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#16 0x00000000008e22c6 in ServerLoop () at postmaster/postmaster.c:1923
#17 0x00000000008e425c in PostmasterMain () at postmaster/postmaster.c:1567
#18 0x00000000004f77ad in main (argc=5, argv=0x17d6710) at main/main.c:233
#0  0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#1  0x0000000000bcd93d in pg_usleep (microsec=50) at ../port/pgsleep.c:56
#2  pg_usleep (microsec=50) at ../port/pgsleep.c:47
#3  ParallelSendDataRow.constprop.0 (buf=buf@entry=0x7f3a7795e3c8, data=0x1a7e050 "", len=62, consumerIdx=<optimized out>, control=<optimized out>) at pgxc/squeue/squeue.c:7888
#4  0x00000000007e6869 in PumpTupleStoreToBuffer (control=0x18d8988, consumerIdx=4, tmpcxt=<optimized out>, tuplestore=0x1912a48, tmpslot=0x1a77a08, buf=0x7f3a7795e3c8) at pgxc/squeue/squeue.c:8700
#5  ParallelSendShutdownReceiver (self=<optimized out>) at pgxc/squeue/squeue.c:7700
#6  0x0000000000718827 in standard_ExecutorRun (queryDesc=0x1911e68, direction=<optimized out>, count=0, execute_once=<optimized out>) at executor/execMain.c:486
#7  0x0000000000722c7b in ExecutorRun (direction=ForwardScanDirection, execute_once=1 '\001', count=0, queryDesc=0x1911e68) at executor/execMain.c:514
#8  ParallelQueryMain () at executor/execParallel.c:1619
#9  0x000000000058f288 in ParallelWorkerMain () at access/transam/parallel.c:1182
#10 0x00000000008d99d8 in StartBackgroundWorker () at postmaster/bgworker.c:919
#11 0x00000000008f2114 in do_start_bgworker (rw=<optimized out>) at postmaster/postmaster.c:6375
#12 maybe_start_bgworkers () at postmaster/postmaster.c:6589
#13 0x00000000008f22b5 in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster/postmaster.c:5719
#14 <signal handler called>
#15 0x00007f3a83db5983 in __select_nocancel () from /lib64/libc.so.6
#16 0x00000000008e22c6 in ServerLoop () at postmaster/postmaster.c:1923
#17 0x00000000008e425c in PostmasterMain () at postmaster/postmaster.c:1567
#18 0x00000000004f77ad in main (argc=5, argv=0x17d6710) at main/main.c:233
yazun commented 2 years ago

(note that this time it launched the parallel worker on dn5, not dn6)

beth-database commented 2 years ago

Hi @yazun, sorry to interrupt your trip, and thanks for your reply. Looking at the stack, there are two things to check:

1. The GUC parameters sender_thread_batch_size and sender_thread_buffer_size. Maybe sender_thread_batch_size is too close to sender_thread_buffer_size, or even sender_thread_batch_size >= sender_thread_buffer_size. Please show both values on the coordinator that executed the hanging SQL.

2. Maybe an error occurred while actually sending data. To confirm this guess, please attach gdb to the hanging process (in the example it is 13347: 'dr3_ops_cs36 surveys 192.168.168.149(10162) REMOTE SUBPLAN (coord12:39235) (D:datanode5:13937)') and set a breakpoint at squeue.c:6978. When the breakpoint is hit, print the node and buffer information with 'p node' and 'p buffer'. There are several ParallelSenderThreadMain threads, so please hit the breakpoint in every one of them.

yazun commented 2 years ago

Hello @beth-database, no interruptions; happily, the trips are done for now! 1.

sender_thread_batch_size               | 64                                                          | batch size of senders in datapump
sender_thread_buffer_size              | 64                                                          | buffer size of senders in datapump
sender_thread_num                      | 8                                                           | Number of maximum senders in datapump

Checking point 2 now.
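
Both sizes come back as 64, so the first hypothesis (sender_thread_batch_size >= sender_thread_buffer_size) fits this setup. The toy model below is only a sketch of that hypothesis, not TBase's sender code. It assumes both GUCs are in KiB (the gdb dump later in this thread reports bufLength = 65536), that a sender thread sends only once a full batch is pending, and that a writer blocks as soon as the next row does not fit. The function name `stalls` is hypothetical, and the 62-byte row is borrowed from `len=62` in the `ParallelSendDataRow` frames above.

```python
def stalls(buffer_kib: int, batch_kib: int, row_len: int) -> bool:
    """Toy model: True if writer and sender end up waiting on each other.

    Assumptions (not taken from the TBase source): sizes are in KiB, the
    sender only sends once `batch_kib` worth of data is pending, and the
    writer blocks as soon as the next row no longer fits in the buffer.
    """
    if row_len <= 0:
        raise ValueError("row_len must be positive")
    buffer_bytes = buffer_kib * 1024
    batch_bytes = batch_kib * 1024
    pending = 0
    # The writer appends fixed-size rows while there is room for one more.
    while pending + row_len <= buffer_bytes:
        pending += row_len
        if pending >= batch_bytes:
            return False  # batch threshold reached, the sender would drain
    # The buffer cannot take another row and the threshold was never hit.
    return True


if __name__ == "__main__":
    print(stalls(64, 64, 62))  # settings reported in this thread
    print(stalls(64, 32, 62))  # hypothetical smaller batch size
```

Under these assumptions the first call returns True (1057 rows fill 65534 bytes, which is below the 65536-byte batch threshold, and a further row no longer fits), while the second returns False.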

yazun commented 2 years ago
  1. This time we ended up on dn1:
    
    Attaching to process 9614
    [New LWP 9755]
    [New LWP 9756]
    [New LWP 9757]
    [New LWP 9758]
    [New LWP 9759]
    [New LWP 9760]
    [New LWP 9761]
    [New LWP 9762]
    [New LWP 9763]
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib64/libthread_db.so.1".
    0x00007fadc832a9a3 in select () from /lib64/libc.so.6
    Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 libcom_err-1.42.9-17.el7.x86_64 libselinux-2.5-15.el7.x86_64 libxml2-2.9.1-6.el7.4.x86_64 nspr-4.21.0-1.el7.x86_64 nss-3.44.0-7.el7_7.x86_64 nss-softokn-freebl-3.44.0-8.el7_7.x86_64 nss-util-3.44.0-4.el7_7.x86_64 openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
    (gdb) b squeue.c:6978
    Breakpoint 1 at 0x7ddfd6: file pgxc/squeue/squeue.c, line 6978.
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadb9ed1700 (LWP 9759)]

    Thread 6 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0db8, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    (gdb) p node
    $1 = (ParallelSendNodeControl *) 0x22e0db8
    (gdb) p *node
    $2 = {nodeId = 8, sock = 16, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 0, buffer = 0x22e11b8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4646670}
    (gdb) p *buffer
    $3 = {nodeId = 8, parallelWorkerNum = 0, tuples_put = 2748, tuples_get = 1, ntuples = 996, fast_send = 995, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65526, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65526, bufTail = 0, bufBorder = 65526, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbf567c4 "D"}
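
These struct dumps are hard to compare by eye. The helper below is a hypothetical sketch (the names `parse_gdb_ints` and `summarize` are not part of gdb or TBase) that pulls the decimal fields out of one dumped `buffer` struct and reports how much data is pending. It assumes that bufHead is the write offset, that bufTail is the offset already sent, and that there is no wrap-around; that reading of the fields is a guess, not something confirmed by the source.

```python
import re


def parse_gdb_ints(dump: str) -> dict:
    """Collect `name = <decimal>` pairs from one gdb struct dump.

    Only the first occurrence of each name is kept, so top-level fields
    win over same-named fields nested further right in the dump.
    Hex pointers and enum names do not match and are skipped.
    """
    fields = {}
    for name, value in re.findall(r"(\w+) = (-?\d+)\b", dump):
        fields.setdefault(name, int(value))
    return fields


def summarize(dump: str) -> str:
    f = parse_gdb_ints(dump)
    # Assumed meaning: bufHead is the write offset, bufTail the sent offset.
    pending = f["bufHead"] - f["bufTail"]
    free = f["bufLength"] - pending
    return (f"node {f['nodeId']} worker {f['parallelWorkerNum']}: "
            f"pending={pending} free={free} send_times={f['send_times']}")


# Shortened copy of the first buffer dump above (nested sendSem omitted).
SAMPLE = ("{nodeId = 8, parallelWorkerNum = 0, tuples_put = 2748, "
          "tuples_get = 1, ntuples = 996, send_times = 0, no_data = 0, "
          "send_data_len = 0, write_data_len = 65526, bufHead = 65526, "
          "bufTail = 0, bufBorder = 65526, bufLength = 65536, "
          "buffer = 0x7fadbbf567c4 \"D\"}")

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

For this dump the summary reads `node 8 worker 0: pending=65526 free=10 send_times=0`: about 64 KiB is queued, nothing has been sent, and the 10 free bytes are fewer than the 54 to 62 byte rows seen in the `ParallelSendDataRow` frames of the earlier backtrace. Pass one struct at a time, since the `node` and `buffer` dumps share field names.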

..

    (gdb) disp *buffer
    1: *buffer = {nodeId = 4, parallelWorkerNum = 1, tuples_put = 6183, tuples_get = 2, ntuples = 1010, fast_send = 1009, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65503, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65503, bufTail = 0, bufBorder = 65503, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbfd6e24 "D"}
    (gdb) disp *node
    2: *node = {nodeId = 4, sock = 20, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e1078, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

    Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 2, parallelWorkerNum = 1, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648769, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbfb6c8c ""}
    2: *node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadb9ed1700 (LWP 9759)]

    Thread 6 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0db8, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 8, parallelWorkerNum = 2, tuples_put = 2585, tuples_get = 1, ntuples = 998, fast_send = 997, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65489, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65489, bufTail = 0, bufBorder = 65489, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc0d7ae4 "D"}
    2: *node = {nodeId = 8, sock = 16, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e11b8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4646670}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadba6d2700 (LWP 9758)]

    Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d28, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 6, parallelWorkerNum = 1, tuples_put = 0, tuples_get = 0, ntuples = 203, fast_send = 203, normal_send = 0, send_times = 1, no_data = 4649096, send_data_len = 12784, write_data_len = 12784, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 12784, bufTail = 12784, bufBorder = 12784, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbff6fbc "D"}
    2: *node = {nodeId = 6, sock = 17, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e1118, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadbbed5700 (LWP 9755)]

    Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 0, parallelWorkerNum = 2, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650216, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc057484 ""}
    2: *node = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651884}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

    Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 2, parallelWorkerNum = 2, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648770, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc07761c ""}
    2: *node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadbaed3700 (LWP 9757)]

    Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c98, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 4, parallelWorkerNum = 2, tuples_put = 6517, tuples_get = 2, ntuples = 1013, fast_send = 1012, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65515, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65515, bufTail = 0, bufBorder = 65515, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc0977b4 "D"}
    2: *node = {nodeId = 4, sock = 20, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e1078, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadbbed5700 (LWP 9755)]

    Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 0, parallelWorkerNum = 3, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650212, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc117e14 ""}
    2: *node = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651884}
    (gdb) c
    Continuing.

    Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 2, parallelWorkerNum = 3, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648766, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc137fac ""}
    2: *node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadbaed3700 (LWP 9757)]

    Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c98, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 4, parallelWorkerNum = 3, tuples_put = 3952, tuples_get = 2, ntuples = 1006, fast_send = 1005, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65528, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65528, bufTail = 0, bufBorder = 65528, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc158144 "D"}
    2: *node = {nodeId = 4, sock = 20, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e1078, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

    Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 2, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648768, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc1f893c ""}
    2: *node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadbbed5700 (LWP 9755)]

    Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 0, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650214, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc1d87a4 ""}
    2: *node = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651884}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadba6d2700 (LWP 9758)]

    Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d28, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
    6978        if (buffer->status == DataPumpSndStatus_set_socket)
    1: *buffer = {nodeId = 6, parallelWorkerNum = 2, tuples_put = 0, tuples_get = 0, ntuples = 194, fast_send = 194, normal_send = 0, send_times = 1, no_data = 4649098, send_data_len = 12063, write_data_len = 12063, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 12063, bufTail = 12063, bufBorder = 12063, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc0b794c "D"}
    2: *node = {nodeId = 6, sock = 17, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e1118, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
    (gdb) c
    Continuing.
    [Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c98, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 4, parallelWorkerNum = 4, tuples_put = 6447, tuples_get = 2, ntuples = 1014, fast_send = 1013, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65532, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65532, bufTail = 0, bufBorder = 65532, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc218ad4 "D"}
2: node = {nodeId = 4, sock = 20, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x22e1078, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
(gdb)
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 2, parallelWorkerNum = 5, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648769, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc2b92cc ""}
2: node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 0, parallelWorkerNum = 5, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650214, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc299134 ""}
2: node = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651884}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c98, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 4, parallelWorkerNum = 5, tuples_put = 5884, tuples_get = 1, ntuples = 1010, fast_send = 1009, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65519, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65519, bufTail = 0, bufBorder = 65519, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc2d9464 "D"}
2: node = {nodeId = 4, sock = 20, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x22e1078, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c50, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 3, parallelWorkerNum = 0, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648767, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbf063c8 ""}
2: node = {nodeId = 3, sock = 22, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 0, buffer = 0x22e1028, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
(gdb) c
Continuing.
[Switching to Thread 0x7fadba6d2700 (LWP 9758)]

Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d28, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 6, parallelWorkerNum = 3, tuples_put = 0, tuples_get = 0, ntuples = 99, fast_send = 99, normal_send = 0, send_times = 1, no_data = 4649094, send_data_len = 6002, write_data_len = 6002, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 6002, bufTail = 6002, bufBorder = 6002, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc1782dc "D"}
2: node = {nodeId = 6, sock = 17, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e1118, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0bc0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 1, parallelWorkerNum = 0, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650213, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbee6230 ""}
2: node = {nodeId = 1, sock = 15, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 0, buffer = 0x22e0f88, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c50, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 3, parallelWorkerNum = 1, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648769, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbfc6d58 ""}
2: node = {nodeId = 3, sock = 22, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e1028, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
(gdb)
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0ce0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 5, parallelWorkerNum = 0, tuples_put = 0, tuples_get = 0, ntuples = 237, fast_send = 237, normal_send = 0, send_times = 1, no_data = 4646142, send_data_len = 15882, write_data_len = 15882, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 15882, bufTail = 15882, bufBorder = 15882, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbf26560 "D"}
2: node = {nodeId = 5, sock = 19, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 0, buffer = 0x22e10c8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
(gdb) c
Continuing.

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0bc0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 1, parallelWorkerNum = 1, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650214, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbfa6bc0 ""}
2: node = {nodeId = 1, sock = 15, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e0f88, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadba6d2700 (LWP 9758)]

Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d28, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 6, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 191, fast_send = 191, normal_send = 0, send_times = 1, no_data = 4649096, send_data_len = 11945, write_data_len = 11945, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 11945, bufTail = 11945, bufBorder = 11945, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc238c6c "D"}
2: node = {nodeId = 6, sock = 17, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x22e1118, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c50, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 3, parallelWorkerNum = 2, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648770, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc0876e8 ""}
2: node = {nodeId = 3, sock = 22, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e1028, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0ce0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 5, parallelWorkerNum = 1, tuples_put = 0, tuples_get = 0, ntuples = 259, fast_send = 259, normal_send = 0, send_times = 1, no_data = 4646143, send_data_len = 17438, write_data_len = 17438, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 17438, bufTail = 17438, bufBorder = 17438, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbfe6ef0 "D"}
2: node = {nodeId = 5, sock = 19, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e10c8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0bc0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 1, parallelWorkerNum = 2, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650216, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc067550 ""}
2: node = {nodeId = 1, sock = 15, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e0f88, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c50, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 3, parallelWorkerNum = 3, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648766, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc148078 ""}
2: node = {nodeId = 3, sock = 22, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e1028, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
(gdb) c
Continuing.
[Switching to Thread 0x7fadba6d2700 (LWP 9758)]

Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d28, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 6, parallelWorkerNum = 5, tuples_put = 0, tuples_get = 0, ntuples = 211, fast_send = 211, normal_send = 0, send_times = 1, no_data = 4649096, send_data_len = 13233, write_data_len = 13233, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 13233, bufTail = 13233, bufBorder = 13233, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc2f95fc "D"}
2: node = {nodeId = 6, sock = 17, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x22e1118, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0bc0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 1, parallelWorkerNum = 3, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650212, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc127ee0 ""}
2: node = {nodeId = 1, sock = 15, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e0f88, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0ce0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 5, parallelWorkerNum = 2, tuples_put = 0, tuples_get = 0, ntuples = 276, fast_send = 276, normal_send = 0, send_times = 1, no_data = 4646144, send_data_len = 18590, write_data_len = 18590, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 18590, bufTail = 18590, bufBorder = 18590, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc0a7880 "D"}
2: node = {nodeId = 5, sock = 19, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e10c8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c50, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 3, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648768, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc208a08 ""}
2: node = {nodeId = 3, sock = 22, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x22e1028, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
(gdb) c
Continuing.
[Switching to Thread 0x7fadba6d2700 (LWP 9758)]

Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d70, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 7, parallelWorkerNum = 0, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4649096, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbf466f8 ""}
2: node = {nodeId = 7, sock = 14, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 0, buffer = 0x22e1168, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0bc0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 1, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650214, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc1e8870 ""}
2: node = {nodeId = 1, sock = 15, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x22e0f88, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c50, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 3, parallelWorkerNum = 5, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648769, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc2c9398 ""}
2: node = {nodeId = 3, sock = 22, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x22e1028, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650440}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0ce0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 5, parallelWorkerNum = 3, tuples_put = 0, tuples_get = 0, ntuples = 274, fast_send = 274, normal_send = 0, send_times = 1, no_data = 4646141, send_data_len = 18430, write_data_len = 18430, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 18430, bufTail = 18430, bufBorder = 18430, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc168210 "D"}
2: node = {nodeId = 5, sock = 19, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e10c8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
(gdb) c
Continuing.

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0bc0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978        if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 1, parallelWorkerNum = 5, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650214, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {data = {lock = 0, count = 0, owner = 0, nusers = 0, kind = 0, spins = 0, elision = 0, list = {prev = 0x0, next = 0x0}}, size = '\000' <repeats 39 times>, align = 0}, m_cond = {data = {lock = 0, futex = 0, total_seq = 0, wakeup_seq = 0, woken_seq = 0, mutex = 0x0, nwaiters = 0, __broadcast_seq = 0}, size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc2a9200 ""}
2: node = {nodeId = 1, sock = 15, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x22e0f88, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadba6d2700 (LWP 9758)]

Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d70, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 7, parallelWorkerNum = 1, tuples_put = 0, tuples_get = 0, ntuples = 2, fast_send = 2, normal_send = 0, send_times = 1, no_data = 4649096, send_data_len = 139, write_data_len = 139, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 139, bufTail = 139, bufBorder = 139, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc007088 "D"}
2: node = {nodeId = 7, sock = 14, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e1168, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0ce0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 5, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 312, fast_send = 312, normal_send = 0, send_times = 1, no_data = 4646143, send_data_len = 20972, write_data_len = 20972, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 20972, bufTail = 20972, bufBorder = 20972, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc228ba0 "D"}
2: node = {nodeId = 5, sock = 19, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x22e10c8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 2, parallelWorkerNum = 0, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648768, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbef62fc ""}
2: node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 0, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650441}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 0, parallelWorkerNum = 0, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650214, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbed6164 ""}
2: node = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 0, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0ce0, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 5, parallelWorkerNum = 5, tuples_put = 0, tuples_get = 0, ntuples = 253, fast_send = 253, normal_send = 0, send_times = 1, no_data = 4646143, send_data_len = 16985, write_data_len = 16985, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 16985, bufTail = 16985, bufBorder = 16985, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc2e9530 "D"}
2: node = {nodeId = 5, sock = 19, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x22e10c8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647815}
(gdb) c
Continuing.
[Switching to Thread 0x7fadba6d2700 (LWP 9758)]

Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d70, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 7, parallelWorkerNum = 2, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4649099, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc0c7a18 ""}
2: node = {nodeId = 7, sock = 14, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e1168, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
(gdb) c
Continuing.
[Switching to Thread 0x7fadb9ed1700 (LWP 9759)]

Thread 6 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0db8, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 8, parallelWorkerNum = 3, tuples_put = 2564, tuples_get = 1, ntuples = 993, fast_send = 992, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65488, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65488, bufTail = 0, bufBorder = 65488, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc198474 "D"}
2: node = {nodeId = 8, sock = 16, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e11b8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4646670}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 2, parallelWorkerNum = 1, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648770, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbfb6c8c ""}
2: node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650441}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 0, parallelWorkerNum = 1, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650215, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbf96af4 ""}
2: node = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c98, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 4, parallelWorkerNum = 0, tuples_put = 3845, tuples_get = 2, ntuples = 1004, fast_send = 1003, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65525, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65525, bufTail = 0, bufBorder = 65525, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbf16494 "D"}
2: node = {nodeId = 4, sock = 20, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 0, buffer = 0x22e1078, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647816}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 2, parallelWorkerNum = 2, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648771, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc07761c ""}
2: node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650441}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 0, parallelWorkerNum = 2, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650217, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc057484 ""}
2: node = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadba6d2700 (LWP 9758)]

Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d70, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 7, parallelWorkerNum = 3, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4649095, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc1883a8 ""}
2: node = {nodeId = 7, sock = 14, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e1168, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 2, parallelWorkerNum = 3, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648767, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc137fac ""}
2: node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650441}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c98, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 4, parallelWorkerNum = 1, tuples_put = 6183, tuples_get = 1, ntuples = 1010, fast_send = 1009, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65503, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65503, bufTail = 0, bufBorder = 65503, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbbfd6e24 "D"}
2: node = {nodeId = 4, sock = 20, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e1078, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647816}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 0, parallelWorkerNum = 3, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650213, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc117e14 ""}
2: node = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadba6d2700 (LWP 9758)]

Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d70, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 7, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4649097, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc248d38 ""}
2: node = {nodeId = 7, sock = 14, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x22e1168, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbb6d4700 (LWP 9756)]

Thread 3 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 2, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648769, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc1f893c ""}
2: node = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650441}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbaed3700 (LWP 9757)]

Thread 4 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0c98, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 4, parallelWorkerNum = 2, tuples_put = 6517, tuples_get = 2, ntuples = 1013, fast_send = 1012, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65515, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65515, bufTail = 0, bufBorder = 65515, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc0977b4 "D"}
2: node = {nodeId = 4, sock = 20, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 2, buffer = 0x22e1078, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647816}
(gdb) c
Continuing.
[Switching to Thread 0x7fadbbed5700 (LWP 9755)]

Thread 2 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 0, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650215, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc1d87a4 ""}
2: node = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}
(gdb) c
Continuing.
[Switching to Thread 0x7fadba6d2700 (LWP 9758)]

Thread 5 "postgres" hit Breakpoint 1, SendNodeData (node=node@entry=0x22e0d70, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
6978 if (buffer->status == DataPumpSndStatus_set_socket)
1: buffer = {nodeId = 7, parallelWorkerNum = 5, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4649097, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc3096c8 ""}
2: node = {nodeId = 7, sock = 14, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x22e1168, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}
(gdb)

yazun commented 2 years ago
(gdb) info threads
Id   Target Id                                   Frame
  1    Thread 0x7fadc9fcc840 (LWP 9614) "postgres" 0x00007fadc832a9a3 in select () from /lib64/libc.so.6
  2    Thread 0x7fadbbed5700 (LWP 9755) "postgres" SendNodeData (node=node@entry=0x22e0b78, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
* 3    Thread 0x7fadbb6d4700 (LWP 9756) "postgres" SendNodeData (node=node@entry=0x22e0c08, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
  4    Thread 0x7fadbaed3700 (LWP 9757) "postgres" 0x00000000007ddfdb in SendNodeData (node=node@entry=0x22e0c98, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
  5    Thread 0x7fadba6d2700 (LWP 9758) "postgres" SendNodeData (node=node@entry=0x22e0d70, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
  6    Thread 0x7fadb9ed1700 (LWP 9759) "postgres" SendNodeData (node=node@entry=0x22e0db8, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
  7    Thread 0x7fadb96d0700 (LWP 9760) "postgres" SendNodeData (node=node@entry=0x22e0e48, last_send=last_send@entry=0 '\000') at pgxc/squeue/squeue.c:6978
  8    Thread 0x7fadb8ecf700 (LWP 9761) "postgres" 0x00007fadc832a9a3 in select () from /lib64/libc.so.6
  9    Thread 0x7fadb86ce700 (LWP 9762) "postgres" 0x00007fadc832a9a3 in select () from /lib64/libc.so.6
  10   Thread 0x7fadb7ecd700 (LWP 9763) "postgres" 0x00007fadc9bc99dd in accept () from /lib64/libpthread.so.0
yazun commented 2 years ago

Or, printing the buffers of all six SendNodeData threads at once:

(gdb) thread apply 2 3 4 5 6 7 p *buffer

Thread 2 (Thread 0x7fadbbed5700 (LWP 9755)):
$10 = {nodeId = 0, parallelWorkerNum = 4, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4650215, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc1d87a4 ""}

Thread 3 (Thread 0x7fadbb6d4700 (LWP 9756)):
$11 = {nodeId = 2, parallelWorkerNum = 5, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4648770, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc2b92cc ""}

Thread 4 (Thread 0x7fadbaed3700 (LWP 9757)):
$12 = {nodeId = 4, parallelWorkerNum = 3, tuples_put = 3952, tuples_get = 1, ntuples = 1006, fast_send = 1005, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65528, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65528, bufTail = 0, bufBorder = 65528, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc158144 "D"}

Thread 5 (Thread 0x7fadba6d2700 (LWP 9758)):
$13 = {nodeId = 7, parallelWorkerNum = 5, tuples_put = 0, tuples_get = 0, ntuples = 0, fast_send = 0, normal_send = 0, send_times = 0, no_data = 4649097, send_data_len = 0, write_data_len = 0, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 0, bufTail = 0, bufBorder = 0, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc3096c8 ""}

Thread 6 (Thread 0x7fadb9ed1700 (LWP 9759)):
$14 = {nodeId = 8, parallelWorkerNum = 3, tuples_put = 2564, tuples_get = 2, ntuples = 993, fast_send = 992, normal_send = 1, send_times = 0, no_data = 0, send_data_len = 0, write_data_len = 65488, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 0 '\000', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 65488, bufTail = 0, bufBorder = 65488, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc198474 "D"}

Thread 7 (Thread 0x7fadb96d0700 (LWP 9760)):
$15 = {nodeId = 10, parallelWorkerNum = 1, tuples_put = 0, tuples_get = 0, ntuples = 42, fast_send = 42, normal_send = 0, send_times = 1, no_data = 4648909, send_data_len = 2521, write_data_len = 2521, long_tuple = 0 '\000', status = DataPumpSndStatus_set_socket, stuck = 0 '\000', last_send = 1 '\001', bufLock = 0 '\000', bufFull = 0 '\000', bufHead = 2521, bufTail = 2521, bufBorder = 2521, sendSem = {m_cnt = 0, m_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}, m_cond = {__data = {__lock = 0, __futex = 0, __total_seq = 0, __wakeup_seq = 0, __woken_seq = 0, __mutex = 0x0, __nwaiters = 0, __broadcast_seq = 0}, __size = '\000' <repeats 47 times>, __align = 0}}, bufLength = 65536, buffer = 0x7fadbc0372ec "D"}

 thread apply  2 3 4 5 6 7  p *node

Thread 2 (Thread 0x7fadbbed5700 (LWP 9755)):
$16 = {nodeId = 0, sock = 21, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 4, buffer = 0x20e6a28, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4651885}

Thread 3 (Thread 0x7fadbb6d4700 (LWP 9756)):
$17 = {nodeId = 2, sock = 23, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x22e0fd8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650441}

Thread 4 (Thread 0x7fadbaed3700 (LWP 9757)):
$18 = {nodeId = 4, sock = 20, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e1078, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4647816}

Thread 5 (Thread 0x7fadba6d2700 (LWP 9758)):
$19 = {nodeId = 7, sock = 14, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 5, buffer = 0x22e1168, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650769}

Thread 6 (Thread 0x7fadb9ed1700 (LWP 9759)):
$20 = {nodeId = 8, sock = 16, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 3, buffer = 0x22e11b8, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4646670}

Thread 7 (Thread 0x7fadb96d0700 (LWP 9760)):
$21 = {nodeId = 10, sock = 13, numParallelWorkers = 6, status = DataPumpSndStatus_set_socket, lock = 0 '\000', errorno = 0, current_buffer = 1, buffer = 0x22e1258, last_offset = 0, remaining_length = 0, ntuples = 0, sleep_count = 0, send_timies = 4650585}
beth-database commented 2 years ago

Hi @yazun, thanks for the feedback. I think I know the problem. When the buffer is low on free space, the parallel worker waits for the sender thread to send data; once the sender thread has sent data successfully, space is freed and the parallel worker can pump data into the buffer again. However, the sender only sends data when the data size >= batch size. In this example, data size < batch size = buffer size, so the sender never sends. As a result there is never enough space for the data pump, and both the parallel worker and the sender thread hang. As a workaround, you can try decreasing sender_thread_batch_size, for example to 32, and then rerun the SQL to check whether it still hangs. We will fix the hang in a new version. To get better parallel query performance, the batch size should be less than the buffer size; for example, you can set sender_thread_batch_size = sender_thread_buffer_size/2 in your cluster.
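
The condition described above can be illustrated with a small toy model. This is only a sketch of the reasoning, not TBase's actual code: the function name `pump_stalls`, the 65-byte tuple size, and the use of bytes as the unit are assumptions made for illustration (the 65536 value is taken from `bufLength` in the dump above).

```python
def pump_stalls(buffer_size, batch_size, tuple_sizes):
    """Toy model of one data pump buffer (hypothetical, not TBase code).

    Returns True if the worker and the sender end up waiting on each other.
    Long-tuple handling is deliberately ignored.
    """
    pending = 0  # bytes written by the worker but not yet sent
    for size in tuple_sizes:
        if pending + size > buffer_size:
            # The worker cannot write, and because pending < batch_size
            # the sender will not send either: nobody makes progress.
            return True
        pending += size
        if pending >= batch_size:
            pending = 0  # sender ships the batch and frees the space
    return False


tuples = [65] * 2000                          # assumed tuple size in bytes
print(pump_stalls(65536, 65536, tuples))      # batch size == buffer size
print(pump_stalls(65536, 32768, tuples))      # batch size == buffer size / 2
```

In this model the first call stalls once the buffer holds 65520 bytes, which is below the batch size yet leaves no room for the next tuple, while the second call always reaches the batch threshold before the buffer fills.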

yazun commented 2 years ago

Excellent! What you wrote should make it into the documentation! I will amend the values and let you know.

yazun commented 2 years ago

Just to be sure: would that mean the most efficient setting is sender_thread_buffer_size = sender_thread_batch_size * sender_thread_num?

yazun commented 2 years ago

Also, for a 12-node cluster, should we set sender_thread_num to 12 for optimal parallel squeue operations?

yazun commented 2 years ago

Also, is pgxl_remote_fetch_size related in any similar way to the data pump thread buffer?

yazun commented 2 years ago

Setting sender_thread_buffer_size to 512 fixed the problem for this query!

beth-database commented 2 years ago

Setting 'sender_thread_buffer_size = sender_thread_batch_size * sender_thread_num' does not necessarily give the best result. Each sender thread has several buffers for the parallel workers, and for each buffer, the sender thread sends data as soon as the amount of data reaches the batch size threshold. Sender threads do not affect each other.

beth-database commented 2 years ago

"Also, for a 12-node cluster, should we set sender_thread_num to 12 for optimal parallel squeue operations?" The real number of sender threads = min(sender_thread_num, consumer num). The consumer num is normally equal to the number of datanodes, so you can set sender_thread_num = 12 for your cluster, and each sender thread will serve one node. This also depends on your machine's performance: if the machine is low-spec, you may want to set sender_thread_num = node_number/2 or node_number/3...

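The rule above can be written as a one-line calculation. This is a minimal sketch: the function name `effective_sender_threads` is hypothetical, and `consumer_num` is assumed to equal the number of datanodes, as stated in the comment.

```python
def effective_sender_threads(sender_thread_num, consumer_num):
    """Number of sender threads actually used (hypothetical helper).

    consumer_num is assumed to be the number of datanodes.
    """
    return min(sender_thread_num, consumer_num)


# 12-datanode cluster, as in this thread
print(effective_sender_threads(12, 12))  # one sender thread per node
print(effective_sender_threads(16, 12))  # extra threads are not used
print(effective_sender_threads(6, 12))   # node_number/2 on a smaller machine
```

Setting sender_thread_num above the number of datanodes gains nothing under this rule, and a lower value means each sender thread serves more than one node.
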
beth-database commented 2 years ago

"Also, is pgxl_remote_fetch_size related in any similar way to the data pump thread buffer?" I think pgxl_remote_fetch_size is there to keep the client's memory usage from growing too large. It is not related to the buffer size of parallel queries.

yazun commented 2 years ago

Thanks a lot for the clarifications, @beth-database. It would be nice to have them in the documentation somewhere.

JennyJennyChen commented 2 years ago

@yazun Hello, this bug has been fixed by @beth-database. The commit is:

https://github.com/Tencent/TBase/commit/20e3f09bc0e8fe8507890b9b5d539506bff4729a

You can download the latest code and try it.

Thanks for your replies, and thanks @beth-database.

yazun commented 2 years ago

Will test and will get back to you this week! Thanks a lot!