Tencent / TBase

TBase is an enterprise-level distributed HTAP database. It provides highly consistent distributed database services and high-performance data warehouse services from a single database cluster, forming an integrated enterprise-level solution.

Segfaults after query plan change (or partial hashAggregate problem?) #100

Closed: yazun closed this issue 3 years ago

yazun commented 3 years ago

We started to get segfaults from a relatively simple query on a relatively small replicated table that grew a little (a few thousand rows) and a sharded child table linked to it.

The tables involved are run(runid int, status text, ...), which is replicated, and rundataspec(dataspecid int, status text, ...), which is distributed by shard(dataspecid).
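A minimal sketch of that layout, for orientation only (column types beyond those named above are guesses, e.g. runmetadata as jsonb because of the ->> operator in the query, and the date columns as timestamps; the real DDL was shared later via pg_dump):

create table run (
    runid          int primary key,
    runname        text,
    runmetadata    jsonb,
    size           bigint,
    state          text,
    status         text,
    creationdate   timestamp,
    lastupdatedate timestamp
) distribute by replication;

create table rundataspec (
    dataspecid     int,
    runid          int,
    status         text      -- per the description; the query's filter "status = 4" suggests it may actually be numeric
) distribute by shard(dataspecid);

The query is: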

  select
    r.runid,
    r.runname,
    r.runmetadata ->> 'runRequesterUserName' :: text username,
    r.size,
    r.state :: text,
    sum(stat.processed) processed,
    sum(stat.total)  total,
    r.creationdate,
    r.lastupdatedate
  from
    run r
    join (
      select
        status,
        count(*) total,
        count(*) filter (
          where
            status = 4
        ) processed,
        rd.runid
      from
        rundataspec rd
      where
        rd.runid in (
          select
            rr.runid
          from
            run rr
          where
            rr.creationdate >= now() - interval '20 days'
            and (
              rr.lastupdatedate is null
              or rr.lastupdatedate >= now() - interval '20 days'
            )
        )
      group by
        status,
        rd.runid
    ) stat on (stat.runid = r.runid)
  group by
    r.runid,
    r.size,
    r.state :: text,
    r.creationdate
  order by
    r.creationdate desc

(This query is wrapped in a function returning a table, but it also segfaults when run as-is; a sketch of the wrapper pattern is below.)
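The wrapper follows the usual PL/pgSQL RETURN QUERY pattern; a hypothetical, trimmed stand-in (the real getlastactiveruns() body and full column list are not shown in this issue):

create or replace function getlastactiveruns()
returns table (runid int, processed numeric, total numeric)
language plpgsql as $$
begin
    -- the full aggregate query shown above goes here; this is a trimmed stand-in
    return query
        select r.runid,
               sum(stat.processed)::numeric,
               sum(stat.total)::numeric
          from run r
          join (select rd.runid,
                       count(*) as total,
                       count(*) filter (where rd.status::text = '4') as processed
                  from rundataspec rd
                 group by rd.runid) stat
            on stat.runid = r.runid
         group by r.runid;
end;
$$;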

The plan that causes the segfaults is:

QUERY PLAN that crashes (with prefer_olap set to true)
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=58703.36..58705.07 rows=1138 width=148)
   ->  Sort  (cost=58703.36..58705.07 rows=1138 width=148)
         Sort Key: r.creationdate DESC
         ->  Finalize GroupAggregate  (cost=58591.32..58668.70 rows=1138 width=148)
               Group Key: r.runid, ((r.state)::text)
               ->  Sort  (cost=58591.32..58593.02 rows=1138 width=0)
                     Sort Key: r.runid, ((r.state)::text)
                     ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=58492.93..58556.66 rows=1138 width=0)
                           Distribute results by S: runid
                           ->  Partial HashAggregate  (cost=58392.93..58453.24 rows=1138 width=313)
                                 Group Key: r.runid, (r.state)::text
                                 ->  Nested Loop  (cost=56869.71..58386.10 rows=1138 width=265)
                                       ->  Finalize GroupAggregate  (cost=56869.61..56931.63 rows=1138 width=22)
                                             Group Key: rd.status, rd.runid
                                             ->  Sort  (cost=56869.61..56871.32 rows=1138 width=0)
                                                   Sort Key: rd.status, rd.runid
                                                   ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=56769.52..56834.95 rows=1138 width=0)
                                                         Distribute results by S: runid
                                                         ->  Partial GroupAggregate  (cost=56669.52..56731.54 rows=1138 width=22)
                                                               Group Key: rd.status, rd.runid
                                                               ->  Sort  (cost=56669.52..56671.23 rows=1138 width=6)
                                                                     Sort Key: rd.status, rd.runid
                                                                     ->  Nested Loop  (cost=0.11..56634.86 rows=1138 width=6)
                                                                           ->  Seq Scan on run rr  (cost=0.00..5660.73 rows=1 width=4)
                                                                                 Filter: ((creationdate >= (now() - '20 days'::interval)) AND ((lastupdatedate IS NULL) OR (lastupdatedate >= (now() - '20 days'::interval))))
                                                                           ->  Index Scan using rundataspec_pkey on rundataspec rd  (cost=0.11..48667.43 rows=46134 width=6)
                                                                                 Index Cond: (runid = rr.runid)
                                       ->  Index Scan using run_pkey on run r  (cost=0.10..1.23 rows=1 width=226)
                                             Index Cond: (runid = rd.runid)

A core is dumped on all datanodes, and the backtraces on the nodes we checked look similar and indicate a problem in int8_avg_combine:

gdb postgres /DATA_PRIMARY/cores/%E_32682_1617217181.core
GNU gdb (GDB) Red Hat Enterprise Linux 8.3-3.el7
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from postgres...
[New LWP 32682]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: dr3_ops_cs36 surveys 192.168.168.145(15326) REMOTE SUBPLAN (coord1:1'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  int8_avg_combine (fcinfo=<optimized out>) at numeric.c:4281
4281    numeric.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 libcom_err-1.42.9-17.el7.x86_64 libselinux-2.5-15.el7.x86_64 libxml2-2.9.1-6.el7.4.x86_64 nspr-4.21.0-1.el7.x86_64 nss-3.44.0-7.el7_7.x86_64 nss-softokn-freebl-3.44.0-8.el7_7.x86_64 nss-util-3.44.0-4.el7_7.x86_64 openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  int8_avg_combine (fcinfo=<optimized out>) at numeric.c:4281
#1  0x00000000009cef4a in advance_combine_function (pergroupstate=0x1a73820, pertrans=0x184afe8, aggstate=0x1a660f0) at nodeAgg.c:1295
#2  combine_aggregates (aggstate=aggstate@entry=0x1a660f0, pergroup=pergroup@entry=0x1a73820) at nodeAgg.c:1229
#3  0x00000000009da2e3 in agg_retrieve_direct (aggstate=<optimized out>) at nodeAgg.c:3305
#4  ExecAgg (pstate=<optimized out>) at nodeAgg.c:3022
#5  0x00000000009ad7e1 in ExecProcNode (node=0x1a660f0) at ../../../src/include/executor/executor.h:273
#6  ExecSort (pstate=0x1a65d90) at nodeSort.c:153
#7  0x00000000009f195f in ExecProcNode (node=0x1a65d90) at ../../../src/include/executor/executor.h:273
#8  ExecutePlan (execute_once=<optimized out>, dest=0x1965ab8, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x1a65d90, estate=0x1a65868) at execMain.c:1955
#9  standard_ExecutorRun (queryDesc=0x18427e8, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:465
#10 0x0000000000760e9c in ExecutorRun (execute_once=<optimized out>, count=0, direction=ForwardScanDirection, queryDesc=0x18427e8) at execMain.c:408
#11 PortalRunSelect (portal=0x1a502d8, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:1648
#12 0x0000000000761939 in PortalRun (portal=portal@entry=0x1a502d8, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x1965ab8, altdest=altdest@entry=0x1965ab8, completionTag=0x7ffcf89c4e00 "") at pquery.c:1294
#13 0x0000000000762993 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x19656a8 "p_1_4b02_38_7c71995b") at postgres.c:2958
#14 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:5507
#15 0x00000000008279a4 in BackendRun (port=0x1836bd0) at postmaster.c:4979
#16 BackendStartup (port=0x1836bd0) at postmaster.c:4651
#17 ServerLoop () at postmaster.c:1956
#18 0x00000000008293e9 in PostmasterMain (argc=<optimized out>, argv=0x180a850) at postmaster.c:1564
#19 0x00000000004f2a5b in main (argc=5, argv=0x180a850) at main.c:228

What's curious: if we switch off prefer_olap, the query works as before, with no segfaults. We also noticed a number of other queries that crash TBase when aggregates are used, but this one is the most evident.
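For reference, the toggle is an ordinary session setting (a sketch; assuming prefer_olap can be changed with SET, consistent with the "show prefer_olap" output later in this thread):

set prefer_olap = off;  -- temporary, session-level workaround: fall back to the non-OLAP plan
-- ... run the query ...
set prefer_olap = on;   -- restore OLAP-style distributed aggregation afterwards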

Working, non-segfaulting plan:

-- plan that works (prefer_olap to false)
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=52734.18..52735.89 rows=1138 width=148)
   Sort Key: r.creationdate DESC
   ->  GroupAggregate  (cost=52625.55..52699.52 rows=1138 width=148)
         Group Key: r.runid, ((r.state)::text)
         ->  Sort  (cost=52625.55..52627.26 rows=1138 width=265)
               Sort Key: r.runid, ((r.state)::text)
               ->  Nested Loop  (cost=51012.48..52590.89 rows=1138 width=265)
                     ->  GroupAggregate  (cost=51012.38..51079.53 rows=1138 width=22)
                           Group Key: rd.status, rd.runid
                           ->  Sort  (cost=51012.38..51014.09 rows=1138 width=6)
                                 Sort Key: rd.status, rd.runid
                                 ->  Nested Loop  (cost=0.21..50977.72 rows=1138 width=6)
                                       ->  Remote Subquery Scan on all (datanode1)  (cost=100.10..103.60 rows=1 width=4)
                                             ->  Index Scan using idx_run_date on run rr  (cost=0.10..3.60 rows=1 width=4)
                                                   Index Cond: (creationdate >= (now() - '20 days'::interval))
                                                   Filter: ((lastupdatedate IS NULL) OR (lastupdatedate >= (now() - '20 days'::interval)))
                                       ->  Materialize  (cost=100.11..49251.83 rows=46134 width=6)
                                             ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=100.11..49182.63 rows=46134 width=6)
                                                   ->  Index Scan using rundataspec_pkey on rundataspec rd  (cost=0.11..48667.43 rows=46134 width=6)
                                                         Index Cond: (runid = rr.runid)
                     ->  Materialize  (cost=100.10..101.46 rows=1 width=226)
                           ->  Remote Subquery Scan on all (datanode1)  (cost=100.10..101.46 rows=1 width=226)
                                 ->  Index Scan using run_pkey on run r  (cost=0.10..1.23 rows=1 width=226)
                                       Index Cond: (runid = rd.runid)
(24 rows)

It's been a bit disruptive, so we hope for a fix. Thanks.

yazun commented 3 years ago

Log of the query crashing the server, captured at the debug4 log level:


(dr3_ops_cs36@gaiadb01i:55431) [surveys] > select * from getlastactiveruns(); -- the same with the bare query
DEBUG:  Looking up portal  in the hash table
DEBUG:  Created portal  and inserted an entry in the has table
DEBUG:  Looking up portal <unnamed portal 179> in the hash table
DEBUG:  Looking up portal <unnamed portal 179> in the hash table
DEBUG:  Created portal <unnamed portal 179> and inserted an entry in the has table
DEBUG:  recv 5 size 5.
DEBUG:  recv 48 size 48.
DEBUG:  Established a connection with datanode "datanode1",remote backend PID 18693, socket fd 67, global session T
DEBUG:  Established a connection with datanode "datanode10",remote backend PID 3117, socket fd 68, global session T
DEBUG:  Established a connection with datanode "datanode11",remote backend PID 42945, socket fd 69, global session T
DEBUG:  Established a connection with datanode "datanode12",remote backend PID 28366, socket fd 70, global session T
DEBUG:  Established a connection with datanode "datanode2",remote backend PID 31310, socket fd 71, global session T
DEBUG:  Established a connection with datanode "datanode3",remote backend PID 10007, socket fd 76, global session T
DEBUG:  Established a connection with datanode "datanode4",remote backend PID 29707, socket fd 77, global session T
DEBUG:  Established a connection with datanode "datanode5",remote backend PID 14263, socket fd 78, global session T
DEBUG:  Established a connection with datanode "datanode6",remote backend PID 4826, socket fd 79, global session T
DEBUG:  Established a connection with datanode "datanode7",remote backend PID 2732, socket fd 80, global session T
DEBUG:  Established a connection with datanode "datanode8",remote backend PID 18130, socket fd 81, global session T
DEBUG:  Established a connection with datanode "datanode9",remote backend PID 22012, socket fd 82, global session T
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode1 pid:18693 connections
DEBUG:  PreFetchConnection connection 16396 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode10 pid:3117 connections
DEBUG:  PreFetchConnection connection 16397 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode11 pid:42945 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode2 pid:31310 connections
DEBUG:  PreFetchConnection connection 16400 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode3 pid:10007 connections
DEBUG:  PreFetchConnection connection 16401 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode4 pid:29707 connections
DEBUG:  PreFetchConnection connection 16402 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode5 pid:14263 connections
DEBUG:  PreFetchConnection connection 16403 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode6 pid:4826 connections
DEBUG:  PreFetchConnection connection 16404 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode7 pid:2732 connections
DEBUG:  PreFetchConnection connection 16405 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode8 pid:18130 connections
DEBUG:  PreFetchConnection connection 16406 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode9 pid:22012 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode1 pid:18693 connections
DEBUG:  PreFetchConnection connection 16396 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode10 pid:3117 connections
DEBUG:  PreFetchConnection connection 16397 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode11 pid:42945 connections
DEBUG:  PreFetchConnection connection 16398 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode12 pid:28366 connections
DEBUG:  PreFetchConnection connection 16399 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode2 pid:31310 connections
DEBUG:  PreFetchConnection connection 16400 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode3 pid:10007 connections
DEBUG:  PreFetchConnection connection 16401 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode4 pid:29707 connections
DEBUG:  PreFetchConnection connection 16402 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode5 pid:14263 connections
DEBUG:  PreFetchConnection connection 16403 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode6 pid:4826 connections
DEBUG:  PreFetchConnection connection 16404 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode7 pid:2732 connections
DEBUG:  PreFetchConnection connection 16405 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode8 pid:18130 connections
DEBUG:  PreFetchConnection connection 16406 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode9 pid:22012 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode1 pid:18693 connections
DEBUG:  PreFetchConnection connection 16396 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode10 pid:3117 connections
DEBUG:  PreFetchConnection connection 16397 to step p_1_4b02_23_c601182
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode11 pid:42945 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode12 pid:28366 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode2 pid:31310 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode3 pid:10007 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode4 pid:29707 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode5 pid:14263 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode6 pid:4826 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode7 pid:2732 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode8 pid:18130 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode9 pid:22012 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode1 pid:18693 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode10 pid:3117 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode11 pid:42945 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode12 pid:28366 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode2 pid:31310 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode3 pid:10007 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode4 pid:29707 connections
DEBUG:  timeout 1 while waiting for any response from 1 connections
DEBUG:  timeout 1 while waiting for any response from node:datanode5 pid:14263 connections
DEBUG:  nread returned 0
DEBUG:  Remote node "datanode6", running with pid 4826 returned an error: unexpected EOF on datanode connection.
LOG:  unexpected EOF on node:datanode6 pid:4826, read_status:-1, EOF:-1
LOG:  pool_recvres return code:11
DEBUG:  Remote node "datanode1", running with pid 18693 returned an error: Datanode closed the connection unexpectedly
        This probably means the Datanode terminated abnormally
        before or while processing the request.

DEBUG:  Remote node "datanode1", running with pid 18693 returned an error: unexpected EOF on datanode connection.
LOG:  unexpected EOF on node:datanode1 pid:18693, read_status:-1, EOF:-1
DEBUG:  Failed to read response from data node:datanode1 pid:18693 when ending query for ERROR, node status:3
DEBUG:  nread returned 0
DEBUG:  Remote node "datanode9", running with pid 22012 returned an error: unexpected EOF on datanode connection.
LOG:  unexpected EOF on node:datanode9 pid:22012, read_status:-1, EOF:-1
DEBUG:  Failed to read response from data node:datanode9 pid:22012 when ending query for ERROR, node status:3
DEBUG:  Remote node "datanode8", running with pid 18130 returned an error: Datanode closed the connection unexpectedly
        This probably means the Datanode terminated abnormally
        before or while processing the request.

DEBUG:  Remote node "datanode8", running with pid 18130 returned an error: unexpected EOF on datanode connection.
LOG:  unexpected EOF on node:datanode8 pid:18130, read_status:-1, EOF:-1
DEBUG:  Failed to read response from data node:datanode8 pid:18130 when ending query for ERROR, node status:3
DEBUG:  Remote node "datanode7", running with pid 2732 returned an error: Datanode closed the connection unexpectedly
        This probably means the Datanode terminated abnormally
        before or while processing the request.

DEBUG:  pgxc_abort_connections node:datanode6 not ready for query, status:3
DEBUG:  nread returned 0
LOG:  pgxc_node_flush_read node:datanode6 read failure.
WARNING:  pgxc_abort_connections dn node:datanode7 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode8 invalid socket 4294967295!
DEBUG:  pgxc_abort_connections node:datanode9 not ready for query, status:3
DEBUG:  nread returned 0
LOG:  pgxc_node_flush_read node:datanode9 read failure.
DEBUG:   Begin to recheck pgxc_abort_connections 0 coordinator handle
DEBUG:   Begin to recheck pgxc_abort_connections 12 datanode handle
WARNING:  pgxc_abort_connections dn node:datanode1 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode10 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode11 invalid socket 4294967295!
DEBUG:  pgxc_abort_connections recheck node:datanode12 not ready for query, status:0, sync
DEBUG:  Remote node "datanode12", running with pid 28366 returned an error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

LOG:  pgxc_node_flush data to datanode:16398 fd:70 failed for Success
DEBUG:  Remote node "datanode12", running with pid 28366 returned an error: failed to send data to datanode
WARNING:  pgxc_abort_connections dn node:datanode2 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode3 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode4 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode5 invalid socket 4294967295!
DEBUG:  pgxc_abort_connections recheck node:datanode6 not ready for query, status:0, sync
DEBUG:  Remote node "datanode6", running with pid 4826 returned an error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

LOG:  pgxc_node_flush data to datanode:16403 fd:79 failed for Success
DEBUG:  Remote node "datanode6", running with pid 4826 returned an error: failed to send data to datanode
WARNING:  pgxc_abort_connections dn node:datanode7 invalid socket 4294967295!
WARNING:  pgxc_abort_connections dn node:datanode8 invalid socket 4294967295!
DEBUG:  pgxc_abort_connections recheck node:datanode9 not ready for query, status:0, sync
DEBUG:  Remote node "datanode9", running with pid 22012 returned an error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

LOG:  pgxc_node_flush data to datanode:16406 fd:82 failed for Success
DEBUG:  Remote node "datanode9", running with pid 22012 returned an error: failed to send data to datanode
DEBUG:  Remote node "datanode12", running with pid 28366 returned an error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

LOG:  pgxc_node_flush data to datanode:16398 fd:70 failed for Success
DEBUG:  Remote node "datanode12", running with pid 28366 returned an error: failed to send data to datanode
DEBUG:  Remote node "datanode12", running with pid 28366 returned an error: failed to send ROLLBACK TRANSACTION command
LOG:  Failed to send ROLLBACK TRANSACTION command nodename:datanode12, pid:28366
DEBUG:  Remote node "datanode6", running with pid 4826 returned an error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

LOG:  pgxc_node_flush data to datanode:16403 fd:79 failed for Success
DEBUG:  Remote node "datanode6", running with pid 4826 returned an error: failed to send data to datanode
DEBUG:  Remote node "datanode6", running with pid 4826 returned an error: failed to send ROLLBACK TRANSACTION command
LOG:  Failed to send ROLLBACK TRANSACTION command nodename:datanode6, pid:4826
DEBUG:  Remote node "datanode9", running with pid 22012 returned an error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.

LOG:  pgxc_node_flush data to datanode:16406 fd:82 failed for Success
DEBUG:  Remote node "datanode9", running with pid 22012 returned an error: failed to send data to datanode
DEBUG:  Remote node "datanode9", running with pid 22012 returned an error: failed to send ROLLBACK TRANSACTION command
LOG:  Failed to send ROLLBACK TRANSACTION command nodename:datanode9, pid:22012
LOG:  Remote node "datanode12", running with pid 28366 state:3 is bad
LOG:  found bad remote node connections, force release handles now
DEBUG:  Connection to Datanode 16398 has unexpected state 3 and will be dropped
DEBUG:  Connection to Datanode 16403 has unexpected state 3 and will be dropped
DEBUG:  Connection to Datanode 16406 has unexpected state 3 and will be dropped
DEBUG:  Returning connections back to the pool
DEBUG:  [re]setting xid = 0, old_value = 0
DEBUG:  Dropped portal <unnamed portal 179> (prepared statement (null)) and removed entry from the hash table
DEBUG:  Dropped portal  (prepared statement (null)) and removed entry from the hash table
ERROR:  Failed to receive more data from data node 16403
CONTEXT:  PL/pgSQL function getlastactiveruns() line 1 at RETURN QUERY
yazun commented 3 years ago

Looks like it is related to a PG bug that was fixed quite some time ago in PG and was ported to XL. The bug discussion and fix are here:

https://www.postgresql.org/message-id/flat/20171110185747.31519.28038%40wrigleys.postgresql.org

On a general note, it looks like TBase's c.h is quite different from the latest XL's and could benefit from an update. Hoping to see the PG12/13 merge soon!

This bug also happens only with newer gcc (we use devtoolset 9).
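For context, the mechanism discussed in that thread is an alignment mismatch: palloc only guarantees MAXALIGN (8-byte) alignment for the aggregate transition state, while GCC assumes a native __int128 is 16-byte aligned and, at higher optimization levels, may emit aligned SSE loads/stores for it, hence the crash inside int8_avg_combine. A small standalone illustration (hypothetical demo code, not TBase source; it depends on the compiler honoring reduced alignment on the typedef, which is exactly what the PG fix relies on):

#include <stdio.h>
#include <stdalign.h>

/* Native __int128 normally reports 16-byte alignment on x86-64. */
typedef __int128 int128_native;

/* The PG fix caps the typedef's alignment at MAXALIGN (8 on x86-64) via the
 * aligned attribute, so the compiler stops assuming 16-byte alignment for
 * int128 values that live in palloc'd memory. */
typedef __int128 int128_maxaligned __attribute__((aligned(8)));

int main(void)
{
    printf("alignof(native __int128)      = %zu\n", alignof(int128_native));
    printf("alignof(max-aligned __int128) = %zu\n", alignof(int128_maxaligned));
    return 0;
}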

If we patch c.h as described in the above thread:

/* 128-bit signed and unsigned integers
 *      There currently is only limited support for such types.
 *      E.g. 128bit literals and snprintf are not supported; but math is.
 *      Also, because we exclude such types when choosing MAXIMUM_ALIGNOF,
 *      it must be possible to coerce the compiler to allocate them on no
 *      more than MAXALIGN boundaries.
 */
#if defined(PG_INT128_TYPE)
#if defined(pg_attribute_aligned) || ALIGNOF_PG_INT128_TYPE <= MAXIMUM_ALIGNOF
#define HAVE_INT128 1

typedef PG_INT128_TYPE int128
#if defined(pg_attribute_aligned)
pg_attribute_aligned(MAXIMUM_ALIGNOF)
#endif
;

typedef unsigned PG_INT128_TYPE uint128
#if defined(pg_attribute_aligned)
pg_attribute_aligned(MAXIMUM_ALIGNOF)
#endif
;

#endif
#endif

it starts to work sometimes with the plan that was segfaulting before. Still segfaults often, but not always.

So it looks like a deeper look is needed. Commit with this small patch can be seen here: https://github.com/yazun/TBase/commit/3ccce9f9cf38ac584e571a3ff24ad97d9fdf8fc0

JennyJennyChen commented 3 years ago

Thank you very much for the bug report. We have merged the patch you mentioned. The problem has been assigned to a developer for analysis and resolution. I believe there will be a response soon.

yazun commented 3 years ago

Hello, any news on this one?

JennyJennyChen commented 3 years ago

> Hello, any news on this one?

I'm very sorry for taking so long to reply to you. We can't reproduce the problem you mentioned (screenshots attached).

Can you provide more information? For example, detailed create table statements, select statements, data in the table, etc.

JennyJennyChen commented 3 years ago

I was unable to reproduce your execution plan in my environment. The plan I get is:

postgres=# show prefer_olap;
 prefer_olap 
-------------
 off
(1 row)

postgres=# select count(*) from run;
 count 
-------
  2000
(1 row)

postgres=# select count(*) from rundataspec;
 count 
-------
  2000
(1 row)

postgres=# explain SELECT r.runid,
postgres-#    --r.runname,
postgres-#    --r.runmetadata :: text username,
postgres-#     r.size,
postgres-#     r.state :: text,
postgres-#     sum(stat.processed) processed,
postgres-#     sum(stat.total)  total,
postgres-#     r.creationdate
postgres-#    --r.lastupdatedate
postgres-#   from
postgres-#  run r
postgres-#     join (
postgres(#       select
postgres(#         status,
postgres(#         count(*) total,
postgres(#         count(*) filter (
postgres(#           where
postgres(#             status = '4'
postgres(#         ) processed,
postgres(#         rd.runid
postgres(#       from
postgres(#         rundataspec rd
postgres(#       where
postgres(#         rd.runid in (
postgres(#           select
postgres(#             rr.runid
postgres(#           from
postgres(#             run rr
postgres(#           where
postgres(#             rr.creationdate >= now() - interval '20 days'
postgres(#             and (
postgres(#               rr.lastupdatedate is null
postgres(#               or rr.lastupdatedate >= now() - interval '20 days'
postgres(#             )
postgres(#         )
postgres(#       group by
postgres(#         status,
postgres(#         rd.runid
postgres(#     ) stat on (stat.runid = r.runid)
postgres-#   group by
postgres-#     r.runid,
postgres-#     r.size,
postgres-#     r.state :: text,
postgres-#     r.creationdate
postgres-#   order by
postgres-#     r.creationdate desc;
                                                                                                 QUERY PLAN                                                                                                  
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=60733.81..60766.31 rows=1000 width=78)
   Group Key: r.creationdate, r.runid, r.size, r.state
   ->  Sort  (cost=60733.81..60736.31 rows=1000 width=30)
         Sort Key: r.creationdate DESC, r.runid, r.size, r.state
         ->  Nested Loop  (cost=30533.51..60683.98 rows=1000 width=30)
               Join Filter: (r.runid = stat.runid)
               ->  Remote Subquery Scan on all (dn002)  (cost=100.00..198.00 rows=2000 width=14)
                     ->  Seq Scan on run r  (cost=0.00..60.00 rows=2000 width=14)
               ->  Materialize  (cost=30433.51..30488.48 rows=1000 width=20)
                     ->  Subquery Scan on stat  (cost=30433.51..30483.48 rows=1000 width=20)
                           ->  GroupAggregate  (cost=30433.51..30473.48 rows=1000 width=22)
                                 Group Key: rd.status, rd.runid
                                 ->  Sort  (cost=30433.51..30438.51 rows=1998 width=6)
                                       Sort Key: rd.status, rd.runid
                                       ->  Nested Loop Semi Join  (cost=200.00..30323.98 rows=1998 width=6)
                                             Join Filter: (rd.runid = rr.runid)
                                             ->  Remote Subquery Scan on all (dn001,dn002)  (cost=100.00..141.00 rows=1000 width=6)
                                                   ->  Seq Scan on rundataspec rd  (cost=0.00..30.00 rows=1000 width=6)
                                             ->  Materialize  (cost=100.00..217.97 rows=1998 width=4)
                                                   ->  Remote Subquery Scan on all (dn002)  (cost=100.00..207.98 rows=1998 width=4)
                                                         ->  Seq Scan on run rr  (cost=0.00..90.00 rows=1998 width=4)
                                                               Filter: ((creationdate >= (now() - '20 days'::interval)) AND ((lastupdatedate IS NULL) OR (lastupdatedate >= (now() - '20 days'::interval))))
(22 rows)

postgres=# 

In addition, you can try again with the latest version, 2.2.0, to verify whether this problem still exists.

yazun commented 3 years ago

Thanks for checking! Yes, will try with 2.2.0 next week and will report back.

This is not crucial data, so maybe sharing it could be an option as well. Let's see how 2.2.0 behaves.

yazun commented 3 years ago

Hello, we upgraded to 2.2.0 and still experience this problem; the first query listed in this issue crashes the cluster the same way. We will be happy to discuss how we could fix it.

 gdb postgres /DATA_PRIMARY/cores/%E_19910_1624377279.core
GNU gdb (GDB) Red Hat Enterprise Linux 8.3-3.el7
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from postgres...
[New LWP 19910]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `postgres: dr3_ops_cs36 surveys 192.168.168.156(21820) REMOTE SUBPLAN (coord12:'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  int8_avg_combine (fcinfo=<optimized out>) at numeric.c:4281
4281    numeric.c: No such file or directory.
Missing separate debuginfos, use: debuginfo-install cyrus-sasl-lib-2.1.26-23.el7.x86_64 glibc-2.17-307.el7.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.15.1-46.el7.x86_64 libcom_err-1.42.9-17.el7.x86_64 libselinux-2.5-15.el7.x86_64 libxml2-2.9.1-6.el7.4.x86_64 nspr-4.21.0-1.el7.x86_64 nss-3.44.0-7.el7_7.x86_64 nss-softokn-freebl-3.44.0-8.el7_7.x86_64 nss-util-3.44.0-4.el7_7.x86_64 openldap-2.4.44-21.el7_6.x86_64 openssl-libs-1.0.2k-19.el7.x86_64 pcre-8.32-17.el7.x86_64 xz-libs-5.2.2-1.el7.x86_64 zlib-1.2.7-18.el7.x86_64
(gdb) bt
#0  int8_avg_combine (fcinfo=<optimized out>) at numeric.c:4281
#1  0x00000000009d673a in advance_combine_function (pergroupstate=0x1bd5f30, pertrans=0x1b07268, aggstate=0x19ea4a8) at nodeAgg.c:1309
#2  combine_aggregates (aggstate=aggstate@entry=0x19ea4a8, pergroup=pergroup@entry=0x1bd5f30) at nodeAgg.c:1243
#3  0x00000000009e2c53 in agg_retrieve_direct (aggstate=<optimized out>) at nodeAgg.c:3319
#4  ExecAgg (pstate=<optimized out>) at nodeAgg.c:3036
#5  0x00000000009b7841 in ExecProcNode (node=0x19ea4a8) at ../../../src/include/executor/executor.h:275
#6  ExecSort (pstate=0x19ea008) at nodeSort.c:154
#7  0x00000000009fe186 in ExecProcNode (node=0x19ea008) at ../../../src/include/executor/executor.h:275
#8  ExecutePlan (execute_once=<optimized out>, dest=0x1993dd8, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x19ea008, estate=<optimized out>) at execMain.c:2061
#9  standard_ExecutorRun (queryDesc=<optimized out>, direction=<optimized out>, count=0, execute_once=<optimized out>) at execMain.c:471
#10 0x0000000000762b1c in ExecutorRun (execute_once=<optimized out>, count=0, direction=ForwardScanDirection, queryDesc=0x1b30868) at execMain.c:414
#11 PortalRunSelect (portal=0x18c5268, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:1715
#12 0x0000000000763571 in PortalRun (portal=portal@entry=0x18c5268, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x1993dd8, altdest=altdest@entry=0x1993dd8, completionTag=0x7fff943eebd0 "") at pquery.c:1356
#13 0x00000000007654f3 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x19939c8 "p_4_4e4_d_137dd706") at postgres.c:2995
#14 PostgresMain (argc=<optimized out>, argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at postgres.c:5571
#15 0x000000000082a8e4 in BackendRun (port=0x18b0cf0) at postmaster.c:4982
#16 BackendStartup (port=0x18b0cf0) at postmaster.c:4654
#17 ServerLoop () at postmaster.c:1959
#18 0x000000000082c329 in PostmasterMain (argc=<optimized out>, argv=0x1883890) at postmaster.c:1567
#19 0x00000000004f4a4d in main (argc=5, argv=0x1883890) at main.c:233
(gdb)
JennyJennyChen commented 3 years ago

1. In our environment, using our own data and CREATE TABLE SQL, we still cannot reproduce the problem (the execution plan is very similar to the one you showed above, but not the same). The query executes and returns results correctly.

postgres=# show prefer_olap;
 prefer_olap 
-------------
 on
(1 row)

postgres=# explain SELECT r.runid,
   --r.runname,
   --r.runmetadata :: text username,
    r.size,
    r.state :: text,
    sum(stat.processed) processed,
    sum(stat.total)  total,
    r.creationdate
   --r.lastupdatedate
  from
 run r
    join (
      select
        status,
        count(*) total,
        count(*) filter (
          where
            status = '4'
        ) processed,
        rd.runid
      from
        rundataspec rd
      where
        rd.runid in (
          select
            rr.runid
          from
            run rr
          where
            rr.creationdate >= now() - interval '20 days'
            and (
              rr.lastupdatedate is null
              or rr.lastupdatedate >= now() - interval '20 days'
            )
        )
      group by
        status,
        rd.runid
    ) stat on (stat.runid = r.runid)
  group by
    r.runid,
    r.size,
    r.state :: text,
    r.creationdate
  order by
    r.creationdate desc;
                                                             QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Remote Subquery Scan on all (dn001,dn002)  (cost=60492.87..60530.33 rows=999 width=86)
   ->  Finalize GroupAggregate  (cost=60492.87..60530.33 rows=999 width=86)
         Group Key: r.creationdate, r.runid, r.size, r.state
         ->  Sort  (cost=60492.87..60495.37 rows=999 width=86)
               Sort Key: r.creationdate DESC, r.runid, r.size, r.state
               ->  Remote Subquery Scan on all (dn001,dn002)  (cost=60337.20..60443.10 rows=999 width=86)
                     Distribute results by S: runid
                     ->  Partial HashAggregate  (cost=60237.20..60252.19 rows=999 width=86)
                           Group Key: r.creationdate, r.runid, r.size, r.state
                           ->  Nested Loop  (cost=30144.77..60222.22 rows=999 width=38)
                                 Join Filter: (r.runid = stat.runid)
                                 ->  Seq Scan on run r  (cost=0.00..60.00 rows=2000 width=22)
                                 ->  Materialize  (cost=30144.77..30194.72 rows=999 width=24)
                                       ->  Subquery Scan on stat  (cost=30144.77..30189.72 rows=999 width=24)
                                             ->  GroupAggregate  (cost=30144.77..30169.74 rows=999 width=26)
                                                   Group Key: rd.status, rd.runid
                                                   ->  Sort  (cost=30144.77..30147.26 rows=999 width=10)
                                                         Sort Key: rd.status, rd.runid
                                                         ->  Nested Loop Semi Join  (cost=0.00..30094.99 rows=999 width=10)
                                                               Join Filter: (rd.runid = rr.runid)
                                                               ->  Seq Scan on rundataspec rd  (cost=0.00..30.00 rows=1000 width=10)
                                                               ->  Materialize  (cost=0.00..99.99 rows=1998 width=8)
                                                                     ->  Seq Scan on run rr  (cost=0.00..90.00 rows=1998 width=8)
                                                                           Filter: ((creationdate >= (now() - '20 days'::interval)) AND ((lastupdatedate IS NULL) OR (lastupdatedate >= (now() - '20 days'::interval))))
(24 rows)  

Can you provide more information (detailed create table statements, select statements, data in the table, etc)?

2. Can we log in to your environment to see the details?

3. From the core stack, the problem appears to be related to HashAggregate; you can try setting enable_hashagg = off (as sketched below).
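A session-level sketch of that check, run from the session that issues the failing query:

set enable_hashagg = off;
-- re-run (or EXPLAIN) the failing query; if the GroupAggregate-only plan succeeds,
-- the partial HashAggregate path is implicated
set enable_hashagg = on;  -- restore afterwards; disabling it globally is rarely desirable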

yazun commented 3 years ago

I PMed you the link to an extract from our schema; it shows the problem.

Ad 2: We can try to arrange it, although it's a bit complicated. Ad 3: Yes, switching off hashAggregate fixes the problem, but globally it's not the best idea.

JennyJennyChen commented 3 years ago

Can you give the real SQL, like the example below?

create  table run(runid bigint,status text, runname varchar(20),runmetadata varchar(20), size bigint, state text, creationdate date, lastupdatedate date) distribute by replication;
insert into run values(1, '1','xxx', 'xxx', 10, '1', '2021-06-20', '2021-06-20');
insert into run values(2, '2','xxx', 'xxx', 10, '2', '2021-06-20', null);
insert into run values(3, '3','xxx', 'xxx', 10, '3', '2021-06-22', '2021-06-22');
insert into run values(4, '4','xxx', 'xxx', 10, '4', '2021-06-22', null);
insert into run select i, i, 'xxx', 'xxx', 10, '4', '2021-06-23', '2021-06-23' from generate_series(34359738368, 34359740368) i;

create  table rundataspec(runid bigint, status text) distribute by shard(runid);
insert into rundataspec values(1, '1');
insert into rundataspec values(2, '2');
insert into rundataspec values(3, '3');
insert into rundataspec values(4, '4');
insert into rundataspec select i, '1' from generate_series(34359738368, 34359740368) i; 

SELECT r.runid,
   --r.runname,
   --r.runmetadata :: text username,
    r.size,
    r.state :: text,
    sum(stat.processed) processed,
    sum(stat.total)  total,
    r.creationdate
   --r.lastupdatedate
  from
    run r
    join (
      select
        status,
        count(*) total,
        count(*) filter (
          where
            status = '4'
        ) processed,
        rd.runid
      from
        rundataspec rd
      where
        rd.runid in (
          select
            rr.runid
          from
            run rr
          where
            rr.creationdate >= now() - interval '20 days'
            and (
              rr.lastupdatedate is null
              or rr.lastupdatedate >= now() - interval '20 days'
            )
        )
      group by
        status,
        rd.runid
    ) stat on (stat.runid = r.runid)
  group by
    r.runid,
    r.size,
    r.state :: text,
    r.creationdate
  order by
    r.creationdate desc;  

The above is my own test case, which may be different from yours, so we need one that is exactly the same as yours.

yazun commented 3 years ago

Have you had a chance to look at the pg_dump of these tables that I emailed? Everything should be there.

JennyJennyChen commented 3 years ago

Thank you very much for the information. We are downloading the pg_dump information you provided and trying to reproduce the problem. I will notify you as soon as there is a result.

JennyJennyChen commented 3 years ago

Hello, I would like to know the character encoding of the file you gave me. The content I opened is garbled (see the attached screenshot).

My own character set is shown in the attached screenshot.

yazun commented 3 years ago

Sure, it should be SQL-ASCII, but the dump is in the directory format, so pg_restore should be able to read it. Plain SQL might indeed be easier to parse if you need changes; let me know if that's needed.

yazun commented 3 years ago

Uploaded tbase_numeric_avg_bug_20210625.sql.xz, in pure SQL format once uncompressed.

yazun commented 3 years ago

We also found this segfault with other types of queries where hashAggregate is not used:

 explain select sum(length) from ts;
                                                                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Parallel Finalize Aggregate  (cost=63205580.41..63205580.46 rows=1 width=32)
   ->  Parallel Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=63205580.22..63205580.41 rows=1 width=32)
         ->  Gather  (cost=63205480.22..63205480.37 rows=1 width=32)
               Workers Planned: 11
               ->  Partial Aggregate  (cost=63204480.22..63204480.27 rows=1 width=32)
                     ->  Parallel Append  (cost=0.00..63109226.31 rows=63502606 width=8)
                           ->  Parallel Seq Scan on ts_1  (cost=0.00..2839617.00 rows=2631852 width=8)
                           ->  Parallel Seq Scan on ts_2  (cost=0.00..3213928.09 rows=3866866 width=8)
                           ->  Parallel Seq Scan on ts_3  (cost=0.00..2490755.32 rows=1930858 width=8)
                           ->  Parallel Seq Scan on ts_4  (cost=0.00..2497624.06 rows=1958481 width=8)
                           ->  Parallel Seq Scan on ts_5  (cost=0.00..3110759.32 rows=3322930 width=8)
                           ->  Parallel Seq Scan on ts_6  (cost=0.00..3559324.34 rows=3860127 width=8)
                           ->  Parallel Seq Scan on ts_7  (cost=0.00..5072179.51 rows=6780870 width=8)
                           ->  Parallel Seq Scan on ts_8  (cost=0.00..4236690.68 rows=5111486 width=8)
                           ->  Parallel Seq Scan on ts_9  (cost=0.00..4071018.31 rows=4552470 width=8)
                           ->  Parallel Seq Scan on ts_10  (cost=0.00..2808730.08 rows=2609866 width=8)
                           ->  Parallel Seq Scan on ts_11  (cost=0.00..3176654.64 rows=3577397 width=8)
                           ->  Parallel Seq Scan on ts_12  (cost=0.00..2607833.73 rows=2217355 width=8)
                           ->  Parallel Seq Scan on ts_13  (cost=0.00..2618002.83 rows=2226217 width=8)
                           ->  Parallel Seq Scan on ts_14  (cost=0.00..2482195.72 rows=1909138 width=8)
                           ->  Parallel Seq Scan on ts_15  (cost=0.00..2540044.85 rows=2014297 width=8)
                           ->  Parallel Seq Scan on ts_16  (cost=0.00..2700096.27 rows=2293277 width=8)
                           ->  Parallel Seq Scan on ts_17  (cost=0.00..2951243.30 rows=2665138 width=8)
                           ->  Parallel Seq Scan on ts_18  (cost=0.00..3544150.74 rows=3449255 width=8)
                           ->  Parallel Seq Scan on ts_19  (cost=0.00..2796391.98 rows=2431624 width=8)
                           ->  Parallel Seq Scan on ts_20  (cost=0.00..3791985.50 rows=4093102 width=8)
(26 rows)

Time: 49.140 ms
(dr3_ops_cs36@gaiadb09i:55431) [surveys] > select sum(length) from ts;
 server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 534396.769 ms (08:54.397)

The ts table is big, around 8B records.

It segfaults at the same position, in the int8 aggregate code in numeric.c.

Dontpushme commented 3 years ago

OK, I reproduced it with:

  1. Linux OS
  2. gcc version 8.3.1
  3. CFLAGS='-O1' (it does not happen with -O0)
  4. statement:
    create table t (a int8) distribute by shard(a);
    insert into t values(1);
    select avg(a) from t;

> Looks like it is related to a PG bug that was fixed quite some time ago in PG and was ported to XL. [...] If we patch c.h as described in the above thread, it starts to work sometimes with the plan that was segfaulting before. Still segfaults often, but not always. (full comment quoted above)

After cherry-picking the fix from PG, it works correctly.

> Commit with this small patch can be seen here: yazun@3ccce9f

I notice that your patch is not as complete as PG's, so I'll cherry-pick the full PG commit, and...

> it starts to work sometimes with the plan that was segfaulting before. Still segfaults often, but not always.
> So it looks like a deeper look is needed.

Maybe there are other problems there? Could you please switch to the latest TBase v2.2.0 code, compile with the --enable-debug flag, and show us the new core dump for the same statement?

Dontpushme commented 3 years ago

Oh, don't forget to re-run configure on the project even if you don't want --enable-debug; PG's int128 patch needs a re-configure.

Dontpushme commented 3 years ago

> Uploaded tbase_numeric_avg_bug_20210625.sql.xz, in pure SQL format once uncompressed.

And I notice there are some contradictions in the files you've given:

  1. The query's target list contains runname, runmetadata, and lastupdatedate, which do not appear in the GROUP BY list; that makes the SQL parser raise "column must appear in the GROUP BY clause or be used in an aggregate function".
  2. The plan shows the GROUP BY keys are r.runid and r.state, which is not compatible with the query.
  3. As I mentioned above, the plan shows GROUP BY r.runid and r.state, but the only bigint column is r.size, so why does the core dump show it doing int8_avg_combine?

BTW, this problem exists in any aggregate that involves bigint (or int8), not just hashAggregate.
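For instance, against the minimal reproducer table t above, both of the following typically go through the same int128-based combine path (int8_avg_combine) once the aggregate is split into partial and finalize steps, whether or not a HashAggregate node is involved (a sketch; the exact plan shape depends on optimizer settings):

select avg(a) from t;  -- avg(int8) uses the int128 transition state
select sum(a) from t;  -- sum(int8) typically shares the same combine function in split aggregation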

yazun commented 3 years ago

Big thanks for checking!

It's very plausible that the PG patch for this was not properly pulled in at the time.

Yes, the query is an obscure one, not used that often, probably a shadow of the past, but it was the first one where we spotted the problem. The query needs the table names changed compared with the first one posted, but it should work; verified.

We will redeploy the system with master to see, and report back; then we will redeploy with --enable-debug if the problem still persists.

Dontpushme commented 3 years ago

> We will redeploy the system with master to see, and report back; then we will redeploy with --enable-debug if the problem still persists.

Not master, choose v2.2.0 :)

yazun commented 3 years ago

Yes, sure. I can report that it fixed the problem! Thanks again!