Tencent / TBase

TBase is an enterprise-level distributed HTAP database. Through a single database cluster to provide users with highly consistent distributed database services and high-performance data warehouse services, a set of integrated enterprise-level solutions is formed.
Other
1.4k stars 263 forks source link

CONSUMER_ERROR set, err_msg "unrecognized node type: 113 #116

Open yazun opened 2 years ago

yazun commented 2 years ago

Looks like a new error we get (on V2.2) quite often, but without a clear repeatability pattern but happening very often:

ERROR:  node:datanode6, backend_pid:36893, nodename:datanode6,backend_pid:36893,message:node:datanode7, backend_pid:26508, nodename:datanode7,backend_pid:26508,message:Failed to read from SQueue, CONSUMER_ERROR set, err_msg "unrecognized node type: 113",,                                                                                                                                                                                                                                                                                ]  0% ETA 0:00:00
ERROR:  node:datanode10, backend_pid:11252, nodename:datanode10,backend_pid:11252,message:node:datanode7, backend_pid:26576, nodename:datanode7,backend_pid:26576,message:Failed to read from SQueue, CONSUMER_ERROR set, err_msg "unrecognized node type: 113",,
ERROR:  node:datanode12, backend_pid:3072, nodename:datanode12,backend_pid:3072,message:node:datanode11, backend_pid:39323, nodename:datanode11,backend_pid:39323,message:Failed to read from SQueue, CONSUMER_ERROR set, err_msg "unrecognized node type: 113",,                                                                                                                                                                                                                                                                              ]  0% ETA 0:00:00
ERROR:  node:datanode11, backend_pid:39452, nodename:datanode11,backend_pid:39452,message:node:datanode7, backend_pid:26591, nodename:datanode7,backend_pid:26591,message:Failed to read from SQueue, CONSUMER_ERROR set, err_msg "unrecognized node type: 113",,                                                                                                                                                                                                                                                                              ]  0% ETA 0:00:00
ERROR:  node:datanode12, backend_pid:2966, nodename:datanode12,backend_pid:2966,message:node:datanode11, backend_pid:39323, nodename:datanode11,backend_pid:39323,message:Failed to read from SQueue, CONSUMER_ERROR set, err_msg "unrecognized node type: 113",,
ERROR:  node:datanode11, backend_pid:39016, nodename:datanode11,backend_pid:39016,message:node:datanode11, backend_pid:39309, nodename:datanode11,backend_pid:39309,message:Failed to read from SQueue, CONSUMER_ERROR set, err_msg "unrecognized node type: 113",

enum NodeType 113 is mapped to T_ProjectSetPath it seems:

eu-readelf --debug-dump=info `which postgres` | grep -A 3 T_ProjectSetPath
...
--
               name                 (strp) "T_ProjectSetPath"
               const_value          (data1) 213
 [1792a48]      enumerator           abbrev: 30

Example plan that fails:

                                                                                                                                                               QUERY PLAN                                                                                                                                                                                                                                                                                                                              ]  1% ETA 0:46:42
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=210.76..207729.82 rows=1 width=283)
   ->  Insert on final_dr3_animation_variables_exported_1  (cost=210.76..207729.82 rows=1 width=283)
         ->  Remote Subquery Scan on all (datanode7)  (cost=210.76..207729.82 rows=1 width=283)
               Distribute results by S: sourceid
               ->  Nested Loop  (cost=110.76..207628.09 rows=1 width=283)
                     ->  Nested Loop  (cost=110.13..187626.99 rows=1 width=2636)
                           ->  Nested Loop  (cost=110.13..157626.47 rows=1 width=3414)
                                 ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=110.13..127625.94 rows=1 width=4192)
                                       ->  Nested Loop  (cost=10.13..127521.74 rows=1 width=4192)
                                             ->  Nested Loop  (cost=9.01..126686.07 rows=1 width=3281)
                                                   ->  Nested Loop  (cost=7.88..125850.39 rows=1 width=2045)
                                                         ->  Nested Loop  (cost=6.76..125014.72 rows=1 width=1068)
                                                               ->  Nested Loop  (cost=5.63..124179.26 rows=1 width=1044)
                                                                     ->  Nested Loop  (cost=4.50..123343.43 rows=1 width=747)
                                                                           ->  Nested Loop  (cost=3.36..122507.61 rows=1 width=446)
                                                                                 ->  Nested Loop  (cost=2.22..121671.80 rows=1 width=139)
                                                                                       ->  Bitmap Heap Scan on final_dr3_export_helper  (cost=1.10..836.24 rows=1 width=128)
                                                                                             Recheck Cond: ((sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                                                                             ->  Bitmap Index Scan on uq_final_dr3_export_helper  (cost=0.00..1.09 rows=1 width=0)
                                                                                                   Index Cond: ((sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                                                                       ->  Bitmap Heap Scan on final_dr3_run90002_flat_environment_10  (cost=1.13..835.46 rows=1 width=64)
                                                                                             Recheck Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                                                                             ->  Bitmap Index Scan on pk_final_dr3_run90002_flat_environment_10  (cost=0.00..1.12 rows=1 width=0)
                                                                                                   Index Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                                                                 ->  Bitmap Heap Scan on timeseriesresult_1_1 tsr_g  (cost=1.14..835.68 rows=1 width=307)
                                                                                       Recheck Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336935))
                                                                                       ->  Bitmap Index Scan on timeseriesresult_1_1_pkey  (cost=0.00..1.14 rows=1 width=0)
                                                                                             Index Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336935))
                                                                           ->  Bitmap Heap Scan on timeseriesresult_1_1 tsr_bp  (cost=1.14..835.68 rows=1 width=307)
                                                                                 Recheck Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336911))
                                                                                 ->  Bitmap Index Scan on timeseriesresult_1_1_pkey  (cost=0.00..1.14 rows=1 width=0)
                                                                                       Index Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336911))
                                                                     ->  Bitmap Heap Scan on timeseriesresult_1_1 tsr_rp  (cost=1.14..835.68 rows=1 width=307)
                                                                           Recheck Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336928))
                                                                           ->  Bitmap Index Scan on timeseriesresult_1_1_pkey  (cost=0.00..1.14 rows=1 width=0)
                                                                                 Index Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336928))
                                                               ->  Bitmap Heap Scan on source_1 s  (cost=1.12..835.46 rows=1 width=26)
                                                                     Recheck Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (catalogid = '2'::smallint))
                                                                     ->  Bitmap Index Scan on source_1_pkey  (cost=0.00..1.12 rows=1 width=0)
                                                                           Index Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                                         ->  Bitmap Heap Scan on ts_1 ts_bp  (cost=1.13..835.67 rows=1 width=979)
                                                               Recheck Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '2'::smallint) AND (catalogid = '2'::smallint))
                                                               ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.12 rows=1 width=0)
                                                                     Index Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '2'::smallint))
                                                   ->  Bitmap Heap Scan on ts_1 ts_g  (cost=1.13..835.67 rows=1 width=1238)
                                                         Recheck Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '1'::smallint) AND (catalogid = '2'::smallint))
                                                         ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.12 rows=1 width=0)
                                                               Index Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '1'::smallint))
                                             ->  Bitmap Heap Scan on ts_1 ts_rp  (cost=1.13..835.67 rows=1 width=979)
                                                   Recheck Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '3'::smallint) AND (catalogid = '2'::smallint))
                                                   ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.12 rows=1 width=0)
                                                         Index Cond: ((sourceid = final_dr3_export_helper.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '3'::smallint))
                                 ->  Result  (cost=0.00..10000.53 rows=1 width=32)
                           ->  Result  (cost=0.00..10000.53 rows=1 width=32)
                     ->  Function Scan on remove_filtered bp_rp  (cost=0.63..10000.63 rows=1 width=64)
(55 rows)

When sometimes the query works (which is insert into .. select ..) we getting analyze plan like this:

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=206.60..47136823036702296.00 rows=752264783839651 width=220) (actual time=7960.565..228006.267 rows=2582 loops=1)
   ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=104.85..1821453743505.77 rows=514459128568 width=3537) (actual time=27.329..92.808 rows=2582 loops=1)
         ->  Nested Loop  (cost=4.85..268428275.05 rows=514459128568 width=3537)
               DN (actual startup time=0.281..7.665 total time=228.544..7840.273 rows=197..231 loops=1..1)
               Join Filter: (final_dr3_run90002_flat_environment_10.sourceid = final_dr3_export_helper.sourceid)
               ->  Nested Loop  (cost=4.74..239168098.60 rows=1 width=3529)
                     DN (actual startup time=0.224..0.390 total time=85.000..7663.202 rows=1621..1738 loops=1..1)
                     ->  Nested Loop  (cost=3.61..239157262.93 rows=1 width=2255)
                           DN (actual startup time=0.219..0.384 total time=78.440..7133.950 rows=1621..1738 loops=1..1)
                           ->  Nested Loop  (cost=2.49..239146427.26 rows=1 width=1249)
                                 DN (actual startup time=0.194..0.365 total time=70.415..3508.589 rows=1621..1738 loops=1..1)
                                 ->  Nested Loop  (cost=1.37..239135591.81 rows=1 width=1225)
                                       DN (actual startup time=0.172..0.344 total time=63.548..3499.840 rows=1621..1738 loops=1..1)
                                       ->  Merge Join  (cost=1.28..239092253.30 rows=4 width=850)
                                             DN (actual startup time=0.149..0.321 total time=56.358..3450.352 rows=1621..1738 loops=1..1)
                                             Merge Cond: (final_dr3_run90002_flat_environment_10.sourceid = tsr_rp.sourceid)
                                             ->  Nested Loop  (cost=1.20..124318276.54 rows=11471 width=459)
                                                   DN (actual startup time=0.106..0.226 total time=40.208..45.888 rows=1621..1738 loops=1..1)
                                                   ->  Merge Append  (cost=0.08..114733921.49 rows=11472 width=391)
                                                         DN (actual startup time=0.042..0.100 total time=12.556..15.118 rows=11138..11540 loops=1..1)
                                                         Sort Key: tsr_bp.sourceid
                                                         ->  Index Scan using idx_tsrbp_eecmd_90002_1 on timeseriesresult_13_1 tsr_bp  (cost=0.07..114733869.86 rows=11472 width=391)
                                                               DN (actual startup time=0.037..0.096 total time=11.735..14.289 rows=11138..11540 loops=1..1)
                                                               Index Cond: ((sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                                   ->  Bitmap Heap Scan on final_dr3_run90002_flat_environment_10  (cost=1.12..835.46 rows=1 width=68)
                                                         DN (actual startup time=0.001..0.002 total time=0.001..0.002 rows=0..0 loops=11138..11540)
                                                         Recheck Cond: ((sourceid = tsr_bp.sourceid) AND (sourceid >= '13677065656154881'::bigint))
                                                         ->  Bitmap Index Scan on pk_final_dr3_run90002_flat_environment_10  (cost=0.00..1.12 rows=1 width=0)
                                                               DN (actual startup time=0.001..0.001 total time=0.001..0.001 rows=0..0 loops=11138..11540)
                                                               Index Cond: ((sourceid = tsr_bp.sourceid) AND (sourceid >= '13677065656154881'::bigint))
                                             ->  Materialize  (cost=0.08..114763942.34 rows=11475 width=391)
                                                   DN (actual startup time=0.030..0.059 total time=14.399..3405.053 rows=11118..11544 loops=1..1)
                                                   ->  Merge Append  (cost=0.08..114763925.13 rows=11475 width=391)
                                                         DN (actual startup time=0.021..0.048 total time=12.068..3401.297 rows=11118..11544 loops=1..1)
                                                         Sort Key: tsr_rp.sourceid
                                                         ->  Index Scan using idx_tsrrp_eecmd_90002_1 on timeseriesresult_13_1 tsr_rp  (cost=0.07..114763873.49 rows=11475 width=391)
                                                               DN (actual startup time=0.021..0.048 total time=11.291..3400.384 rows=11118..11544 loops=1..1)
                                                               Index Cond: ((sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                       ->  Append  (cost=0.08..834.63 rows=1 width=391)
                                             DN (actual startup time=0.004..0.048 total time=0.004..0.048 rows=1..1 loops=1621..1738)
                                             ->  Bitmap Heap Scan on timeseriesresult_13_1 tsr_g  (cost=0.08..834.63 rows=1 width=391)
                                                   DN (actual startup time=0.003..0.046 total time=0.003..0.046 rows=1..1 loops=1621..1738)
                                                   Recheck Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (catalogid = '2'::smallint) AND (runid = 90002) AND (ftimeseriestype = 139107))
                                                   ->  Bitmap Index Scan on idx_tsrg_eecmd_90002_1  (cost=0.00..0.08 rows=1 width=0)
                                                         DN (actual startup time=0.001..0.002 total time=0.001..0.002 rows=1..1 loops=1621..1738)
                                                         Index Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                 ->  Append  (cost=1.12..835.46 rows=1 width=26)
                                       DN (actual startup time=0.004..0.005 total time=0.004..0.005 rows=1..1 loops=1621..1738)
                                       ->  Bitmap Heap Scan on source_1 s  (cost=1.12..835.46 rows=1 width=26)
                                             DN (actual startup time=0.002..0.003 total time=0.003..0.004 rows=1..1 loops=1621..1738)
                                             Recheck Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (catalogid = '2'::smallint))
                                             ->  Bitmap Index Scan on source_1_pkey  (cost=0.00..1.12 rows=1 width=0)
                                                   DN (actual startup time=0.001..0.002 total time=0.001..0.002 rows=1..1 loops=1621..1738)
                                                   Index Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                           ->  Append  (cost=1.13..835.67 rows=1 width=1008)
                                 DN (actual startup time=0.004..2.117 total time=0.004..2.117 rows=1..1 loops=1621..1738)
                                 ->  Bitmap Heap Scan on ts_1 ts_bp  (cost=1.13..835.67 rows=1 width=1008)
                                       DN (actual startup time=0.003..2.116 total time=0.003..2.116 rows=1..1 loops=1621..1738)
                                       Recheck Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '2'::smallint) AND (catalogid = '2'::smallint))
                                       ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.13 rows=1 width=0)
                                             DN (actual startup time=0.002..0.005 total time=0.002..0.005 rows=1..1 loops=1621..1738)
                                             Index Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '2'::smallint))
                     ->  Append  (cost=1.13..835.67 rows=1 width=1276)
                           DN (actual startup time=0.003..0.308 total time=0.004..0.308 rows=1..1 loops=1621..1738)
                           ->  Bitmap Heap Scan on ts_1 ts_g  (cost=1.13..835.67 rows=1 width=1276)
                                 DN (actual startup time=0.002..0.307 total time=0.002..0.308 rows=1..1 loops=1621..1738)
                                 Recheck Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '1'::smallint) AND (catalogid = '2'::smallint))
                                 ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.13 rows=1 width=0)
                                       DN (actual startup time=0.001..0.002 total time=0.001..0.002 rows=1..1 loops=1621..1738)
                                       Index Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '1'::smallint))
               ->  Index Scan using uq_final_dr3_export_helper on final_dr3_export_helper  (cost=0.11..2090176.13 rows=209 width=125)
                     DN (actual startup time=0.005..0.007 total time=0.074..0.103 rows=197..231 loops=1621..1738)
                     Index Cond: ((sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
   ->  Materialize  (cost=101.75..80938.36 rows=1 width=74) (actual time=87.905..87.969 rows=1 loops=2582)
         ->  Nested Loop  (cost=101.75..80938.36 rows=1 width=74) (actual time=87.905..87.969 rows=1 loops=2582)
               ->  Result  (cost=0.00..10000.53 rows=1 width=32) (actual time=3.664..3.664 rows=1 loops=2582)
               ->  Materialize  (cost=101.75..50937.84 rows=1 width=74) (actual time=84.239..84.302 rows=1 loops=2582)
                     ->  Nested Loop  (cost=101.75..50937.83 rows=1 width=74) (actual time=84.238..84.302 rows=1 loops=2582)
                           ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=101.13..936.68 rows=1 width=1008) (actual time=3.697..3.759 rows=1 loops=2582)
                                 ->  Append  (cost=1.13..835.67 rows=1 width=1008)
                                       DN (actual startup time=0.016..0.044 total time=0.016..0.045 rows=1..1 loops=1..1)
                                       ->  Bitmap Heap Scan on ts_1 ts_rp  (cost=1.13..835.67 rows=1 width=1008)
                                             DN (actual startup time=0.016..0.043 total time=0.016..0.044 rows=1..1 loops=1..1)
                                             Recheck Cond: ((sourceid = sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '3'::smallint) AND (catalogid = '2'::smallint))
                                             ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.13 rows=1 width=0)
  DN (actual startup time=0.002..0.005 total time=0.002..0.005 rows=1..1 loops=1621..1738)
                                             Index Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '2'::smallint))
                     ->  Append  (cost=1.13..835.67 rows=1 width=1276)
                           DN (actual startup time=0.003..0.308 total time=0.004..0.308 rows=1..1 loops=1621..1738)
                           ->  Bitmap Heap Scan on ts_1 ts_g  (cost=1.13..835.67 rows=1 width=1276)
                                 DN (actual startup time=0.002..0.307 total time=0.002..0.308 rows=1..1 loops=1621..1738)
                                 Recheck Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '1'::smallint) AND (catalogid = '2'::smallint))
                                 ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.13 rows=1 width=0)
                                       DN (actual startup time=0.001..0.002 total time=0.001..0.002 rows=1..1 loops=1621..1738)
                                       Index Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '1'::smallint))
               ->  Index Scan using uq_final_dr3_export_helper on final_dr3_export_helper  (cost=0.11..2090176.13 rows=209 width=125)
                     DN (actual startup time=0.005..0.007 total time=0.074..0.103 rows=197..231 loops=1621..1738)
                     Index Cond: ((sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
   ->  Materialize  (cost=101.75..80938.36 rows=1 width=74) (actual time=87.905..87.969 rows=1 loops=2582)
         ->  Nested Loop  (cost=101.75..80938.36 rows=1 width=74) (actual time=87.905..87.969 rows=1 loops=2582)
               ->  Result  (cost=0.00..10000.53 rows=1 width=32) (actual time=3.664..3.664 rows=1 loops=2582)
               ->  Materialize  (cost=101.75..50937.84 rows=1 width=74) (actual time=84.239..84.302 rows=1 loops=2582)
                     ->  Nested Loop  (cost=101.75..50937.83 rows=1 width=74) (actual time=84.238..84.302 rows=1 loops=2582)
                           ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=101.13..936.68 rows=1 width=1008) (actual time=3.697..3.759 rows=1 loops=2582)
                                 ->  Append  (cost=1.13..835.67 rows=1 width=1008)
                                       DN (actual startup time=0.016..0.044 total time=0.016..0.045 rows=1..1 loops=1..1)
                                       ->  Bitmap Heap Scan on ts_1 ts_rp  (cost=1.13..835.67 rows=1 width=1008)
                                             DN (actual startup time=0.016..0.043 total time=0.016..0.044 rows=1..1 loops=1..1)
                                             Recheck Cond: ((sourceid = sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '3'::smallint) AND (catalogid = '2'::smallint))
                                             ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.13 rows=1 width=0)
                                                   DN (actual startup time=0.011..0.032 total time=0.011..0.032 rows=1..1 loops=1..1)
                                                   Index Cond: ((sourceid = sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '3'::smallint))
                           ->  Nested Loop  (cost=0.63..40001.15 rows=1 width=64) (actual time=80.534..80.535 rows=1 loops=2582)
                                 ->  Result  (cost=0.00..10000.53 rows=1 width=32) (actual time=3.555..3.556 rows=1 loops=2582)
                                 ->  Function Scan on remove_filtered bp_rp  (cost=0.63..10000.63 rows=1 width=64) (actual time=76.976..76.977 rows=1 loops=2582)
 Planning time: 399.474 ms
 Execution time: 228189.382 ms
(92 rows)

Normally such query could be pushed down,but as you see it does not happen and the error is a quasi-blocker for us. Let us know please if you could think of a workaround - otherwise we probably go back to pre v2.2 code.

yazun commented 2 years ago

Confirmed: It only happens when used in insert into ... select ... context.

JennyJennyChen commented 2 years ago

Thank you for your feedback, we will assign a developer to deal with this problem, and we will solve it for you as soon as possible

yazun commented 2 years ago

Much appreciated!

LinChuangwei commented 2 years ago

@yazun ,hello,enum NodeType 113 is mapped to T_GatherMergeState, maybe you used parallel mode, Can you see if max_parallel_workers_per_gather is 0?

yazun commented 2 years ago

Hello @LinChuangwei, max_parallel_workers_per_gather is globally set to 20 indeed, but the plan does not show parallel execution, so not sure it should be a problem here?

LinChuangwei commented 2 years ago

It may be that the plan changes at some time,you can try to set max_parallel_workers_per_gather to 0. In addition, can you help set log_error_verbosity to verbose and see the specific print location from the pg_log when an error occurs? thank you very much

yazun commented 2 years ago

Yes, indeed you were right, setting max_parallel_workers_per_gather to 0 helped.

explain analyze
insert into dr3_ops_cs36_mv.final_dr3_animation_variables_exported_1
select sourceid,
        sostypes,classification_types,
        ...

----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=20479222.53..3368986147837810.00 rows=48044735794174 width=283) (actual time=18931.348..18931.348 rows=0 loops=1)
   ->  Insert on final_dr3_animation_variables_exported_1  (cost=20479222.53..3368986147837810.00 rows=48044735794174 width=283)
         DN (actual startup time=18428.574..18484.696 total time=18428.574..18484.696 rows=0..0 loops=1..1)
         ->  Remote Subquery Scan on all (datanode11)  (cost=20479222.53..3368986147837810.00 rows=48044735794174 width=283)
               DN (actual startup time=1136.135..16936.424 total time=18413.332..18460.620 rows=197..231 loops=1..1)
               Distribute results by S: sourceid
               ->  Nested Loop  (cost=20479122.53..3285820710177995.00 rows=48044735794174 width=283)
                     DN (actual startup time=327.351..327.351 total time=17723.463..17723.463 rows=2582..2582 loops=1..1)
                     ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=20479020.78..117892842332.74 rows=35884647727 width=3281)
                           DN (actual startup time=12.791..12.791 total time=24.792..24.792 rows=2582..2582 loops=1..1)
                           ->  Hash Join  (cost=20478920.78..47659097.27 rows=35884647727 width=3281)
                                 DN (actual startup time=7.841..7.841 total time=8.152..8.152 rows=231..231 loops=1..1)
                                 Hash Cond: (final_dr3_export_helper.sourceid = final_dr3_run90002_flat_environment_10.sourceid)
                                 ->  Index Scan using uq_final_dr3_export_helper on final_dr3_export_helper  (cost=0.11..2090176.13 rows=209 width=128)
                                       DN (actual startup time=0.122..0.122 total time=0.291..0.291 rows=231..231 loops=1..1)
                                       Index Cond: ((sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                 ->  Hash  (cost=20468920.67..20468920.67 rows=1 width=3206)
                                       DN (actual startup time=7.653..7.653 total time=7.653..7.653 rows=231..231 loops=1..1)
                                       Buckets: 1024  Batches: 1  Memory Usage: 501kB
                                       ->  Nested Loop  (cost=5.88..20468920.67 rows=1 width=3206)
                                             DN (actual startup time=0.154..0.154 total time=6.971..6.971 rows=231..231 loops=1..1)
                                             ->  Nested Loop  (cost=4.75..20458085.00 rows=1 width=1970)
                                                   DN (actual startup time=0.147..0.147 total time=5.985..5.985 rows=231..231 loops=1..1)
                                                   ->  Nested Loop  (cost=3.63..20447249.33 rows=1 width=993)
                                                         DN (actual startup time=0.122..0.122 total time=4.558..4.558 rows=231..231 loops=1..1)
                                                         ->  Nested Loop  (cost=2.51..20436413.88 rows=1 width=969)
                                                               DN (actual startup time=0.102..0.102 total time=3.310..3.310 rows=231..231 loops=1..1)
                                                               ->  Merge Join  (cost=1.37..20425578.19 rows=1 width=678)
                                                                     DN (actual startup time=0.091..0.091 total time=2.125..2.125 rows=231..231 loops=1..1)
                                                                     Merge Cond: (tsr_rp.sourceid = final_dr3_run90002_flat_environment_10.sourceid)
                                                                     ->  Merge Append  (cost=0.12..9861340.98 rows=985 width=307)
                                                                           DN (actual startup time=0.029..0.029 total time=0.369..0.369 rows=231..231 loops=1..1)
                                                                           Sort Key: tsr_rp.sourceid
                                                                           ->  Index Scan using timeseriesresult_1_1_pkey on timeseriesresult_1_1 tsr_rp  (cost=0.12..9861336.54 rows=985 width=307)
                                                                                 DN (actual startup time=0.025..0.025 total time=0.345..0.345 rows=231..231 loops=1..1)
                                                                                 Index Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336928))
                                                                     ->  Materialize  (cost=1.25..10554234.28 rows=973 width=371)
                                                                           DN (actual startup time=0.054..0.054 total time=1.594..1.594 rows=231..231 loops=1..1)
                                                                           ->  Nested Loop  (cost=1.25..10554232.82 rows=973 width=371)
                                                                                 DN (actual startup time=0.045..0.045 total time=1.504..1.504 rows=231..231 loops=1..1)
                                                                                 ->  Merge Append  (cost=0.12..9741333.85 rows=973 width=307)
                                                                                       DN (actual startup time=0.005..0.005 total time=0.413..0.413 rows=231..231 loops=1..1)
                                                                                       Sort Key: tsr_bp.sourceid
                                                                                       ->  Index Scan using timeseriesresult_1_1_pkey on timeseriesresult_1_1 tsr_bp  (cost=0.12..9741329.46 rows=973 width=307)
                                                                                             DN (actual startup time=0.005..0.005 total time=0.385..0.385 rows=231..231 loops=1..1)
                                                                                             Index Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336911))
                                                                                 ->  Bitmap Heap Scan on final_dr3_run90002_flat_environment_10  (cost=1.12..835.46 rows=1 width=64)
                                                                                       DN (actual startup time=0.003..0.003 total time=0.003..0.003 rows=1..1 loops=231..231)
                                                                                       Recheck Cond: ((sourceid = tsr_bp.sourceid) AND (sourceid >= '13677065656154881'::bigint))
                                                                                       ->  Bitmap Index Scan on pk_final_dr3_run90002_flat_environment_10  (cost=0.00..1.12 rows=1 width=0)
                                                                                             DN (actual startup time=0.002..0.002 total time=0.002..0.002 rows=1..1 loops=231..231)
                                                                                             Index Cond: ((sourceid = tsr_bp.sourceid) AND (sourceid >= '13677065656154881'::bigint))
                                                               ->  Append  (cost=1.14..835.68 rows=1 width=307)
                                                                     DN (actual startup time=0.005..0.005 total time=0.005..0.005 rows=1..1 loops=231..231)
                                                                     ->  Bitmap Heap Scan on timeseriesresult_1_1 tsr_g  (cost=1.14..835.68 rows=1 width=307)
                                                                           DN (actual startup time=0.003..0.003 total time=0.004..0.004 rows=1..1 loops=231..231)
                                                                           Recheck Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid <= '20800599629677184'::bigint) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336935))
                                                                           ->  Bitmap Index Scan on timeseriesresult_1_1_pkey  (cost=0.00..1.14 rows=1 width=0)
                                                                                 DN (actual startup time=0.002..0.002 total time=0.002..0.002 rows=1..1 loops=231..231)
                                                                                 Index Cond: ((runid = 3852) AND (catalogid = '2'::smallint) AND (sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid <= '20800599629677184'::bigint) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = 336935))
                                                         ->  Append  (cost=1.12..835.45 rows=1 width=26)
                                                               DN (actual startup time=0.005..0.005 total time=0.005..0.005 rows=1..1 loops=231..231)
                                                               ->  Bitmap Heap Scan on source_1 s  (cost=1.12..835.45 rows=1 width=26)
                                                                     DN (actual startup time=0.004..0.004 total time=0.004..0.004 rows=1..1 loops=231..231)
                                                                     Recheck Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (catalogid = '2'::smallint))
                                                                     ->  Bitmap Index Scan on source_1_pkey  (cost=0.00..1.12 rows=1 width=0)
                                                                           DN (actual startup time=0.002..0.002 total time=0.002..0.002 rows=1..1 loops=231..231)
                                                                           Index Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint))
                                                   ->  Append  (cost=1.12..835.67 rows=1 width=979)
                                                         DN (actual startup time=0.006..0.006 total time=0.006..0.006 rows=1..1 loops=231..231)
                                                         ->  Bitmap Heap Scan on ts_1 ts_bp  (cost=1.12..835.67 rows=1 width=979)
                                                               DN (actual startup time=0.005..0.005 total time=0.005..0.005 rows=1..1 loops=231..231)
                                                               Recheck Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '2'::smallint) AND (catalogid = '2'::smallint))
                                                               ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.12 rows=1 width=0)
                                                                     DN (actual startup time=0.003..0.003 total time=0.003..0.003 rows=1..1 loops=231..231)
                                                                     Index Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '2'::smallint))
                                             ->  Append  (cost=1.12..835.67 rows=1 width=1238)
                                                   DN (actual startup time=0.004..0.004 total time=0.004..0.004 rows=1..1 loops=231..231)
                                                   ->  Bitmap Heap Scan on ts_1 ts_g  (cost=1.12..835.67 rows=1 width=1238)
                                                         DN (actual startup time=0.003..0.003 total time=0.003..0.003 rows=1..1 loops=231..231)
                                                         Recheck Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '1'::smallint) AND (catalogid = '2'::smallint))
                                                         ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.12 rows=1 width=0)
                                                               DN (actual startup time=0.001..0.001 total time=0.001..0.001 rows=1..1 loops=231..231)
                                                               Index Cond: ((sourceid = final_dr3_run90002_flat_environment_10.sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '1'::smallint))
                     ->  Materialize  (cost=101.75..80938.33 rows=1 width=74)
                           DN (actual startup time=6.495..6.495 total time=6.588..6.588 rows=1..1 loops=2582..2582)
                           ->  Nested Loop  (cost=101.75..80938.33 rows=1 width=74)
                                 DN (actual startup time=6.495..6.495 total time=6.587..6.587 rows=1..1 loops=2582..2582)
                                 ->  Result  (cost=0.00..10000.53 rows=1 width=32)
                                       DN (actual startup time=0.396..0.396 total time=0.397..0.397 rows=1..1 loops=2582..2582)
                                 ->  Materialize  (cost=101.75..50937.81 rows=1 width=74)
                                       DN (actual startup time=6.097..6.097 total time=6.189..6.189 rows=1..1 loops=2582..2582)
                                       ->  Nested Loop  (cost=101.75..50937.80 rows=1 width=74)
                                             DN (actual startup time=6.097..6.097 total time=6.189..6.189 rows=1..1 loops=2582..2582)
                                             ->  Remote Subquery Scan on all (datanode1,datanode10,datanode11,datanode12,datanode2,datanode3,datanode4,datanode5,datanode6,datanode7,datanode8,datanode9)  (cost=101.12..936.65 rows=1 width=979)
                                                   DN (actual startup time=0.957..0.957 total time=1.048..1.048 rows=1..1 loops=2582..2582)
                                                   ->  Append  (cost=1.12..835.67 rows=1 width=979)
                                                         DN (actual startup time=33.615..33.615 total time=33.615..33.615 rows=1..1 loops=1..1)
                                                         ->  Bitmap Heap Scan on ts_1 ts_rp  (cost=1.12..835.67 rows=1 width=979)
                                                               DN (actual startup time=33.615..33.615 total time=33.615..33.615 rows=1..1 loops=1..1)
                                                               Recheck Cond: ((sourceid = sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '3'::smallint) AND (catalogid = '2'::smallint))
                                                               ->  Bitmap Index Scan on ts_fix_1_pkey  (cost=0.00..1.12 rows=1 width=0)
                                                                     DN (actual startup time=2.111..2.111 total time=2.111..2.111 rows=1..1 loops=1..1)
                                                                     Index Cond: ((sourceid = sourceid) AND (sourceid >= '13677065656154881'::bigint) AND (sourceid <= '20800599629677184'::bigint) AND (ftimeseriestype = '3'::smallint))
                                             ->  Nested Loop  (cost=0.63..40001.15 rows=1 width=64)
                                                   DN (actual startup time=5.132..5.132 total time=5.133..5.133 rows=1..1 loops=2582..2582)
                                                   ->  Result  (cost=0.00..10000.53 rows=1 width=32)
                                                         DN (actual startup time=0.358..0.358 total time=0.358..0.358 rows=1..1 loops=2582..2582)
                                                   ->  Function Scan on remove_filtered bp_rp  (cost=0.63..10000.63 rows=1 width=64)
                                                         DN (actual startup time=4.772..4.772 total time=4.773..4.773 rows=1..1 loops=2582..2582)
 Planning time: 475.338 ms
 Execution time: 19104.729 ms
(112 rows)
LinChuangwei commented 2 years ago

Can you provide a complete SQL? I'll try to reproduce this error,thanks

yazun commented 2 years ago

@LinChuangwei sql is quite convoluted as uses few functions and unfortunately I cannot make the dump with the data, but here goes the query:


set prefer_olap=true;
set work_mem='8GB';
set cpu_tuple_cost=10000;
set pgxl_remote_fetch_size=25000;
set max_parallel_workers_per_gather to 20;
explain analyze
insert into dr3_ops_cs36_mv.final_dr3_animation_variables_exported_1
select sourceid,
        sostypes,classification_types,
        (g).obstimes g_obstime,(g).val g_vals, cardinality((g).val) glength,
        (bp_rp).obstimes bprp_obstime,(bp_rp).val bprp_vals, cardinality((bp_rp).val) bprp_length,

        alpha,delta,
        varpierror_mas,
        fmedian_g,fmedian_bp,fmedian_rp,fmean_g,fmean_bp,fmean_rp
        from (
 select
            sourceid,
            s.alpha,s.delta,
                    varpierror_mas,
        fmedian_g,fmedian_bp,fmedian_rp,fmean_g,fmean_bp,fmean_rp,
        sostypes,classification_types,

            bp_rp
            ,
           (
        cleanTimeseriesFluxToMag(
              ts_g.obstimes
             ,ts_g.transitids
             ,ts_g.vals
             ,ts_g.valserr
             ,ts_g.flags
          ,(tsr_g.derivedtschanges->(gettstypeid('Gaia', 'GAIA_PHOT_G')::text))::int[]
          ,getzeropoint('Gaia', 'GAIA_PHOT_G')
         )
         ) g
        from
            dr3_ops_cs36_mv.v_final_dr3_export_helper h
            join  dr3_ops_cs36_mv.final_dr3_run90002_flat_environment v using(Sourceid)
            join  timeseriesresult tsr_g using (sourceid)
            join  timeseriesresult tsr_bp using(runid,catalogid,sourceid)
            join  timeseriesresult tsr_rp using(runid,catalogid,sourceid)
            join source s using (catalogid,sourceid)
            join  ts as ts_bp using (catalogid,sourceid)
            join  ts as ts_g using (catalogid,sourceid)
            join  ts as ts_rp using (catalogid,sourceid)
            join lateral (
                    select  gaiafluxtomagnitudecomposite ('Gaia',  gettstypename(ts_bp.ftimeseriestype),ts_bp.obstimes::float8[],ts_bp.vals::float8[],ts_bp.valserr::float8[],false ) V
            ) bp on true

            join lateral (
                    select  gaiafluxtomagnitudecomposite ('Gaia',  gettstypename(ts_rp.ftimeseriestype),ts_rp.obstimes::float8[],ts_rp.vals::float8[],ts_rp.valserr::float8[],false ) v
            ) rp on true
            join lateral (select
                array_cat(array_cat(
                (tsr_g.derivedtschanges->((getInputTsType(tsr_g.ftimeseriestype)::text)))::int[],
                (tsr_bp.derivedtschanges->((getInputTsType(tsr_bp.ftimeseriestype)::text)))::int[])  ,
                (tsr_rp.derivedtschanges->((getInputTsType(tsr_rp.ftimeseriestype)::text)))::int[]) changes) changes on true
            join lateral
            remove_filtered(
                ((bp.v).obstimes::float8[],(bp.v).vals::float8[],(bp.v).valserr::float8[],ts_bp.flags::int[])::ts_bucket_p,
                ((rp.v).obstimes::float8[],(rp.v).vals::float8[],(rp.v).valserr::float8[],ts_rp.flags::int[])::ts_bucket_p,
                changes
                ) bp_rp on true

            where
            catalogid = getmaincatalog()
            and runid = 3852

            and tsr_g.ftimeseriestype  = gettsresulttypeid(3852,'ExtremeErrorCleaningMagnitudeDependent_FOV_G')
            and tsr_bp.ftimeseriestype  = gettsresulttypeid(3852,'ExtremeErrorCleaningMagnitudeDependent_FOV_BP')
            and tsr_rp.ftimeseriestype  = gettsresulttypeid(3852,'ExtremeErrorCleaningMagnitudeDependent_FOV_RP')

            and ts_g.ftimeseriestype  = gettstypeid('Gaia','GAIA_PHOT_G')
            and ts_bp.ftimeseriestype  = gettstypeid('Gaia','GAIA_PHOT_BP')
            and ts_rp.ftimeseriestype  = gettstypeid('Gaia','GAIA_PHOT_RP')

            and h.sourceid >= 13677065656154881 AND h.sourceid <= 20800599629677184
            and v.sourceid >= 13677065656154881 AND tsr_g.sourceid <= 20800599629677184
            and tsr_g.sourceid >= 13677065656154881 AND tsr_g.sourceid <= 20800599629677184
            and tsr_bp.sourceid >= 13677065656154881 AND tsr_bp.sourceid <= 20800599629677184
            and tsr_rp.sourceid >= 13677065656154881 AND tsr_rp.sourceid <= 20800599629677184
            and s.sourceid >= 13677065656154881 AND s.sourceid <= 20800599629677184
            and ts_g.sourceid >= 13677065656154881 AND ts_g.sourceid <= 20800599629677184
            and ts_bp.sourceid >= 13677065656154881 AND ts_bp.sourceid <= 20800599629677184
            and ts_rp.sourceid >= 13677065656154881 AND ts_rp.sourceid <= 20800599629677184

        ) fuller

gives

ERROR:  node:datanode12, backend_pid:11466, nodename:datanode12,backend_pid:11466,message:node:datanode7, backend_pid:1468, nodename:datanode7,backend_pid:1468,message:Failed to read from SQueue, CONSUMER_ERROR set, err_msg "unrecognized node type: 113",,

source, ts, timeseriesresult tables are partitioned and sharded by sourceid.

The last range part of the query is just to make sure we deal with single partitions all along as this query is executed in sourceid range batches on a relatively large dataset.

All functions are marked parallel safe and immutable, all are sql ones except cleanTimeseriesFluxToMag which is pljava.

LinChuangwei commented 2 years ago

can you help set log_error_verbosity to verbose and see the specific print location from the pg_log when an error occurs?

yazun commented 2 years ago

xz-log-insert_113_failure.csv Kibana log from around the event attached. The failing sql is from one of the pgsql functions.

2022-01-12 13:32:38 CET [1427,coord(0.0)]:xid[0-532/575] [3-1] user=,db=,client=,query=,FATAL: 57P01: terminating connection due to administrator command#0122022-01-12 13:32:38 CET [1427,coord(0.0)]:xid[0-532/575] [4-1] user=,db=,client=,query=,LOCATION: ProcessInterrupts, postgres.c:3956#0122022-01-12 13:32:38 CET [1427,coord(0.0)]:xid[0-532/575] [5-1] user=,db=,client=,query=,STATEMENT: #012#011with recursive parents(tstype_parent) as#012#011(#012#011 select tstype_parent from tstyperesult_tstyperesult where tstype_me = intstype#012#011 union all#012#011 select ti.tstype_parent from tstyperesult_tstyperesult ti,parents p where ti.tstype_me = p.tstype_parent#012#011#012#011 )#012#011select tstype_input from parents join tstype_tstyperesult on (tstype_parent = tstype_output)#012#011 join timeseriestype tst on (tstype_input = tstype)#012#011 join timeseriesresulttype tsrt on (tsrt.bandpass = tst.fbandpass and tsrt.tstype = intstype)#012#011;
yazun commented 2 years ago

So maybe the failure is related to CTE as there have been changes recently:

REATE OR REPLACE FUNCTION dr3_ops_cs36.getinputtstype(intstype integer)
 RETURNS integer
 LANGUAGE sql
 IMMUTABLE
AS $function$
with recursive parents(tstype_parent) as
(
    select tstype_parent from tstyperesult_tstyperesult  where tstype_me = intstype
    union all
    select ti.tstype_parent from tstyperesult_tstyperesult ti,parents p where ti.tstype_me = p.tstype_parent

    )
select tstype_input from parents  join tstype_tstyperesult on (tstype_parent = tstype_output)
 join timeseriestype tst on (tstype_input = tstype)
 join timeseriesresulttype tsrt on (tsrt.bandpass = tst.fbandpass and tsrt.tstype = intstype)
;
$function$
yazun commented 2 years ago

(plus I am noticing now that it is not parallel safe strangely)

yazun commented 2 years ago

but making it parallel safe yields the same error.

LinChuangwei commented 2 years ago

Thank you for your reply, we will continue to analyze the problem

yazun commented 1 month ago

@runewrz Was the error identified? Could you point to the commit fixing it?

runewrz commented 1 month ago

@runewrz Was the error identified? Could you point to the commit fixing it?

Thanks for your reply, we will continue to follow up on the issue.