citusdata / citus

Distributed PostgreSQL as an extension
https://www.citusdata.com
GNU Affero General Public License v3.0

Server crashes when trying to execute SELECT unnest(shard_placement_rebalance_array(...)); #7553

Open saygoodbyye opened 7 months ago

saygoodbyye commented 7 months ago

Hello! The server crashes when I execute the following SQL script (test.sql). I am running PostgreSQL built from the REL_16_STABLE branch and Citus built from the main branch.

Postgres build:

CFLAGS="-Og" ./configure \
        --enable-cassert \
        --enable-tap-tests \
        --enable-debug \
        --with-icu \
        --with-lz4 \
        --with-libxml \
        --with-openssl \
        --prefix=$DATA \
        --quiet

Citus build:

PG_CONFIG=$PG_CONFIG ./configure --without-lz4 --without-zstd

postgresql.conf:

shared_preload_libraries='citus'

test.sql:

CREATE OR REPLACE FUNCTION shard_placement_rebalance_array(
    worker_node_list json[],
    shard_placement_list json[],
    threshold float4 DEFAULT 0,
    max_shard_moves int DEFAULT 1000000,
    drain_only bool DEFAULT false,
    improvement_threshold float4 DEFAULT 0.5
)
RETURNS json[]
AS 'citus'
LANGUAGE C STRICT VOLATILE;

SELECT unnest(shard_placement_rebalance_array(
    ARRAY['{"node_name": "hostname1"}',
          '{"node_name": "hostname2", "capacity": 3}']::json[],
    ARRAY['{"hostname1":1, "nodename":"hostname1"}',
          '{"shardid":2, "nodename":"node_name"}',
          '{"shardid":3, "nodename":"hostname1", "cost": 2}']::json[]
));

backtrace:

#0  InitRebalanceState (functions=0x7ffcb1d2b5d0, shardPlacementList=<optimized out>, workerNodeList=0x55aacd6b1f70) at operations/shard_rebalancer.c:2567
#1  RebalancePlacementUpdates (workerNodeList=0x55aacd6b1f70, activeShardPlacementListList=activeShardPlacementListList@entry=0x55aacd6b1f10, threshold=0, maxShardMoves=maxShardMoves@entry=1000000, drainOnly=drainOnly@entry=false, 
    improvementThreshold=improvementThreshold@entry=0.5, functions=functions@entry=0x7ffcb1d2b5d0) at operations/shard_rebalancer.c:2433
#2  0x00007f5bdc2787bf in shard_placement_rebalance_array (fcinfo=<optimized out>) at test/shard_rebalancer.c:176
#3  0x000055aacc0214e0 in ExecInterpExpr (state=0x55aacd69f790, econtext=0x55aacd69ed70, isnull=<optimized out>) at execExprInterp.c:758
#4  0x000055aacc02caa4 in ExecEvalExpr (isNull=0x55aacd69fee0, econtext=0x55aacd69ed70, state=<optimized out>) at ../../../src/include/executor/executor.h:336
#5  ExecEvalFuncArgs (fcinfo=fcinfo@entry=0x55aacd69feb8, argList=0x55aacd69fe70, econtext=econtext@entry=0x55aacd69ed70) at execSRF.c:847
#6  0x000055aacc02d736 in ExecMakeFunctionResultSet (fcache=0x55aacd69f708, econtext=econtext@entry=0x55aacd69ed70, argContext=0x55aacd6aa990, isNull=0x55aacd69f6b0, isDone=isDone@entry=0x55aacd69f6f8) at execSRF.c:577
#7  0x000055aacc052198 in ExecProjectSRF (node=node@entry=0x55aacd69ec68, continuing=continuing@entry=false) at nodeProjectSet.c:183
#8  0x000055aacc05223c in ExecProjectSet (pstate=0x55aacd69ec68) at nodeProjectSet.c:107
#9  0x000055aacc024c22 in ExecProcNode (node=0x55aacd69ec68) at ../../../src/include/executor/executor.h:273
#10 ExecutePlan (execute_once=<optimized out>, dest=0x55aacd6a40f8, direction=-848693592, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x55aacd69ec68, estate=0x55aacd69ea50)
    at execMain.c:1670
#11 standard_ExecutorRun (queryDesc=queryDesc@entry=0x55aacd4c5590, direction=direction@entry=ForwardScanDirection, count=count@entry=0, execute_once=execute_once@entry=true) at execMain.c:365
#12 0x00007f5bdc218ed6 in CitusExecutorRun (queryDesc=0x55aacd4c5590, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at executor/multi_executor.c:238
#13 0x000055aacc1c0cff in PortalRunSelect (portal=0x55aacd626ff0, forward=<optimized out>, count=0, dest=<optimized out>) at pquery.c:924
#14 0x000055aacc1c20e3 in PortalRun (portal=portal@entry=0x55aacd626ff0, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55aacd6a40f8, altdest=altdest@entry=0x55aacd6a40f8, 
    qc=0x7ffcb1d2bbe0) at pquery.c:768
#15 0x000055aacc1be5cd in exec_simple_query (
    query_string=0x55aacd5567a0 "SELECT unnest(shard_placement_rebalance_array(\n    ARRAY['{\"node_name\": \"hostname1\"}',\n          '{\"node_name\": \"hostname2\", \"capacity\": 3}']::json[],\n    ARRAY['{\"hostname1\":1, \"nodename\":\"hostname1\""...) at postgres.c:1274
#16 0x000055aacc1c0707 in PostgresMain (dbname=<optimized out>, username=<optimized out>) at postgres.c:4637
#17 0x000055aacc13af8f in BackendRun (port=0x55aacd5d8b00, port=0x55aacd5d8b00) at postmaster.c:4464
#18 BackendStartup (port=0x55aacd5d8b00) at postmaster.c:4192
#19 ServerLoop () at postmaster.c:1782
#20 0x000055aacc13bf95 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x55aacd4bdfd0) at postmaster.c:1466
#21 0x000055aacbe8fb91 in main (argc=3, argv=0x55aacd4bdfd0) at main.c:198

Best regards,
Egor Chindyaskin
Postgres Professional: http://postgrespro.com/

JelteF commented 7 months ago

Just like #7551, I don't consider this a problematic crash. Again, it's a function that's only supposed to be used in our tests. Feel free to submit a PR to fix it, though.
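
For anyone picking this up, here is a minimal sketch of the kind of guard such a PR might add. It rests on an assumption that the thread does not confirm: the backtrace ends in InitRebalanceState itself, with no abort frame above it, which suggests an invalid pointer dereference rather than a failed assertion. One plausible trigger is the second placement in test.sql, whose "nodename" is "node_name" and so matches neither worker in the node list. The types and function names below (WorkerFill, Placement, find_fill_state, assign_placements) are hypothetical stand-ins, not Citus identifiers.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the rebalancer's per-node and per-placement data. */
typedef struct WorkerFill {
    const char *node_name;
    int placement_count;
} WorkerFill;

typedef struct Placement {
    long shard_id;
    const char *node_name;
} Placement;

/* Return the worker whose name matches, or NULL when there is none. */
static WorkerFill *
find_fill_state(WorkerFill *workers, size_t worker_count, const char *node_name)
{
    if (node_name == NULL)
        return NULL;

    for (size_t i = 0; i < worker_count; i++)
    {
        if (strcmp(workers[i].node_name, node_name) == 0)
            return &workers[i];
    }
    return NULL;
}

/*
 * Assign every placement to its worker. Returns -1 when all placements were
 * assigned, otherwise the index of the first placement whose node is unknown.
 * Inside the extension, the failure branch would raise ereport(ERROR, ...)
 * instead of returning an index.
 */
static long
assign_placements(WorkerFill *workers, size_t worker_count,
                  const Placement *placements, size_t placement_count)
{
    for (size_t i = 0; i < placement_count; i++)
    {
        WorkerFill *fill = find_fill_state(workers, worker_count,
                                           placements[i].node_name);

        if (fill == NULL)
            return (long) i;    /* checked before any dereference */

        fill->placement_count++;
    }
    return -1;
}
```

With the worker and placement names from test.sql, assign_placements would stop at index 1 (the "node_name" placement) rather than touching a NULL pointer. Whether the actual crash has this cause would need to be checked against shard_rebalancer.c:2567.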