Open steven-kirk opened 4 years ago
After this customer switched from using HLLs of size "hll_empty(17,5,-1,1)", which may be up to 80Kb in memory size) to using HLLs of the default size "hll_empty(11,5,-1,1)" where the max memory size is 1.2 Kb and thus smaller than an OS memory page, then they stopped seeing frequent crashes.
We’ve been seeing nearly daily crashes from a PostgreSQL 9.6 application that is heavily dependent on the HLL extension (v 2.10.2). All these crashes are from inside the HLL bitstream_unpack function. Usually they’re from an INSERT VALUES statement, but occasionally they are from an hll_cardinality call in a query. The format of the HLL cells is always set using expressions like: hll_empty(17,5,-1,1) || hll_hash_bigint(879826435) || ...
I think I’ve identified the root cause, but I’d like someone who is familiar with the code in the HLL library to confirm my hypothesis:
Assuming I’ve correctly identified the problem, the attached patch should fix the issue.
Regards, Steve Kirk
hll_2_10_2_patch.txt
BTW, The crash stacks all look roughly like:
0 bitstream_unpack (brcp=brcp@entry=0x7ffc69814290) at hll.c:263
1 0x0000147cbfcbe8ac in sparse_unpack (i_size=, i_bitp=0x7ffc69814327 "", i_nfilled=10403, i_log2nregs=, i_width=5, i_regp=0x7ffc69814320 "\002") at hll.c:359
2 multiset_unpack (o_msp=o_msp@entry=0x7ffc698142f0, i_bitp=i_bitp@entry=0x147a16bf903c "\023\221\177", i_size=, o_encoded_type=o_encoded_type@entry=0x0) at hll.c:1188
3 0x0000147cbfcc0823 in hll_union (fcinfo=) at hll.c:2042
4 0x0000000000617132 in ExecMakeFunctionResultNoSets (fcache=0x147cbb3eb948, econtext=0x147a1d6cd530, isNull=0x147a1d6ce09d "", isDone=) at execQual.c:2041
5 0x000000000061cace in ExecTargetList (tupdesc=, isDone=0x0, itemIsDone=0x147a1d6cffa0, isnull=0x147a1d6ce090 "", values=0x147a1d6ce000, econtext=0x147a1d6cd530, targetlist=0x147cbb3ecc28) at execQual.c:5423
6 ExecProject (projInfo=, isDone=isDone@entry=0x0) at execQual.c:5647
7 0x000000000062f10b in ExecOnConflictUpdate (returning=, canSetTag=1 '\001', estate=0x147a1d6c8038, excludedSlot=0x147a1d6c9bc0, planSlot=0x147a1d6c9bc0, conflictTid=0x7ffc69854520, resultRelInfo=0x147a1d6c81c8, mtstate=0x147a1d6c82d8) at nodeModifyTable.c:1234
8 ExecInsert (canSetTag=1 '\001', estate=0x147a1d6c8038, onconflict=ONCONFLICT_UPDATE, arbiterIndexes=0x147a1654fd28, planSlot=0x147a1d6c9bc0, slot=0x147a1d6c9bc0, mtstate=0x147a1d6c82d8) at nodeModifyTable.c:410
9 ExecModifyTable (node=node@entry=0x147a1d6c82d8) at nodeModifyTable.c:1512
10 0x00000000006162a8 in ExecProcNode (node=node@entry=0x147a1d6c82d8) at execProcnode.c:396
11 0x0000000000612727 in ExecutePlan (dest=0x147a1654d888, direction=, numberTuples=0, sendTuples=, operation=CMD_INSERT, use_parallel_mode=, planstate=0x147a1d6c82d8, estate=0x147a1d6c8038) at execMain.c:1567
12 standard_ExecutorRun (queryDesc=0x147cbb3c6438, direction=, count=0) at execMain.c:339