facebookincubator / velox

A composable and fully extensible C++ execution engine library for data management systems.
https://velox-lib.io/
Apache License 2.0
3.52k stars 1.15k forks source link

Fuzzer hash join test with spilling return incorrect result #3567

Open xiaoxmeng opened 1 year ago

xiaoxmeng commented 1 year ago

Description

Fuzzer hash join test with spilling return incorrect result (see error log)

Error Reproduction

--logtostderr --duration_sec 60 --batch_size=5 --seed 2199528297 Note: simply hard code the seed to 3488155619 will reproduce the failure on the first iteration

Relevant logs

I20221220 23:25:08.987138 812602 JoinFuzzer.cpp:680] ==============================> Started iteration 0 (seed: 3488155619)
I20221220 23:25:09.057631 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER t3=u3 AND t2=u2 AND t0=u0 AND t1=u1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- Values[25 rows in 5 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
  -- Values[5 rows in 5 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
I20221220 23:25:09.064841 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.154328 812602 JoinFuzzer.cpp:647] Testing plan #0
I20221220 23:25:09.154364 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER t3=u3 AND t2=u2 AND t0=u0 AND t1=u1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- Values[25 rows in 5 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
  -- Values[5 rows in 5 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
I20221220 23:25:09.157127 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.157409 812602 JoinFuzzer.cpp:662] Testing plan #0 with spilling
I20221220 23:25:09.157418 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER t3=u3 AND t2=u2 AND t0=u0 AND t1=u1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- Values[25 rows in 5 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
  -- Values[5 rows in 5 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
I20221220 23:25:09.319696 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.395527 812602 JoinFuzzer.cpp:647] Testing plan #1
I20221220 23:25:09.395561 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER u3=t3 AND u2=t2 AND u0=t0 AND u1=t1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- Values[5 rows in 5 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
  -- Values[25 rows in 5 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
I20221220 23:25:09.398710 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.399605 812602 JoinFuzzer.cpp:662] Testing plan #1 with spilling
I20221220 23:25:09.399639 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER u3=t3 AND u2=t2 AND u0=t0 AND u1=t1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- Values[5 rows in 5 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
  -- Values[25 rows in 5 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
I20221220 23:25:09.821431 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.822482 812602 JoinFuzzer.cpp:647] Testing plan #2
I20221220 23:25:09.822499 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER t3=u3 AND t2=u2 AND t0=u0 AND t1=u1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- LocalPartition[REPARTITION] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[10 rows in 2 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
  -- LocalPartition[REPARTITION] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[2 rows in 2 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
I20221220 23:25:09.834632 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.834925 812602 JoinFuzzer.cpp:662] Testing plan #2 with spilling
I20221220 23:25:09.834936 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER t3=u3 AND t2=u2 AND t0=u0 AND t1=u1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- LocalPartition[REPARTITION] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[10 rows in 2 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
  -- LocalPartition[REPARTITION] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[2 rows in 2 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
I20221220 23:25:09.863643 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.871482 812602 JoinFuzzer.cpp:647] Testing plan #3
I20221220 23:25:09.871513 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- MergeJoin[INNER t3=u3 AND t2=u2 AND t0=u0 AND t1=u1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- OrderBy[t3 ASC NULLS LAST, t2 ASC NULLS LAST, t0 ASC NULLS LAST, t1 ASC NULLS LAST] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[25 rows in 5 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
  -- OrderBy[u3 ASC NULLS LAST, u2 ASC NULLS LAST, u0 ASC NULLS LAST, u1 ASC NULLS LAST] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[5 rows in 5 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
I20221220 23:25:09.873260 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.873533 812602 JoinFuzzer.cpp:662] Testing plan #3 with spilling
I20221220 23:25:09.873544 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- MergeJoin[INNER t3=u3 AND t2=u2 AND t0=u0 AND t1=u1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- OrderBy[t3 ASC NULLS LAST, t2 ASC NULLS LAST, t0 ASC NULLS LAST, t1 ASC NULLS LAST] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[25 rows in 5 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
  -- OrderBy[u3 ASC NULLS LAST, u2 ASC NULLS LAST, u0 ASC NULLS LAST, u1 ASC NULLS LAST] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[5 rows in 5 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
I20221220 23:25:09.896203 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.897856 812602 JoinFuzzer.cpp:647] Testing plan #4
I20221220 23:25:09.897873 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER u3=t3 AND u2=t2 AND u0=t0 AND u1=t1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- Values[5 rows in 5 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
  -- Values[25 rows in 5 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
I20221220 23:25:09.899226 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.899497 812602 JoinFuzzer.cpp:662] Testing plan #4 with spilling
I20221220 23:25:09.899507 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER u3=t3 AND u2=t2 AND u0=t0 AND u1=t1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- Values[5 rows in 5 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
  -- Values[25 rows in 5 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
I20221220 23:25:09.953143 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:09.954243 812602 JoinFuzzer.cpp:647] Testing plan #5
I20221220 23:25:09.954258 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER t3=u3 AND t2=u2 AND t0=u0 AND t1=u1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- LocalPartition[REPARTITION] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[10 rows in 2 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
  -- LocalPartition[REPARTITION] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[2 rows in 2 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
I20221220 23:25:10.970846 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 4 elements, no nulls]
I20221220 23:25:10.971160 812602 JoinFuzzer.cpp:662] Testing plan #5 with spilling
I20221220 23:25:10.971171 812602 JoinFuzzer.cpp:261] Executing query plan: 
-- HashJoin[INNER t3=u3 AND t2=u2 AND t0=u0 AND t1=u1] -> tp4:MAP<VARCHAR,INTEGER>, u2:INTEGER
  -- LocalPartition[REPARTITION] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[10 rows in 2 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
    -- Values[5 rows in 1 vectors] -> t0:BOOLEAN, t1:REAL, t2:INTEGER, t3:TINYINT, tp4:MAP<VARCHAR,INTEGER>
  -- LocalPartition[REPARTITION] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[2 rows in 2 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
    -- Values[1 rows in 1 vectors] -> u0:BOOLEAN, u1:REAL, u2:INTEGER, u3:TINYINT
I20221220 23:25:10.989209 812602 JoinFuzzer.cpp:275] Results: [ROW ROW<tp4:MAP<VARCHAR,INTEGER>,u2:INTEGER>: 3 elements, no nulls]
E20221220 23:25:10.997591 812602 Exceptions.h:68] Line: /Users/xiaoxmeng/code/presto_cpp/velox/velox/exec/tests/JoinFuzzer.cpp:666, Function:verify, Expression: assertEqualResults({expected}, {actual}) Logically equivalent plans produced different results, Source: RUNTIME, ErrorCode: INVALID_STATE
libc++abi: terminating with uncaught exception of type facebook::velox::VeloxRuntimeError: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Logically equivalent plans produced different results
Retriable: False
Expression: assertEqualResults({expected}, {actual})
Function: verify
File: /Users/xiaoxmeng/code/presto_cpp/velox/velox/exec/tests/JoinFuzzer.cpp
Line: 666
Stack trace:
# 0  
# 1  
# 2  
# 3  
# 4  
# 5  
# 6  
# 7  
# 8  
# 9  
# 10 
# 11 
# 12 

/Users/xiaoxmeng/code/presto_cpp/velox/velox/exec/tests/utils/QueryAssertions.cpp:992: Failure
Value of: false
  Actual: false
Expected: true
Expected 4, got 3
0 extra rows, 1 missing rows
0 of extra rows:

1 of missing rows:
    [{"key":"","value":1758880627},{"key":",M,w-Ob|zJiXA&`M/Qrrw}A/w{932.Op#MBU$t#eC","value":1059523943},{"key":"G|.","value":501154991},{"key":"S}E\".?(gO)4[]!$5oH$&aDvQ>?Qi&[QysWgA8H[5Cn\"w2fj:C}ZKVy]FLq;^:Wl6v_%h'/MQ0+=:7X[K]","value":null},{"key":"W'hf%s$D\\*nAx]jXBg,6!h$HP|{k(H/537N5#W{Q%QFtpl}~^~Ixrq#}/4s2h+R'`?\\yv\\zp&h_>Rd","value":158271667},{"key":"]SsP$v8","value":1950041730},{"key":"`cN8^HQtv*Qb^GNe-tbihu~_vA7na?RVt:T;\\y4if[W}\".ZZ6`<m~cvX@IP*T","value":269949409},{"key":"dd#3l_2~W/g5'S\\2om2uioDIYKCqxT6mE%S%9hjSV=,@ZvA~]]jvp4j<|+OP/:Ir.FVIL?R)h","value":942431387},{"key":"e.JgL[%;&=<V$Op>(No-nyGU#[O1o*j1U>AqgAz%4HxK?~`lU:#tlV?&['@","value":1283556221},{"key":"f1_Gl%to<!nRUo[F7R${aN52dCav=b","value":1596748766}] | 2048507218

Unexpected results
xiaoxmeng commented 1 year ago

cc @mbasmanova

xiaoxmeng commented 1 year ago

The problem is caused by parallel join build which doesn't count the freed rows when list partition rows: https://github.com/facebookincubator/velox/blob/main/velox/exec/RowContainer.cpp#L576. Will fix the parallel join with spilling enabled. Also consider to add a query config for such new feature which allows us to disable them in production. @mbasmanova @oerling

xiaoxmeng commented 1 year ago

Will fix this issue after we add memory compaction support for rows in RowContainer which triggers a compaction to remove all the freed row slots after spilling. It will avoid this issue in parallel join build.