Open minhancao opened 5 days ago
Ran TPC-DS SF10k on 8 Velox workers (128 GB memory each) with AsyncDataCache enabled and CoW disabled. Query 6 failed due to the check at https://github.com/facebookincubator/velox/blob/059337fca8170c2b361ea9d89d6c2cdd9e157c4a/velox/exec/HashProbe.cpp#L1142
Worker 3:
E20241122 21:26:29.133782 790 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (52 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE
Worker 6:
E20241122 21:26:29.129307 195 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (69 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE
E20241122 21:26:29.129565 782 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (13 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE
Worker 7:
E20241122 21:26:29.128383 769 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (117 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE
Worker 8:
E20241122 21:26:29.135933 772 Exceptions.h:66] Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, Function:getOutputInternal, Expression: numOut <= outputBatchSize (64 vs. 3), Source: RUNTIME, ErrorCode: INVALID_STATE
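For triage, the values from each failed check can be pulled out of log lines like the ones above with a short script. This is only a minimal sketch: the helper name `failed_checks` and the regular expression are illustrative assumptions based on the log format shown here, not part of Velox or Prestissimo.

```python
import re

# Matches the failed-check fragment of a log line, e.g.
# "Expression: numOut <= outputBatchSize (52 vs. 3)".
# The pattern is an assumption derived from the entries in this report.
CHECK_RE = re.compile(
    r"Expression: numOut <= outputBatchSize \((\d+) vs\. (\d+)\)"
)


def failed_checks(log_text):
    """Return (numOut, outputBatchSize) pairs for every failed check found."""
    return [(int(num_out), int(batch)) for num_out, batch in CHECK_RE.findall(log_text)]


# The two Worker 6 entries from this report, logged back to back.
sample = (
    "E20241122 21:26:29.129307 195 Exceptions.h:66] "
    "Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, "
    "Function:getOutputInternal, "
    "Expression: numOut <= outputBatchSize (69 vs. 3), "
    "Source: RUNTIME, ErrorCode: INVALID_STATE "
    "E20241122 21:26:29.129565 782 Exceptions.h:66] "
    "Line: /prestissimo/velox/velox/exec/HashProbe.cpp:1085, "
    "Function:getOutputInternal, "
    "Expression: numOut <= outputBatchSize (13 vs. 3), "
    "Source: RUNTIME, ErrorCode: INVALID_STATE"
)

print(failed_checks(sample))  # [(69, 3), (13, 3)]
```

In every entry reported here the second value (outputBatchSize) is 3, while numOut ranges from 13 to 117.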
Worker 3's stack dump trace:
VeloxRuntimeError: numOut <= outputBatchSize (52 vs. 3) Operator: HashProbe[1922] 1
at Unknown.# 0 _ZN8facebook5velox7process10StackTraceC1Ei(Unknown Source)
at Unknown.# 1 _ZN8facebook5velox14VeloxExceptionC2EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_(Unknown Source)
at Unknown.# 2 _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEEEvRKNS1_18VeloxCheckFailArgsET0_(Unknown Source)
at Unknown.# 3 _ZN8facebook5velox4exec9HashProbe17getOutputInternalEb(Unknown Source)
at Unknown.# 4 _ZN8facebook5velox4exec9HashProbe9getOutputEv(Unknown Source)
at Unknown.# 5 _ZZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEEENKUlvE3_clEv(Unknown Source)
at Unknown.# 6 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE(Unknown Source)
at Unknown.# 7 _ZN8facebook5velox4exec6Driver3runESt10shared_ptrIS2_E(Unknown Source)
at Unknown.# 8 _ZN5folly6detail8function5call_IZN8facebook5velox4exec6Driver7enqueueESt10shared_ptrIS6_EEUlvE_Lb1ELb0EvJEEET2_DpT3_RNS1_4DataE(Unknown Source)
at Unknown.# 9 _ZN5folly6detail8function14FunctionTraitsIFvvEEclEv(Unknown Source)
at Unknown.# 10 _ZN5folly18ThreadPoolExecutor7runTaskERKSt10shared_ptrINS0_6ThreadEEONS0_4TaskE(Unknown Source)
at Unknown.# 11 _ZN5folly21CPUThreadPoolExecutor9threadRunESt10shared_ptrINS_18ThreadPoolExecutor6ThreadEE(Unknown Source)
at Unknown.# 12 _ZSt13__invoke_implIvRMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEERPS1_JRS4_EET_St21__invoke_memfun_derefOT0_OT1_DpOT2_(Unknown Source)
at Unknown.# 13 _ZSt8__invokeIRMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEJRPS1_RS4_EENSt15__invoke_resultIT_JDpT0_EE4typeEOSC_DpOSD_(Unknown Source)
at Unknown.# 14 _ZNSt5_BindIFMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEPS1_S4_EE6__callIvJEJLm0ELm1EEEET_OSt5tupleIJDpT0_EESt12_Index_tupleIJXspT1_EEE(Unknown Source)
at Unknown.# 15 _ZNSt5_BindIFMN5folly18ThreadPoolExecutorEFvSt10shared_ptrINS1_6ThreadEEEPS1_S4_EEclIJEvEET0_DpOT_(Unknown Source)
at Unknown.# 16 _ZN5folly6detail8function5call_ISt5_BindIFMNS_18ThreadPoolExecutorEFvSt10shared_ptrINS4_6ThreadEEEPS4_S7_EELb1ELb0EvJEEET2_DpT3_RNS1_4DataE(Unknown Source)
at Unknown.# 17 0x00000000000dbad4(Unknown Source)
at Unknown.# 18 start_thread(Unknown Source)
at Unknown.# 19 clone(Unknown Source)
System information: N/A
Relevant logs: No response
Can you check if the issue is still there after #11659? CC: @zhli1142015