facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0

Sort merge join failed with state.data == nullptr exception #10466

Open JkSelf opened 1 month ago

JkSelf commented 1 month ago

Bug description

When we enable sort merge join (SMJ) for TPC-DS Q4 at the 2 TB scale using Gluten, we encounter the following exception.

Error Source: RUNTIME
Error Code: INVALID_STATE
Retriable: False
Expression: state.data == nullptr
Context: Operator: CallbackSink[N/A] 2
Function: operator()
File: /mnt/DP_disk3/jk/projects/gluten/ep/build-velox/build/velox_ep/velox/exec/MergeSource.cpp
Line: 278
Stack trace:
# 0  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 1  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, facebook::velox::detail::CompileTimeEmptyString>(facebook::velox::detail::VeloxCheckFailArgs const&, facebook::velox::detail::CompileTimeEmptyString)
# 2  facebook::velox::exec::MergeJoinSource::enqueue(std::shared_ptr<facebook::velox::RowVector>, folly::SemiFuture<folly::Unit>*)
# 3  std::_Function_handler<facebook::velox::exec::BlockingReason (std::shared_ptr<facebook::velox::RowVector>, folly::SemiFuture<folly::Unit>*), facebook::velox::exec::detail::makeConsumerSupplier(std::shared_ptr<facebook::velox::core::PlanNode const> const&)::{lambda(int, facebook::velox::exec::DriverCtx*)#5}::operator()(int, facebook::velox::exec::DriverCtx*) const::{lambda(std::shared_ptr<facebook::velox::RowVector>, folly::SemiFuture<folly::Unit>*)#1}>::_M_invoke(std::_Any_data const&, std::shared_ptr<facebook::velox::RowVector>&&, folly::SemiFuture<folly::Unit>*&&)
# 4  facebook::velox::exec::CallbackSink::addInput(std::shared_ptr<facebook::velox::RowVector>)
# 5  facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)
# 6  facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&)
# 7  facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*)
# 8  gluten::WholeStageResultIterator::next()
# 9  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 10 0x00007f10d6574427
# 11 0x00007f10d7611fb7
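
The failed check `state.data == nullptr` is raised inside `MergeJoinSource::enqueue`, which suggests the source expects its pending batch to be empty whenever the producer hands over a new one. The following is a minimal sketch of that kind of invariant, not Velox's actual code: `SingleSlotSource`, `Batch`, and the helper functions are hypothetical names, and the real class also handles synchronization and blocking futures, which are omitted here.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

// Hypothetical stand-in for a batch of rows (not velox::RowVector).
using Batch = std::vector<int>;

// Hypothetical single-slot handoff between one producer and one consumer.
// Assumption: the producer may only enqueue once the slot has been emptied.
class SingleSlotSource {
 public:
  // Producer side: stores a batch, throwing if the previous one is still pending.
  void enqueue(std::shared_ptr<Batch> data) {
    if (data_ != nullptr) {
      throw std::logic_error("state.data == nullptr check failed");
    }
    data_ = std::move(data);
  }

  // Consumer side: takes the pending batch (possibly null) and empties the slot.
  std::shared_ptr<Batch> next() {
    return std::exchange(data_, nullptr);
  }

 private:
  std::shared_ptr<Batch> data_;
};

// Enqueueing twice without an intervening next() violates the invariant.
bool doubleEnqueueFails() {
  SingleSlotSource source;
  source.enqueue(std::make_shared<Batch>(Batch{1, 2, 3}));
  try {
    source.enqueue(std::make_shared<Batch>(Batch{4, 5, 6}));
  } catch (const std::logic_error&) {
    return true;
  }
  return false;
}

// Alternating enqueue() and next() keeps the slot empty at every enqueue.
void alternatingHandoff() {
  SingleSlotSource source;
  source.enqueue(std::make_shared<Batch>(Batch{1}));
  assert(source.next() != nullptr);
  source.enqueue(std::make_shared<Batch>(Batch{2}));
  assert(source.next()->front() == 2);
}
```

Under this model, the exception would mean the producer pipeline delivered a second batch before the merge join consumed the first. Whether that is what happens in this query is an assumption, not something the stack trace alone confirms.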

System information

Velox System Info v0.0.2
Commit: 245606e74111a75172c1ff55822dbaf67f5d8f42
CMake Version: 3.28.3
System: Linux-5.4.0-167-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.8/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

JkSelf commented 1 month ago

@pedroerp Do you have any input? Thanks.

pedroerp commented 1 month ago

Hi @JkSelf I haven't seen this error before on either our internal usage or join fuzzer runs. Is there a way to reproduce it?

JkSelf commented 1 month ago

> Hi @JkSelf I haven't seen this error before on either our internal usage or join fuzzer runs. Is there a way to reproduce it?

@pedroerp This is fixed in https://github.com/facebookincubator/velox/pull/10509. Could you help review it?