facebookincubator / velox

A composable and fully extensible C++ execution engine library for data management systems.
https://velox-lib.io/
Apache License 2.0

"MEM_CAP_EXCEEDED" on HashBuild although spill is already turned on #4349

Open zhztheplayer opened 1 year ago

zhztheplayer commented 1 year ago

Bug description

I got the following error even though spilling is already enabled for HashBuild:

Failed Operator: HashBuild.1: 655.38MB
Retriable: True
Function: incrementReservation
File: ../../velox/common/memory/MemoryUsageTracker.cpp
Line: 153
Stack trace:
# 0  std::shared_ptr<facebook::velox::VeloxException::State const> facebook::velox::VeloxException::State::make<facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1}>(facebook::velox::VeloxException::Type, facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1})
# 1  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 2  facebook::velox::VeloxRuntimeError::VeloxRuntimeError(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::basic_string_view<char, std::char_traits<char> >)
# 3  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  facebook::velox::memory::MemoryUsageTracker::incrementReservation(unsigned long)
# 5  facebook::velox::memory::MemoryUsageTracker::reserve(unsigned long, bool)
# 6  facebook::velox::memory::MemoryUsageTracker::update(long)
# 7  gluten::WrappedVeloxMemoryPool::allocateContiguous(unsigned long, facebook::velox::memory::ContiguousAllocation&)::{lambda(long, bool)#1}::operator()(long, bool) const
# 8  std::_Function_handler<void (long, bool), gluten::WrappedVeloxMemoryPool::allocateContiguous(unsigned long, facebook::velox::memory::ContiguousAllocation&)::{lambda(long, bool)#1}>::_M_invoke(std::_Any_data const&, long&&, bool&&)
# 9  std::function<void (long, bool)>::operator()(long, bool) const
# 10 facebook::velox::memory::(anonymous namespace)::MallocAllocator::allocateContiguousImpl(unsigned long, facebook::velox::memory::Allocation*, facebook::velox::memory::ContiguousAllocation&, std::function<void (long, bool)>)
# 11 facebook::velox::memory::(anonymous namespace)::MallocAllocator::allocateContiguous(unsigned long, facebook::velox::memory::Allocation*, facebook::velox::memory::ContiguousAllocation&, std::function<void (long, bool)>)::{lambda()#1}::operator()() const
# 12 void facebook::velox::memory::Stats::recordAllocate<facebook::velox::memory::(anonymous namespace)::MallocAllocator::allocateContiguous(unsigned long, facebook::velox::memory::Allocation*, facebook::velox::memory::ContiguousAllocation&, std::function<void (long, bool)>)::{lambda()#1}>(long, int, facebook::velox::memory::(anonymous namespace)::MallocAllocator::allocateContiguous(unsigned long, facebook::velox::memory::Allocation*, facebook::velox::memory::ContiguousAllocation&, std::function<void (long, bool)>)::{lambda()#1})
# 13 facebook::velox::memory::(anonymous namespace)::MallocAllocator::allocateContiguous(unsigned long, facebook::velox::memory::Allocation*, facebook::velox::memory::ContiguousAllocation&, std::function<void (long, bool)>)
# 14 gluten::WrappedVeloxMemoryPool::allocateContiguous(unsigned long, facebook::velox::memory::ContiguousAllocation&)
# 15 facebook::velox::exec::HashTable<true>::allocateTables(unsigned long)
# 16 facebook::velox::exec::HashTable<true>::checkSize(int)
# 17 facebook::velox::exec::HashTable<true>::setHashMode(facebook::velox::exec::BaseHashTable::HashMode, int)
# 18 facebook::velox::exec::HashTable<true>::decideHashMode(int)
# 19 facebook::velox::exec::HashTable<true>::prepareJoinTable(std::vector<std::unique_ptr<facebook::velox::exec::BaseHashTable, std::default_delete<facebook::velox::exec::BaseHashTable> >, std::allocator<std::unique_ptr<facebook::velox::exec::BaseHashTable, std::default_delete<facebook::velox::exec::BaseHashTable> > > >, folly::Executor*)
# 20 facebook::velox::exec::HashBuild::finishHashBuild()
# 21 facebook::velox::exec::HashBuild::noMoreInputInternal()
# 22 facebook::velox::exec::HashBuild::noMoreInput()
# 23 facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&)
# 24 facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&)
# 25 facebook::velox::exec::Task::next()
...

I run the Velox task in single-threaded execution mode.

After reading through Velox's join code, I did not find any error handling around hash table creation in HashBuild::finishHashBuild():

https://github.com/facebookincubator/velox/blob/cde03469ec0f83670321a6fa68575496fb2f32bf/velox/exec/HashTable.cpp#L621

Is this intended?
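To make the question concrete, here is a minimal, self-contained sketch of the kind of handling I was expecting to find. It is not Velox code: `CappedTracker`, `ToyHashBuild`, `BuildResult`, and `spill()` are hypothetical names invented for illustration, and the real spill path is certainly more involved. It only shows the pattern of catching a cap-exceeded failure while allocating the table at finish time and falling back to spilling instead of failing the task.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a capped memory tracker.
// This is NOT Velox's MemoryUsageTracker; it only mimics "throw when over cap".
class CappedTracker {
 public:
  explicit CappedTracker(int64_t capBytes) : cap_(capBytes) {}

  // Reserves `bytes`, or throws if the cap would be exceeded.
  void reserve(int64_t bytes) {
    if (used_ + bytes > cap_) {
      throw std::runtime_error("MEM_CAP_EXCEEDED");
    }
    used_ += bytes;
  }

  // Returns previously reserved bytes to the tracker.
  void release(int64_t bytes) {
    assert(bytes <= used_);
    used_ -= bytes;
  }

  int64_t used() const { return used_; }

 private:
  int64_t cap_;
  int64_t used_{0};
};

enum class BuildResult { kBuiltInMemory, kSpilled };

// Toy build operator: rows are accumulated first, and the hash table itself
// is allocated only once all input has been received.
struct ToyHashBuild {
  CappedTracker& tracker;
  int64_t rowBytes{0};
  bool spillEnabled{false};

  void addInput(int64_t bytes) {
    tracker.reserve(bytes);
    rowBytes += bytes;
  }

  // Hypothetical spill: pretend the rows were written out and free their memory.
  void spill() {
    tracker.release(rowBytes);
    rowBytes = 0;
  }

  // Tries to allocate the table; on a cap failure, spills if allowed,
  // otherwise rethrows the original error.
  BuildResult finish(int64_t tableBytes) {
    try {
      tracker.reserve(tableBytes);
      return BuildResult::kBuiltInMemory;
    } catch (const std::runtime_error&) {
      if (!spillEnabled) {
        throw;
      }
      spill();
      return BuildResult::kSpilled;
    }
  }
};
```

With a 100-byte cap, 80 bytes of input, and a 40-byte table, `finish(40)` returns `BuildResult::kSpilled` when `spillEnabled` is true and rethrows the cap error when it is false. The second case corresponds to what the stack trace above shows: the reservation made from `allocateTables` fails and nothing between it and `finishHashBuild` intercepts it.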

System information

Velox System Info v0.0.2
Commit: 0a92517a9abb6ff99a0eeabfd647ff31de14a606
CMake Version: 3.16.3
System: Linux-6.1.13-100.fc36.x86_64
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

zhztheplayer commented 1 year ago

@xiaoxmeng Could you help take a look? Thanks a lot.