facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0
3.42k stars 1.12k forks

Check failure happens in SsdFile.h #10098

Open yma11 opened 3 months ago

yma11 commented 3 months ago

Bug description

When using AsyncDataCache with the SSD cache enabled, I get a size check failure: the file cache entry size (13269190 bytes) exceeds the 8M limit (8388608 bytes). Here is the whole stack:

E0606 22:20:34.059942 1534672 Exceptions.h:67] Line: /root/workspace/gluten-rebase/ep/build-velox/build/velox_ep/./velox/common/caching/SsdFile.h:42, Function:SsdRun, Expression: size <= 1 << kSizeBits (13269190 vs. 8388608), Source: RUNTIME, ErrorCode: INVALID_STATE
W0606 22:20:34.061712 1534672 SsdCache.cpp:134] [SSDCA] Ignoring error in SsdFile::write: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: (13269190 vs. 8388608)
Retriable: False
Expression: size <= 1 << kSizeBits
Function: SsdRun
File: /root/workspace/gluten-rebase/ep/build-velox/build/velox_ep/./velox/common/caching/SsdFile.h
Line: 42
Stack trace:
# 0  std::shared_ptr<facebook::velox::VeloxException::State const> facebook::velox::VeloxException::State::make<facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1}>(facebook::velox::VeloxException::Type, facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1})
# 1  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 2  facebook::velox::VeloxRuntimeError::VeloxRuntimeError(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::basic_string_view<char, std::char_traits<char> >)
# 3  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 4  facebook::velox::cache::SsdRun::SsdRun(unsigned long, unsigned int, unsigned int)
# 5  facebook::velox::cache::SsdFile::write(std::vector<facebook::velox::cache::CachePin, std::allocator<facebook::velox::cache::CachePin> >&)

System information

Velox System Info v0.0.2
Commit: 6ea98b611d27c081cf07291c2f9b05fdca332e24
CMake Version: 3.28.3
System: Linux-5.4.0-156-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.8/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

yma11 commented 3 months ago

@oerling Do you have any idea about this? I think a file cache entry can easily be larger than 8M, so why do we limit it to this size here?

yma11 commented 3 months ago

@xiaoxmeng @zacw7 Do you guys happen to know about this? Thanks.

zacw7 commented 3 months ago

SsdRun only reserves 23 bits (out of 64 bits) for size. Maybe we can expand it to 128 bits.
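To make the 23-bit limit concrete, here is a minimal sketch of how an offset and a size might be packed into one 64-bit word. `PackedRun` and `checkedPack` are hypothetical names for illustration, not the actual `SsdRun` code, and storing `size - 1` in the low bits is an assumption inferred from the check `size <= 1 << kSizeBits` in the error message. With `kSizeBits = 23` the largest representable size is 1 << 23 = 8388608 bytes, so the 13269190-byte entry from the log is rejected.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical stand-in for a packed (offset, size) run. This is NOT the
// Velox SsdRun class; only the name kSizeBits is taken from the error message.
class PackedRun {
 public:
  static constexpr int kSizeBits = 23;
  static constexpr uint64_t kMaxSize = 1ULL << kSizeBits; // 8388608 bytes

  PackedRun(uint64_t offset, uint32_t size) : bits_(checkedPack(offset, size)) {}

  // The high 41 bits hold the offset.
  uint64_t offset() const {
    return bits_ >> kSizeBits;
  }

  // The low 23 bits hold size - 1 (assumed encoding), so add 1 back.
  uint32_t size() const {
    return static_cast<uint32_t>(bits_ & (kMaxSize - 1)) + 1;
  }

 private:
  static uint64_t checkedPack(uint64_t offset, uint32_t size) {
    if (size == 0 || size > kMaxSize) {
      throw std::invalid_argument("size does not fit in kSizeBits");
    }
    if (offset >= (1ULL << (64 - kSizeBits))) {
      throw std::invalid_argument("offset does not fit in remaining bits");
    }
    return (offset << kSizeBits) | (static_cast<uint64_t>(size) - 1);
  }

  uint64_t bits_;
};
```

In this sketch, `PackedRun(0, 8388608)` succeeds while `PackedRun(0, 13269190)` throws, which mirrors the failed check above. Widening the size field would shrink the bits left for the offset unless the word itself grows.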

xiaoxmeng commented 3 months ago

velox/common/caching/SsdFile.h

@yma11 what's the loadQuantum size used in the query? Thanks!

yma11 commented 3 months ago

It's 256MB. So do I need to set it to 8M if I want to enable the SSD cache?

xiaoxmeng commented 3 months ago

> It's 256MB. So do I need to set it to 8M if I want to enable the SSD cache?

@yma11 that's a limitation of the current implementation which we need to fix, @zacw7. We should also put a limit on the max size of loadQuantum that we support.
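As a rough illustration of the suggested limit, a configuration-time check could reject a load quantum larger than what a single SSD cache entry can hold. This is only a sketch under stated assumptions: `kMaxSsdEntryBytes` and `checkedLoadQuantum` are hypothetical names and do not exist in Velox, and the 8388608-byte bound is taken from the error message above.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Assumed upper bound for one SSD cache entry, taken from the failed check
// (1 << 23 = 8388608 bytes). Hypothetical constant, not a Velox symbol.
constexpr uint64_t kMaxSsdEntryBytes = 1ULL << 23;

// Hypothetical helper: returns loadQuantum unchanged if it is usable,
// otherwise throws at configuration time instead of failing later
// during the SSD write.
inline uint64_t checkedLoadQuantum(uint64_t loadQuantum, bool ssdCacheEnabled) {
  if (ssdCacheEnabled && loadQuantum > kMaxSsdEntryBytes) {
    throw std::invalid_argument(
        "load quantum " + std::to_string(loadQuantum) +
        " exceeds max SSD cache entry size " +
        std::to_string(kMaxSsdEntryBytes));
  }
  return loadQuantum;
}
```

With this sketch, a 256MB load quantum is accepted when the SSD cache is disabled and rejected when it is enabled, while 8MB passes in both cases.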