cmu-db / peloton

The Self-Driving Database Management System
http://pelotondb.io
Apache License 2.0

Segfault in value_integrity_test when compiling with Clang on macOS #1334

Closed mbutrovich closed 6 years ago

mbutrovich commented 6 years ago

As of #1302, the master branch segfaults in value_integrity_test when compiled with Clang on macOS. When run with AddressSanitizer enabled, the report points to a bad memory access within libunwind after a DivideByZeroException is thrown.

I haven't verified if this is unique to Apple's custom Clang release or manifests on normal Clang as well. My clang version is:

Apple LLVM version 9.1.0 (clang-902.0.39.1)
Target: x86_64-apple-darwin17.5.0
Thread model: posix

Details on the segfault from AddressSanitizer are below:

ASAN:DEADLYSIGNAL
=================================================================
==11616==ERROR: AddressSanitizer: SEGV on unknown address 0x00013ae1b090 (pc 0x7fff599aac1a bp 0x7ffee05b6bb0 sp 0x7ffee05b6b50 T0)
==11616==The signal is caused by a READ memory access.
    #0 0x7fff599aac19 in libunwind::CFI_Parser<libunwind::LocalAddressSpace>::decodeFDE(libunwind::LocalAddressSpace&, unsigned long long, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::FDE_Info*, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::CIE_Info*) (libunwind.dylib:x86_64+0x1c19)
    #1 0x7fff599a9c56 in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister(bool) (libunwind.dylib:x86_64+0xc56)
    #2 0x7fff599aa24a in libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step() (libunwind.dylib:x86_64+0x124a)
    #3 0x7fff599a9a0d in _Unwind_RaiseException (libunwind.dylib:x86_64+0xa0d)
    #4 0x7fff5763125d in __cxa_throw (libc++abi.dylib:x86_64+0x1d25d)
    #5 0x1100700e8 in peloton::codegen::RuntimeFunctions::ThrowDivideByZeroException() runtime_functions.cpp:125
    #6 0x13ae1a036  (<unknown module>)
    #7 0x10f65b208 in peloton::test::ValueIntegrityTest_IntegerDivideByZero_Test::TestBody() value_integrity_test.cpp:157
    #8 0x10f7b7fba in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gmock-gtest-all.cc:3562
    #9 0x10f6db0c2 in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) gmock-gtest-all.cc:3598
    #10 0x10f6daa5d in testing::Test::Run() gmock-gtest-all.cc:3634
    #11 0x10f6debfa in testing::TestInfo::Run() gmock-gtest-all.cc:3810
    #12 0x10f6e2c4a in testing::TestCase::Run() gmock-gtest-all.cc:3928
    #13 0x10f7066b8 in testing::internal::UnitTestImpl::RunAllTests() gmock-gtest-all.cc:5799
    #14 0x10f7c78aa in bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) gmock-gtest-all.cc:3562
    #15 0x10f70560c in bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*) gmock-gtest-all.cc:3598
    #16 0x10f7050ab in testing::UnitTest::Run() gmock-gtest-all.cc:5410
    #17 0x10f7f3990 in RUN_ALL_TESTS() gtest.h:20058
    #18 0x10f7f3904 in main gmock_main.cc:53
    #19 0x7fff59665014 in start (libdyld.dylib:x86_64+0x1014)

==11616==Register values:
rax = 0x000000013ae1b090  rbx = 0x000000013ae1a036  rcx = 0x00007ffee05b7368  rdx = 0x00007ffee05b73a0  
rdi = 0x00007fff91c9ba99  rsi = 0x000000013ae1b090  rbp = 0x00007ffee05b6bb0  rsp = 0x00007ffee05b6b50  
 r8 = 0xffffffff00004f00   r9 = 0x00007fff91c9b1e0  r10 = 0x0000500100004600  r11 = 0x0000500100000000  
r12 = 0x00007ffee05b7848  r13 = 0x00007ffee05b73a0  r14 = 0x00007ffee05b7368  r15 = 0x000000013ae1b090  
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV (libunwind.dylib:x86_64+0x1c19) in libunwind::CFI_Parser<libunwind::LocalAddressSpace>::decodeFDE(libunwind::LocalAddressSpace&, unsigned long long, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::FDE_Info*, libunwind::CFI_Parser<libunwind::LocalAddressSpace>::CIE_Info*)
==11616==ABORTING

mbutrovich commented 6 years ago

Looks like a similar issue to what was discussed in #995.

Also, if I disable the prior test (IntegerOverflow) and run the failing test (IntegerDivideByZero) in isolation, it passes.

poojanilangekar commented 6 years ago

I am not sure why commenting out the QueryCache::Instance().Clear() line in the destructor of PelotonCodeGenTest fixes the issue. The QueryCache is used to bootstrap the catalog instance, but that doesn't seem to cause the issue. The segfault occurs while RuntimeFunctions::ThrowDivideByZeroException() is throwing the exception.

poojanilangekar commented 6 years ago

Interestingly, it wasn't triggered in #1335. Here is the snippet of the log:

100% tests passed, 0 tests failed out of 177

mbutrovich commented 6 years ago

My suspicion is that this is manifesting a bug inside LLVM's internal use of libunwind (they use it in their C++ ABI for exception handling), since Peloton never uses the library explicitly. A similar bug was reported to LLVM recently, and it is visible all the way back to LLVM 3.7:

https://www.mail-archive.com/llvm-bugs@lists.llvm.org/msg20647.html

Even with AddressSanitizer enabled, we wouldn't catch this if it was built with gcc since gcc doesn't use libunwind for its exception implementation.
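For anyone following along, here is a minimal sketch of the call pattern visible in the stack trace: a runtime helper throws, and the unwinder has to walk through an intermediate frame to reach the handler in the test body. All names here (`DivideByZeroException` as defined below, `ThrowDivideByZero`, `Divide`, `DivideThrows`) are hypothetical stand-ins, not Peloton's code. It also assumes that frame #6 (`<unknown module>`) is the generated query code; in this sketch that frame is ordinary compiled C++, so the compiler emits unwind info for it and the throw is expected to succeed. It only illustrates the path the unwinder takes and does not reproduce the segfault.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the exception type thrown by the runtime helper.
struct DivideByZeroException : std::runtime_error {
  DivideByZeroException() : std::runtime_error("division by zero") {}
};

// Plays the role of frame #5: a helper whose only job is to throw.
[[noreturn]] void ThrowDivideByZero() { throw DivideByZeroException(); }

// Plays the role of frame #6. Assumption: in the real trace this frame is
// generated code; here it is statically compiled, so unwind info exists.
int Divide(int lhs, int rhs) {
  if (rhs == 0) ThrowDivideByZero();
  return lhs / rhs;
}

// Plays the role of frame #7 (the test body): returns true if the exception
// unwound through Divide() and reached this handler.
bool DivideThrows(int lhs, int rhs) {
  try {
    (void)Divide(lhs, rhs);
  } catch (const DivideByZeroException &) {
    return true;
  }
  return false;
}
```

Calling `DivideThrows(1, 0)` exercises the same throw, unwind, and catch sequence as frames #3 through #7 above, just without any runtime-generated frame in between.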

poojanilangekar commented 6 years ago

But #1335 used clang to build on macOS. I don't think you can build peloton with gcc on macOS.

tcm-marcel commented 6 years ago

If it is really the bug @mbutrovich mentioned, we can't do much about it. We can try to change the order of the executed test cases so that the bug is not triggered... 😬

mbutrovich commented 6 years ago

I tried that and it just causes the (now second) IntegerOverflow test to generate the segfault. :(

mbutrovich commented 6 years ago

Current master branch no longer manifests this issue, and I don't believe this is a bug in our control anyway.