Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0

Test failure when building with gcc 11.2 -flto #210

Open kepstin opened 2 years ago

kepstin commented 2 years ago

While doing some testing in preparation for packaging bees, I did a build using make CXXFLAGS=-flto CFLAGS=-flto to verify that the build would work correctly with gcc LTO.

Using GCC version 11.2.0.

I'm seeing the following test failure in fd:

Testing test_derived_cast()...expect bad cast exception:
Program received signal SIGABRT, Aborted.

Here's a backtrace from the error:

#0  0x00007ffff7b3a5bc in __pthread_kill_implementation () from /usr/x86_64-pc-linux-gnu/lib/libc.so.6
#1  0x00007ffff7aedec6 in raise () from /usr/x86_64-pc-linux-gnu/lib/libc.so.6
#2  0x00007ffff7ad87b1 in abort () from /usr/x86_64-pc-linux-gnu/lib/libc.so.6
#3  0x00005555555597a3 in _Unwind_SetGR (context=<optimized out>, index=<optimized out>, val=<optimized out>)
    at /var/tmp/paludis/build/sys-devel-gcc-11.2.0/work/gcc-11.2.0/libgcc/unwind-dw2.c:282
#4  0x000055555556ab1b in __gcc_personality_v0 (version=<optimized out>, actions=<optimized out>, exception_class=<optimized out>, ue_header=0x555555586570, 
    context=0x7fffffffe3c0) at /var/tmp/paludis/build/sys-devel-gcc-11.2.0/work/gcc-11.2.0/libgcc/unwind-c.c:231
#5  0x00007ffff7f25e7c in _Unwind_RaiseException_Phase2 (exc=0x555555586570, context=0x7fffffffe3c0, frames_p=0x7fffffffe4b0)
    at /var/tmp/paludis/build/sys-devel-gcc-11.2.0/work/gcc-11.2.0/libgcc/unwind.inc:64
#6  0x00007ffff7f2667f in _Unwind_RaiseException (exc=0x555555586570) at /var/tmp/paludis/build/sys-devel-gcc-11.2.0/work/gcc-11.2.0/libgcc/unwind.inc:136
#7  0x00007ffff7e3edbb in __cxa_throw () from /usr/x86_64-pc-linux-gnu/lib/libstdc++.so.6
#8  0x000055555555aa5a in std::shared_ptr<DerivedFdResource> cast<DerivedFdResource>(crucible::Fd const&) ()
#9  0x000055555555aa86 in std::enable_if<std::__and_<std::is_void<void>, std::__is_invocable<test_derived_cast()::{lambda()#1}&> >::value, void>::type std::__invoke_r<void, test_derived_cast()::{lambda()#1}&>(test_derived_cast()::{lambda()#1}&) ()
#10 0x000055555555ab2b in std::_Function_handler<void (), test_derived_cast()::{lambda()#1}>::_M_invoke(std::_Any_data const&) ()
#11 0x000055555555ea80 in crucible::catch_all(std::function<void ()> const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&) ()
#12 0x00005555555624cb in test_derived_cast() ()
#13 0x0000555555564721 in main ()

I… don't know enough about C++ to know exactly what this means, but there's apparently some change in how exception handling works with gcc LTO in this compiler version. From other folks testing on IRC it seems like this might be specific to the gcc version I'm using.

Zygo commented 2 years ago

I hate to be the one that says "it looks like a toolchain bug" but it looks like a toolchain bug? dynamic_cast is supposed to provide a safe way to either cast a pointer from one type to another, or return NULL. Crashing is neither of those two options.

Zygo commented 2 years ago

bees doesn't actually use derived types of Fd, so the main bees binary should still work.

kakra commented 2 years ago

LTO with gcc 11+ is really problematic... If you compile the toolchain itself with LTO, libstdc++ becomes completely unusable because most stdlib symbols are missing at link time.

Zygo commented 2 years ago

I've tried g++-11 (Debian 11.2.0-12) 11.2.0 and g++ (GCC) 11.2.1 20211203 (Red Hat 11.2.1-7) with no problems so far. Which distro is having problems?

kepstin commented 2 years ago

The particular distro I'm seeing this issue on is Exherbo which is admittedly a pretty niche distribution. FWIW, the core toolchain stuff (GCC itself) is built without LTO.

For now, I'm inclined to agree that this is indeed a toolchain bug.

kakra commented 2 years ago

> Which distro is having problems?

Gentoo here... The problem is compiling gcc itself with -flto: it breaks symbol visibility. Some other libraries may be affected, too, but so far I've only encountered it with the C++ stdlib. This is a known issue. For now, I've stopped using -flto for essential core libraries like gcc (which provides libstdc++), glibc, and a few others.

Actually, Gentoo is aware of the issue, but they recently added a feature for compiling gcc with custom CFLAGS, which accidentally broke my system, and it was really difficult to recover from. For example, cmake would no longer run or compile, and many packages depend on it. boost was broken, too. I probably would have run into problems with bees as well, but it just isn't an essential package, so I didn't notice.

Compiling only the leaf packages of the dependency tree with LTO is usually no problem at all, because they don't export symbols used by other libraries.

Currently, using -fvisibility-inlines-hidden causes the same problem when compiling gcc-11 itself (that is, not compiling software WITH gcc-11, but bootstrapping gcc itself).

So in the end, the problem here may be that other software has been compiled with LTO, and thus bees won't compile, probably regardless of whether bees itself is built with LTO. Compiling gcc itself with LTO seems completely broken, and compiling some libraries with gcc-11 and LTO seems to be broken, too.