celeritas-project / celeritas

Celeritas is a new Monte Carlo transport code designed to accelerate scientific discovery in high energy physics by improving detector simulation throughput and energy efficiency using GPUs.
https://celeritas-project.github.io/celeritas/
Other
64 stars 35 forks source link

Fix weak vtables causing dynamic_cast error in apple clang 16 #1436

Closed sethrj closed 1 month ago

sethrj commented 1 month ago

There's something wrong with the dynamic cast when using -O1 or higher on apple clang 16 (i.e. XCode 16 aka the one targeting Sequoia):

test/orange/orangeinp/Solid.test.cc:309: Failure
Value of: dynamic_cast<ConeShape const*>(shape.get())
  Actual: false
Expected: true
actual shape: celeritas::orangeinp::Shape<celeritas::orangeinp::Cone>

for code

    {
        auto shape = ConeSolid::or_shape(
            "cone", Cone{{1, 2}, 10.0}, std::nullopt, SolidEnclosedAngle{});
        EXPECT_TRUE(shape);
        EXPECT_TRUE(dynamic_cast<ConeShape const*>(shape.get()))
            << "actual shape: " << demangle_shape(*shape);
    }

Moving the dynamic-cast code from the test to the library works:

ConeShape const* dyncast_cone_shape(ObjectInterface const* obj)
{
    return dynamic_cast<ConeShape const*>(obj);
}

and generates identical assembly to the function if added to the test binary:

celeritas::orangeinp::test::dyncast_cone_shape_2(celeritas::orangeinp::ObjectInterface const*): ; @_ZN9celeritas9orangeinp4test20dyncast_cone_shape_2EPKNS0_15ObjectInterfaceE
Lfunc_begin77:
    cbz x0, LBB77_2
    ldr x8, [x0]
    adrp    x9, vtable for celeritas::orangeinp::Shape<celeritas::orangeinp::Cone>@GOTPAGE
    ldr x9, [x9, vtable for celeritas::orangeinp::Shape<celeritas::orangeinp::Cone>@GOTPAGEOFF]
    add x9, x9, #16
    cmp x8, x9
    b.eq    LBB77_3
LBB77_2:
    .loc    1 0 12 is_stmt 0                ; ./test/orange/orangeinp/Solid.test.cc:0:12
    mov x0, #0                          ; =0x0
LBB77_3:
    .loc    1 304 5                         ; ./test/orange/orangeinp/Solid.test.cc:304:5
    ret

Note the above code loads the vtable for the object, and simply compares the vtable address to that of the pointer. Interestingly, the vtable shows up in both the library and the test: so we conclude that the vtable for the pointer created by the library is not the same vtable known to the unit test!!

This seems to be related to a new Clang optimization, https://reviews.llvm.org/D154658 (aka https://github.com/llvm/llvm-project/commit/9d525bf94b255df89587db955b5fa2d3c03c2c3e), perhaps also the cause of https://github.com/llvm/llvm-project/issues/71196 . Adding -fno-assume-unique-vtables as documented by the new feature causes the tests to work again.

This seems to be because the class being compared is final but since it's templated, because it's instantiated in the main library but not declared external, the vtable appears both in the library and test so the address is different between the two of them. Adding an extern template declaration fixes this.

github-actions[bot] commented 1 month ago

Test summary

 3 249 files   5 047 suites   3m 42s :stopwatch:  1 520 tests  1 493 :white_check_mark: 27 :zzz: 0 :x: 16 838 runs  16 776 :white_check_mark: 62 :zzz: 0 :x:

Results for commit fb5fdb00.

esseivaju commented 1 month ago

Interesting digging on that issue. Could this originate from the fact that we had explicit instantiation definitions of these template classes without matching explicit instantiation declarations? So we'd get duplicate vtables during explicit instantiation and implicit instantiation when the template is used.

EDIT: that's what your last paragraph says, I didn't read it carefully enough 😅

sethrj commented 1 month ago

Yeah but the template functions should all be weak so I think it's legal? 😕

esseivaju commented 1 month ago

yes it should be legal. From

When the destination is a final class type that does derive from the source type, emit a direct comparison against the corresponding base class vptr value(s).

During the compilation of the test suit, they replace the dynamic_cast with a comparison of vtable pointers. During linking, the linker resolves the weak symbols, keeps only the first definition (the explicit instantiation), and discards the implicit instantiation from the test suit. So that's why the vtable pointers are not the same at runtime. I take that's what happens but I'm not sure why the linker can't get the correct vtable offset when resolving weak symbols. Maybe the code emitted after compilation uses the raw offset instead of a symbol, i.e. removing the explicit instantiation declaration and definition would also fix this (assuming there are no other implicit instantiations in the program).