RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

xcode 16 + python + multiple shared libraries + dynamic_cast ==> fail #22204

Open rpoyner-tri opened 13 hours ago

rpoyner-tri commented 13 hours ago

What happened?

On macOS with Xcode 16:

$ bazel test //bindings/pydrake/systems:py/custom_test

fails with a std::bad_cast exception. Similarly

A full CI build log: https://drake-jenkins.csail.mit.edu/view/Mac%20Sequoia%20Unprovisioned/job/mac-arm-sequoia-unprovisioned-clang-bazel-experimental-release/13/consoleFull

Version

master circa 1.35

What operating system are you using?

macOS 14 (Sonoma)

What installation option are you using?

compiled from source code using Bazel

Relevant log output

No response

rpoyner-tri commented 12 hours ago

On my dev branch, with extra instrumentation, we can see two distinct addresses that each hold a type descriptor for the same type:

ricopoyner@TRI-X9DWTVD9TR drake % bazel test //bindings/pydrake/systems:py/custom_test
INFO: Analyzed target //bindings/pydrake/systems:py/custom_test (1 packages loaded, 16 targets configured).
INFO: From Linking bindings/pydrake/systems/test/test_util.cpython-312-darwin.so:
ld: warning: duplicate -rpath '/opt/homebrew/Cellar/fmt/11.0.2/lib' ignored
FAIL: //bindings/pydrake/systems:py/custom_test (see /private/var/tmp/_bazel_ricopoyner/27b47a6d9b400570878eb2115555e985/execroot/drake/bazel-out/darwin_arm64-opt/testlogs/bindings/pydrake/systems/py/custom_test/test.log)
INFO: From Testing //bindings/pydrake/systems:py/custom_test:
==================== Test output for //bindings/pydrake/systems:py/custom_test:

Running tests...
----------------------------------------------------------------------
....E.............
======================================================================
ERROR [0.004s]: test_all_leaf_system_overrides (custom_test.TestCustom.test_all_leaf_system_overrides)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/private/var/tmp/_bazel_ricopoyner/27b47a6d9b400570878eb2115555e985/sandbox/darwin-sandbox/194/execroot/drake/bazel-out/darwin_arm64-opt/bin/bindings/pydrake/systems/py/custom_test.runfiles/drake/bindings/pydrake/systems/test/custom_test.py", line 584, in test_all_leaf_system_overrides
    results = call_leaf_system_overrides(system)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: is_dynamic_castable<drake::systems::LeafEventCollection<drake::systems::PublishEvent<double>>@0x109bd91c0>(drake::systems::EventCollection<drake::systems::PublishEvent<double>>* ptr) failed because ptr is of dynamic type drake::systems::LeafEventCollection<drake::systems::PublishEvent<double>>@0x1039d09e0.

----------------------------------------------------------------------
Ran 18 tests in 0.030s

FAILED (errors=1)

Generating XML reports...
================================================================================
INFO: Found 1 test target...
Target //bindings/pydrake/systems:py/custom_test up-to-date:
  bazel-bin/bindings/pydrake/systems/py/custom_test
INFO: Elapsed time: 1.899s, Critical Path: 1.54s
INFO: 4 processes: 2 internal, 2 darwin-sandbox.
INFO: Build completed, 1 test FAILED, 4 total actions
//bindings/pydrake/systems:py/custom_test                                FAILED in 0.7s
  /private/var/tmp/_bazel_ricopoyner/27b47a6d9b400570878eb2115555e985/execroot/drake/bazel-out/darwin_arm64-opt/testlogs/bindings/pydrake/systems/py/custom_test/test.log

Executed 1 out of 1 test: 1 fails locally.

They come from two shared libraries: libdrake.so and bindings/pydrake/systems/test/test_util.cpython-312-darwin.so. This situation is no different from before, but with Xcode 15 the tests passed. I believe that older implementations of dynamic_cast would use (or fall back to) a type-name string comparison if the descriptor addresses did not match. That no longer appears to be the case.
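To make the suspected difference concrete, here is a minimal sketch of the two comparison strategies. It is only a model of the behavior described above, not the actual C++ runtime code: FakeTypeInfo, SameTypeByAddress, SameTypeByName, and the two descriptor constants are hypothetical names invented for illustration, and the name string is a readable placeholder rather than a real mangled name.

```cpp
#include <cstring>

// Hypothetical stand-in for a compiler-emitted type descriptor.
// Real std::type_info objects cannot be constructed by user code,
// so this struct only models the part that matters here: a name
// stored at some address.
struct FakeTypeInfo {
  const char* name;
};

// Address-only comparison: two descriptors denote the same type only
// if they are literally the same object in memory.
bool SameTypeByAddress(const FakeTypeInfo& a, const FakeTypeInfo& b) {
  return &a == &b;
}

// Address comparison with a string fallback: if the addresses differ,
// the descriptors still match when their names are equal.
bool SameTypeByName(const FakeTypeInfo& a, const FakeTypeInfo& b) {
  return &a == &b || std::strcmp(a.name, b.name) == 0;
}

// Two separate descriptors for the same type, modeling one copy emitted
// into libdrake.so and another into the test_util extension module.
// (Placeholder name string; an assumption for illustration only.)
const FakeTypeInfo kCopyInLibdrake{
    "drake::systems::LeafEventCollection<drake::systems::PublishEvent<double>>"};
const FakeTypeInfo kCopyInTestUtil{
    "drake::systems::LeafEventCollection<drake::systems::PublishEvent<double>>"};
```

In this model, SameTypeByAddress(kCopyInLibdrake, kCopyInTestUtil) is false while SameTypeByName(kCopyInLibdrake, kCopyInTestUtil) is true, which matches the pattern in the error message: identical type names at two different addresses.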

I've tried a lot of voodoo recommended by the interwebs (RTLD_GLOBAL, clang type_visibility attribute, ld -flat_namespace, etc.) to no avail. I suspect our choices boil down to: