Open svenevs opened 1 year ago
The first time an unexplained failure occurs, close the issue immediately; there is little value in keeping an issue open for a failure that has only ever happened once. If the failure occurs a second time, reopen the issue.
Happened again, also on Monterey. Reopening: there's something in the Monterey toolchain that's tickling pybind object identity semantics. https://drake-jenkins.csail.mit.edu/view/Production/job/mac-x86-monterey-unprovisioned-clang-bazel-nightly-release/178/consoleFull
@jwnimmer-tri speculates that this may share a common source with #19394 (Slack discussion: https://drakedevelopers.slack.com/archives/C270MN28G/p1683817207919069?thread_ts=1683810179.404429&cid=C270MN28G). I'm mentioning this so the two issues get cross-linked, for future convenience if one of them is found and fixed.
It happened again last night (5/16/23) in mac-arm-monterey-clang-bazel-nightly-debug/196. A repeat run, mac-arm-monterey-clang-bazel-nightly-debug/197, passed.
Again on 05/24/23: mac-x86-monterey-unprovisioned-clang-bazel-nightly-release/201/
Another instance on 5/24: https://drake-jenkins.csail.mit.edu/job/mac-arm-monterey-clang-bazel-continuous-release/762/
[5:07:41 AM] INFO: From Testing //bindings/pydrake/systems:py/general_test:
[5:07:41 AM] ==================== Test output for //bindings/pydrake/systems:py/general_test:
[5:07:41 AM]
[5:07:41 AM] Running tests...
[5:07:41 AM] ----------------------------------------------------------------------
[5:07:41 AM] ..............General stats regarding discrete updates:
[5:07:41 AM] Number of time steps taken (simulator stats) = 17
[5:07:41 AM] Simulator publishes every time step: false
[5:07:41 AM] Number of publishes = 0
[5:07:41 AM] Number of discrete updates = 0
[5:07:41 AM] Number of "unrestricted" updates = 0
[5:07:41 AM]
[5:07:41 AM] Stats for integrator RungeKutta3Integrator with error control:
[5:07:41 AM] Number of time steps taken (integrator stats) = 17
[5:07:41 AM] Initial time step taken = 0 s
[5:07:41 AM] Largest time step taken = 0.1 s
[5:07:41 AM] Smallest adapted step size = 0 s
[5:07:41 AM] Number of steps shrunk due to error control = 0
[5:07:41 AM] Number of derivative evaluations = 51
[5:07:41 AM] Number of steps shrunk due to convergence-based failure = 0
[5:07:41 AM] Number of convergence-based step failures (should match) = 0
[5:07:41 AM] ............F....
[5:07:41 AM] ======================================================================
[5:07:41 AM] FAIL [0.004s]: test_system_base_api (general_test.TestGeneral.test_system_base_api)
[5:07:41 AM] ----------------------------------------------------------------------
[5:07:41 AM] Traceback (most recent call last):
[5:07:41 AM] File "/Users/admin/workspace/mac-arm-monterey-clang-bazel-nightly-debug/_bazel_admin/c321c5db3bfe1b41e5a3a7639d47261f/sandbox/darwin-sandbox/8451/execroot/drake/bazel-out/darwin_arm64-dbg/bin/bindings/pydrake/systems/py/general_test.runfiles/drake/bindings/pydrake/systems/test/general_test.py", line 125, in test_system_base_api
[5:07:41 AM] self.assertIs(u1.get_system(), system)
[5:07:41 AM] AssertionError: <pydrake.systems.primitives.Adder object at 0x10c836db0> is not <pydrake.systems.primitives.Adder object at 0x1045fe170>
[5:07:41 AM]
[5:07:41 AM] ----------------------------------------------------------------------
[5:07:41 AM] Ran 31 tests in 0.195s
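The failing assertion checks Python object identity: `u1.get_system()` returned a different wrapper object (0x10c836db0) than `system` (0x1045fe170), even though both are Adder wrappers. As background, pybind11 normally keeps a table of live Python instances keyed by their C++ pointer and hands back the existing wrapper when the same pointer crosses into Python again, which is what lets `assertIs` pass. The sketch below is a pure-Python model of that idea, not pybind11's actual implementation; the names `CppObject`, `Wrapper`, `InstanceRegistry`, and `forget` are hypothetical, and it makes no claim about the root cause here. It only shows how a lookup miss produces a second wrapper around the same underlying object.

```python
import weakref


class CppObject:
    """Stand-in for a C++ object; `address` plays the role of its pointer."""

    def __init__(self, address: int):
        self.address = address


class Wrapper:
    """Stand-in for the Python wrapper object created around a C++ pointer."""

    def __init__(self, cpp_obj: CppObject):
        self.cpp_obj = cpp_obj


class InstanceRegistry:
    """Hypothetical model of a pointer -> live-wrapper table (not pybind11's code)."""

    def __init__(self):
        # Weak values: an entry disappears once no Python code holds the wrapper.
        self._by_address = weakref.WeakValueDictionary()

    def to_python(self, cpp_obj: CppObject) -> Wrapper:
        existing = self._by_address.get(cpp_obj.address)
        if existing is not None:
            return existing  # Table hit: reuse the live wrapper, so `is` holds.
        wrapper = Wrapper(cpp_obj)  # Table miss: create a brand-new wrapper.
        self._by_address[cpp_obj.address] = wrapper
        return wrapper

    def forget(self, cpp_obj: CppObject) -> None:
        """Simulate the table losing track of a wrapper that is still alive."""
        self._by_address.pop(cpp_obj.address, None)


registry = InstanceRegistry()
adder = CppObject(address=0x1000)
system = registry.to_python(adder)
print(registry.to_python(adder) is system)  # True: same pointer, same wrapper

registry.forget(adder)
u1_system = registry.to_python(adder)
print(u1_system is system)  # False: an assertIs-style check would now fail
print(u1_system.cpp_obj is system.cpp_obj)  # True: both wrap the same object
```

In this model, identity (`is`) depends entirely on the table lookup, while the underlying object is unchanged; that is the distinction the failing assertion exposes.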
The test only takes about 2 seconds to run, so retrying it is cheap. We should mark it flaky = True to make it slightly less noisy in CI.
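Bazel documents the `flaky = True` test attribute as running a test up to three times and reporting failure only if every attempt fails. The Python sketch below models that retry policy (it is not Bazel's implementation; `run_like_bazel_flaky` and `reported_failure_rate` are hypothetical helpers) and works out why the noise drops: if each attempt failed independently with probability p, the reported failure rate would fall to p**3, while the worst-case cost for a roughly 2-second test would be about 6 seconds.

```python
from typing import Callable, Tuple


def run_like_bazel_flaky(test: Callable[[], bool], max_attempts: int = 3) -> Tuple[bool, int]:
    """Retry `test` until it passes or `max_attempts` attempts are used.

    Returns (passed, attempts_used). This models the documented retry policy
    only; it is not how Bazel implements it.
    """
    for attempt in range(1, max_attempts + 1):
        if test():
            return True, attempt
    return False, max_attempts


def reported_failure_rate(p_fail: float, max_attempts: int = 3) -> float:
    """Probability that every attempt fails, assuming attempts are independent."""
    return p_fail ** max_attempts


# Illustrative only: a test that fails twice, then passes, is reported as passing.
outcomes = iter([False, False, True])
print(run_like_bazel_flaky(lambda: next(outcomes)))  # (True, 3)

# Illustrative only: a hypothetical 5% per-run failure rate.
print(f"{reported_failure_rate(0.05):.6f}")  # 0.000125
```

The 5% figure is made up for the arithmetic; this thread does not measure a failure rate, and the independence assumption may not hold if the failure depends on machine or toolchain state.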
Another instance, this time on Sonoma: https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-arm-sonoma-unprovisioned-clang-bazel-nightly-release/112/
First occurrence came up in continuous macOS x86:
We booted a CI machine to triage this, suspecting it was related to the workspace upgrades (#19332), but it is not. It also appears at https://github.com/RobotLocomotion/drake/commit/df135679f40246495214ea5eb1edac36900589d0, but CI does not always catch it. It may not be limited to macOS.
Running the test several times in a row can sometimes reproduce it. In the macOS case, the command line:
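A small harness can make "run it several times" systematic and keep the output of each failing run for logging on the issue. This is a minimal sketch: `collect_failures` is a hypothetical helper, and the stand-in command below just invokes the current Python interpreter, because the actual command line from this thread is not reproduced here. Substitute the real test invocation for `cmd`.

```python
import subprocess
import sys
from typing import Dict, Sequence


def collect_failures(cmd: Sequence[str], runs: int) -> Dict[int, str]:
    """Run `cmd` `runs` times; map each failing run's 1-based index to its output.

    A run counts as failing when the process exits with a non-zero status.
    """
    failures: Dict[int, str] = {}
    for i in range(1, runs + 1):
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep stdout and stderr so the failure can be logged on the issue.
            failures[i] = result.stdout + result.stderr
    return failures


if __name__ == "__main__":
    # Hypothetical stand-in for an intermittent test: exits 1 about 20% of the time.
    flaky_cmd = [
        sys.executable,
        "-c",
        "import random, sys; sys.exit(1 if random.random() < 0.2 else 0)",
    ]
    failed = collect_failures(flaky_cmd, runs=20)
    print(f"{len(failed)} of 20 runs failed: {sorted(failed)}")
```

Bazel's own `--runs_per_test=N` flag is another way to repeat a test target; the harness above is only useful when you want the per-run output collected in one place.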
For now we label this "buildcop noise" and will log future occurrences, while silently ignoring it otherwise.