RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

Unknown non-recurrent failures in //bindings/pydrake/systems:py/general_test #19335

Open svenevs opened 1 year ago

svenevs commented 1 year ago

First occurrence came up in continuous macOS x86:

[4:48:13 PM]  FAIL: //bindings/pydrake/systems:py/general_test (see /Users/monterey/workspace/mac-x86-monterey-clang-bazel-continuous-release/_bazel_monterey/56870061588957414ef418ee351da9fe/execroot/drake/bazel-out/darwin-opt/testlogs/bindings/pydrake/systems/py/general_test/test.log)
[4:48:13 PM]  INFO: From Testing //bindings/pydrake/systems:py/general_test:
[4:48:13 PM]  ==================== Test output for //bindings/pydrake/systems:py/general_test:
[4:48:13 PM]  
[4:48:13 PM]  Running tests...
[4:48:13 PM]  ----------------------------------------------------------------------
[4:48:13 PM]  ..............General stats regarding discrete updates:
[4:48:13 PM]  Number of time steps taken (simulator stats) = 17
[4:48:13 PM]  Simulator publishes every time step: false
[4:48:13 PM]  Number of publishes = 0
[4:48:13 PM]  Number of discrete updates = 0
[4:48:13 PM]  Number of "unrestricted" updates = 0
[4:48:13 PM]  
[4:48:13 PM]  Stats for integrator RungeKutta3Integrator with error control:
[4:48:13 PM]  Number of time steps taken (integrator stats) = 17
[4:48:13 PM]  Initial time step taken =          0 s
[4:48:13 PM]  Largest time step taken =        0.1 s
[4:48:13 PM]  Smallest adapted step size =          0 s
[4:48:13 PM]  Number of steps shrunk due to error control = 0
[4:48:13 PM]  Number of derivative evaluations = 51
[4:48:13 PM]  Number of steps shrunk due to convergence-based failure = 0
[4:48:13 PM]  Number of convergence-based step failures (should match) = 0
[4:48:13 PM]  ............F....
[4:48:13 PM]  ======================================================================
[4:48:13 PM]  FAIL [0.002s]: test_system_base_api (general_test.TestGeneral.test_system_base_api)
[4:48:13 PM]  ----------------------------------------------------------------------
[4:48:13 PM]  Traceback (most recent call last):
[4:48:13 PM]    File "/Users/monterey/workspace/mac-x86-monterey-clang-bazel-continuous-release/_bazel_monterey/56870061588957414ef418ee351da9fe/sandbox/darwin-sandbox/8448/execroot/drake/bazel-out/darwin-opt/bin/bindings/pydrake/systems/py/general_test.runfiles/drake/bindings/pydrake/systems/test/general_test.py", line 122, in test_system_base_api
[4:48:13 PM]      self.assertIs(u1.get_system(), system)
[4:48:13 PM]  AssertionError: <pydrake.systems.primitives.Adder object at 0x116343d30> is not <pydrake.systems.primitives.Adder object at 0x10eaa1330>
[4:48:13 PM]  
[4:48:13 PM]  ----------------------------------------------------------------------
[4:48:13 PM]  Ran 31 tests in 0.165s
[4:48:13 PM]  
[4:48:13 PM]  FAILED (failures=1)
[4:48:13 PM]  
[4:48:13 PM]  Generating XML reports...
[4:48:13 PM]  ================================================================================

We booted a CI machine to triage, thinking the failure was related to the workspace upgrades (#19332), but it is not. The failure also appears at https://github.com/RobotLocomotion/drake/commit/df135679f40246495214ea5eb1edac36900589d0, although CI does not always pick it up. It may not be limited to macOS.

Running the test many times can sometimes reproduce it. On macOS, the command line is:

$ bazel test --cache_test_results=no --runs_per_test=150 --config=clang --compilation_mode=opt --test_timeout=300,1500,4500,-1 //bindings/pydrake/systems:py/general_test

For now we label this "buildcop noise" and will log future occurrences, while silently ignoring it otherwise.
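
As a rough guide to how many repeated runs are worth trying, the arithmetic can be sketched as follows. This is only a back-of-the-envelope model: it assumes each run fails independently with the same probability, and the 1% failure rate used here is a hypothetical placeholder, not a measured value for this test.

```python
import math


def chance_of_seeing_failure(p_fail, runs):
    """Probability that at least one of `runs` independent runs fails."""
    return 1.0 - (1.0 - p_fail) ** runs


def runs_needed(p_fail, confidence):
    """Smallest run count whose chance of at least one failure meets `confidence`."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_fail))


# Hypothetical per-run failure rate; the real rate for this test is unknown.
assumed_rate = 0.01
print(chance_of_seeing_failure(assumed_rate, 150))
print(runs_needed(assumed_rate, 0.95))
```

If the real rate is much lower than the assumed one, a clean run of 150 iterations says little, which would be consistent with CI only catching the failure occasionally.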

jwnimmer-tri commented 1 year ago

The first time an unexplained failure occurs, close the issue immediately – there is not much value in keeping an open issue for a failure that only ever happened once. If the issue occurs a second time, reopen it.

https://drake.mit.edu/buildcop.html#process

ggould-tri commented 1 year ago

Happened again. Also monterey. Reopening -- there's something in the monterey toolchain that's tickling pybind object identity semantics. https://drake-jenkins.csail.mit.edu/view/Production/job/mac-x86-monterey-unprovisioned-clang-bazel-nightly-release/178/consoleFull
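
For readers unfamiliar with what the failing assertion checks: assertIs tests Python object identity, so it passes only if the binding layer hands back the very same wrapper object for the same underlying C++ instance. The sketch below is a toy model of an address-keyed wrapper cache. The Wrapper and Registry classes and the forget method are hypothetical and are not pybind11's or pydrake's actual code. It only illustrates how a missed cache lookup would produce a distinct wrapper for the same instance, which matches the symptom in the log; it is not a diagnosis of the cause.

```python
import weakref


class Wrapper:
    """Stand-in for a Python object wrapping a native instance (hypothetical)."""

    def __init__(self, address):
        self.address = address

    def __eq__(self, other):
        return isinstance(other, Wrapper) and self.address == other.address

    def __hash__(self):
        return hash(self.address)


class Registry:
    """Toy address-to-wrapper cache; an assumption, not the real binding code."""

    def __init__(self):
        self._live = weakref.WeakValueDictionary()

    def wrap(self, address):
        # Reuse the existing wrapper when the address is already known.
        wrapper = self._live.get(address)
        if wrapper is None:
            wrapper = Wrapper(address)
            self._live[address] = wrapper
        return wrapper

    def forget(self, address):
        # Simulates the cache losing track of an instance.
        self._live.pop(address, None)


registry = Registry()
system = registry.wrap(0x1000)
same = registry.wrap(0x1000)    # cache hit: identical object
registry.forget(0x1000)
fresh = registry.wrap(0x1000)   # cache miss: new object, same address
```

In this model, `same is system` holds, while `fresh is system` does not even though both wrap the same address.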

ggould-tri commented 1 year ago

@jwnimmer-tri speculates that this may share a common source with #19394 https://drakedevelopers.slack.com/archives/C270MN28G/p1683817207919069?thread_ts=1683810179.404429&cid=C270MN28G (Mentioning this to cause the bugs to be crosslinked, for future convenience if one of them gets found and fixed)

DamrongGuoy commented 1 year ago

It happened again last night (5/16/23) in mac-arm-monterey-clang-bazel-nightly-debug/196. Repeated run mac-arm-monterey-clang-bazel-nightly-debug/197 is fine.

liangfok commented 1 year ago

Again on 05/24/23: mac-x86-monterey-unprovisioned-clang-bazel-nightly-release/201/

BetsyMcPhail commented 1 year ago

Another instance on 5/24: https://drake-jenkins.csail.mit.edu/job/mac-arm-monterey-clang-bazel-continuous-release/762/

BetsyMcPhail commented 1 year ago

5/25: https://drake-jenkins.csail.mit.edu/view/Continuous%20Production/job/mac-x86-monterey-clang-bazel-continuous-release/784/

svenevs commented 1 year ago

6/16: https://drake-jenkins.csail.mit.edu/view/Continuous%20Production/job/mac-x86-monterey-clang-bazel-continuous-release/844/consoleFull

BetsyMcPhail commented 1 year ago

6/22: https://drake-jenkins.csail.mit.edu/job/mac-arm-ventura-clang-bazel-continuous-release/120/

BetsyMcPhail commented 1 year ago

6/22: https://drake-jenkins.csail.mit.edu/job/mac-arm-monterey-clang-bazel-continuous-release/878/

BetsyMcPhail commented 1 year ago

6/26: https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-x86-monterey-unprovisioned-clang-bazel-nightly-release/236/

rpoyner-tri commented 1 year ago

7/3: https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-x86-monterey-unprovisioned-clang-bazel-nightly-release/245/

xuchenhan-tri commented 1 year ago

7/12: https://drake-jenkins.csail.mit.edu/view/Continuous%20Production/job/mac-x86-monterey-clang-bazel-continuous-release/935/

BetsyMcPhail commented 1 year ago

7/18: https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-arm-ventura-unprovisioned-clang-bazel-nightly-release/63/

DamrongGuoy commented 1 year ago

7/24: https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-arm-ventura-unprovisioned-clang-bazel-nightly-release/69/

BetsyMcPhail commented 11 months ago

10/10: https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-arm-monterey-clang-bazel-nightly-debug/346/

ggould-tri commented 10 months ago

Again: https://drake-jenkins.csail.mit.edu/view/Production/job/mac-arm-monterey-clang-bazel-nightly-debug/375/consoleFull

[5:07:41 AM]  INFO: From Testing //bindings/pydrake/systems:py/general_test:
[5:07:41 AM]  ==================== Test output for //bindings/pydrake/systems:py/general_test:
[5:07:41 AM]  
[5:07:41 AM]  Running tests...
[5:07:41 AM]  ----------------------------------------------------------------------
[5:07:41 AM]  ..............General stats regarding discrete updates:
[5:07:41 AM]  Number of time steps taken (simulator stats) = 17
[5:07:41 AM]  Simulator publishes every time step: false
[5:07:41 AM]  Number of publishes = 0
[5:07:41 AM]  Number of discrete updates = 0
[5:07:41 AM]  Number of "unrestricted" updates = 0
[5:07:41 AM]  
[5:07:41 AM]  Stats for integrator RungeKutta3Integrator with error control:
[5:07:41 AM]  Number of time steps taken (integrator stats) = 17
[5:07:41 AM]  Initial time step taken =          0 s
[5:07:41 AM]  Largest time step taken =        0.1 s
[5:07:41 AM]  Smallest adapted step size =          0 s
[5:07:41 AM]  Number of steps shrunk due to error control = 0
[5:07:41 AM]  Number of derivative evaluations = 51
[5:07:41 AM]  Number of steps shrunk due to convergence-based failure = 0
[5:07:41 AM]  Number of convergence-based step failures (should match) = 0
[5:07:41 AM]  ............F....
[5:07:41 AM]  ======================================================================
[5:07:41 AM]  FAIL [0.004s]: test_system_base_api (general_test.TestGeneral.test_system_base_api)
[5:07:41 AM]  ----------------------------------------------------------------------
[5:07:41 AM]  Traceback (most recent call last):
[5:07:41 AM]    File "/Users/admin/workspace/mac-arm-monterey-clang-bazel-nightly-debug/_bazel_admin/c321c5db3bfe1b41e5a3a7639d47261f/sandbox/darwin-sandbox/8451/execroot/drake/bazel-out/darwin_arm64-dbg/bin/bindings/pydrake/systems/py/general_test.runfiles/drake/bindings/pydrake/systems/test/general_test.py", line 125, in test_system_base_api
[5:07:41 AM]      self.assertIs(u1.get_system(), system)
[5:07:41 AM]  AssertionError: <pydrake.systems.primitives.Adder object at 0x10c836db0> is not <pydrake.systems.primitives.Adder object at 0x1045fe170>
[5:07:41 AM]  
[5:07:41 AM]  ----------------------------------------------------------------------
[5:07:41 AM]  Ran 31 tests in 0.195s

BetsyMcPhail commented 10 months ago

11/9: https://drake-jenkins.csail.mit.edu/view/Production/job/mac-arm-monterey-unprovisioned-clang-bazel-nightly-release/407/

BetsyMcPhail commented 10 months ago

11/29: https://drake-jenkins.csail.mit.edu/job/mac-x86-monterey-clang-bazel-continuous-release/1366/

svenevs commented 9 months ago

Another in https://drake-jenkins.csail.mit.edu/view/Continuous%20Production/job/mac-arm-sonoma-clang-bazel-continuous-release/18/consoleFull

svenevs commented 8 months ago

2024-01-09: https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-x86-monterey-unprovisioned-clang-bazel-nightly-release/433/consoleFull

DamrongGuoy commented 8 months ago

2024-01-10 https://drake-jenkins.csail.mit.edu/view/Continuous%20Production/job/mac-x86-monterey-clang-bazel-continuous-release/1447/consoleFull

liangfok commented 8 months ago

https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-arm-ventura-clang-bazel-nightly-everything-release/258/consoleFull

liangfok commented 8 months ago

https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-x86-monterey-unprovisioned-clang-bazel-nightly-release/454/consoleFull

SeanCurtis-TRI commented 8 months ago

Again

https://drake-jenkins.csail.mit.edu/view/Continuous%20Production/job/mac-x86-monterey-clang-bazel-continuous-release/1495/

svenevs commented 7 months ago

Again in https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-x86-monterey-clang-bazel-nightly-everything-release/55/consoleFull

DamrongGuoy commented 7 months ago

https://drake-jenkins.csail.mit.edu/job/mac-arm-ventura-clang-bazel-experimental-release/4678/consoleFull

jwnimmer-tri commented 7 months ago

The test has a runtime of about 2 seconds. We should mark it flaky = True so that it is slightly less noisy in CI.
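
To make the effect of that suggestion concrete: Bazel documents the flaky attribute as running a test up to three times and reporting failure only if every attempt fails. The sketch below is a simplified model of that retry behavior, not Bazel's implementation; run_flaky and the sample test function are hypothetical names introduced for illustration.

```python
def run_flaky(test_fn, attempts=3):
    """Return True if test_fn passes on any of up to `attempts` tries."""
    for _ in range(attempts):
        try:
            test_fn()
            return True
        except AssertionError:
            continue
    return False


# Hypothetical test that fails once, then passes on the retry.
calls = {"count": 0}


def sometimes_fails():
    calls["count"] += 1
    assert calls["count"] > 1, "intermittent failure on first attempt"


result = run_flaky(sometimes_fails)
print(result, calls["count"])
```

Under the assumption that attempts fail independently with probability p, the reported failure rate drops from p to p cubed, at the cost of at most two extra runs of a short test. The underlying bug would still be present; retries only reduce the noise.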

ggould-tri commented 7 months ago

https://drake-jenkins.csail.mit.edu/view/Production/job/mac-arm-ventura-clang-bazel-continuous-release/268/consoleFull

BetsyMcPhail commented 7 months ago

2/26: https://drake-jenkins.csail.mit.edu/job/mac-arm-ventura-clang-bazel-continuous-release/289/

SeanCurtis-TRI commented 7 months ago

Another instance, this time on sonoma https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-arm-sonoma-unprovisioned-clang-bazel-nightly-release/112/

svenevs commented 6 months ago

03-20-2024: https://drake-jenkins.csail.mit.edu/view/Nightly%20Production/job/mac-arm-ventura-clang-bazel-nightly-everything-release/315/consoleFull

BetsyMcPhail commented 6 months ago

3/28/24: https://drake-jenkins.csail.mit.edu/job/mac-arm-sonoma-clang-bazel-continuous-release/296/

BetsyMcPhail commented 5 months ago

4/8: https://drake-jenkins.csail.mit.edu/view/Continuous%20Production/job/mac-arm-ventura-clang-bazel-continuous-release/412/

williamjallen commented 1 week ago

9/23: https://drake-jenkins.csail.mit.edu/job/mac-arm-ventura-clang-bazel-continuous-release/818/consoleFull