RobotLocomotion / drake

Model-based design and verification for robotics.
https://drake.mit.edu

macOS solver failures related to OpenBLAS 0.3.26 #20799

Closed svenevs closed 5 months ago

svenevs commented 9 months ago

Failures on //geometry/optimization:iris_test and //examples/pendulum:trajectory_optimization_simulation_test, seen in mac-arm-sonoma-unprovisioned-clang-bazel-nightly-release/71 and mac-arm-ventura-unprovisioned-clang-bazel-nightly-release/255, appear to come from a brew update of OpenBLAS from 0.3.25 to 0.3.26.

First step is to confirm the failures locally (@svenevs).

Slack discussion here.

Steps to obtain 0.3.25 if desired:

  1. Get a clean brew installation of drake and run setup/mac/install_prereqs.sh.
  2. Now, uninstall the dependents: brew uninstall ipopt numpy
  3. Uninstall the affected openblas: brew uninstall openblas
  4. Download the old openblas 0.3.25 and install it:

    $ curl -LO https://github.com/Homebrew/homebrew-core/raw/0ccffdb71b99b3021449ff8e34442d3674b30c91/Formula/o/openblas.rb
    $ brew install ./openblas.rb
  5. Reinstall the dependents without allowing brew to auto-update: HOMEBREW_NO_AUTO_UPDATE=1 brew install ipopt numpy

CC @jwnimmer-tri @BetsyMcPhail

svenevs commented 9 months ago

Status confirmed: bazel test //examples/pendulum:trajectory_optimization_simulation_test fails with openblas==0.3.26, and after installing openblas==0.3.25 the test succeeds again.
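
For anyone reproducing this, a quick way to see which OpenBLAS is installed could look like the sketch below. It assumes `brew list --versions openblas` prints a line such as `openblas 0.3.26`. The helper names and the `AFFECTED_VERSIONS` set are hypothetical and only record what this thread observed (0.3.26 failing, 0.3.25 passing).

```python
import subprocess

# Assumption: only the version observed failing in this thread is listed here.
AFFECTED_VERSIONS = {"0.3.26"}


def parse_brew_versions(output, formula="openblas"):
    """Return the versions listed for `formula` in `brew list --versions` output."""
    versions = []
    for line in output.splitlines():
        parts = line.split()
        if parts and parts[0] == formula:
            # Homebrew appends a formula revision as "_N" (e.g. 0.3.25_1); drop it.
            versions.extend(p.split("_")[0] for p in parts[1:])
    return versions


def is_affected(versions):
    """True if any installed version is one this thread saw failing."""
    return any(v in AFFECTED_VERSIONS for v in versions)


def installed_openblas_versions():
    """Query Homebrew for installed openblas versions; requires `brew` on PATH."""
    result = subprocess.run(
        ["brew", "list", "--versions", "openblas"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_brew_versions(result.stdout)
```

On a Mac with Homebrew, `is_affected(installed_openblas_versions())` would report whether the machine has the version that failed here.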

svenevs commented 9 months ago

If we want to keep the tests enabled, the available band-aids are:

  1. Rebottle 0.3.25 and put it in homebrew-director (I don't think I even need to build it, just download the ones we support).
  2. Since the setup/ scripts are shell scripts, we could theoretically download the old file and install the .rb file directly.

Unfortunately, pointing the Brewfile at a formula URL like this is not allowed

diff --git a/setup/mac/binary_distribution/Brewfile b/setup/mac/binary_distribution/Brewfile
index 1d9eb86c63..edf77f5a94 100644
--- a/setup/mac/binary_distribution/Brewfile
+++ b/setup/mac/binary_distribution/Brewfile
@@ -13,7 +13,7 @@ brew 'glib'
 brew 'graphviz'
 brew 'ipopt'
 brew 'numpy'
-brew 'openblas'
+brew 'https://github.com/Homebrew/homebrew-core/raw/0ccffdb71b99b3021449ff8e34442d3674b30c91/Formula/o/openblas.rb'
 brew 'pkg-config'
 brew 'python@3.11'
 brew 'spdlog'

and never will be; see https://github.com/Homebrew/brew/issues/15496

jwnimmer-tri commented 9 months ago

There's a call out on Slack for someone to take over this issue. I'm waiting to see if anyone steps up.

jwnimmer-tri commented 9 months ago

Here's a more thorough failure trace. I believe it's the solve step being identically zero that triggers this panic.

Verbosity patch:

--- a/solvers/clarabel_solver.cc
+++ b/solvers/clarabel_solver.cc
@@ -154,7 +154,7 @@ class SettingsConverter {

   explicit SettingsConverter(const SolverOptions& solver_options) {
     // Propagate Drake's common options into `settings_`.
-    settings_.verbose = solver_options.get_print_to_console();
+    settings_.verbose = true;
     // TODO(jwnimmer-tri) Handle get_print_file_name().

     // Copy the Clarabel-specific `solver_options` to pending maps.

Command line:

bazel test //geometry/optimization:iris_test --nocache_test_results --test_env=RUST_BACKTRACE=1 --test_arg=--gtest_filter=IrisTest.ClosestPointFailure --test_arg=--spdlog_level=debug -c dbg

Output:

[2024-01-23 02:12:53.221] [console] [debug] solvers::Solve will use Clarabel
-------------------------------------------------------------
           Clarabel.rs v0.6.0  -  Clever Acronym              

                   (c) Paul Goulart                          
                University of Oxford, 2022                   
-------------------------------------------------------------

problem:
  variables     = 20
  constraints   = 47
  nnz(P)        = 0
  nnz(A)        = 62
  cones (total) = 12
    :        Zero = 1,  numel = 10
    : Nonnegative = 1,  numel = 0
    : SecondOrder = 7,  numel = (3,3,3,3,...,3)
    : Exponential = 2,  numel = (3,3)
    : PSDTriangle = 1,  numel = 10

settings:
  linear algebra: direct / qdldl, precision: 64 bit
  max iter = 200, time limit = Inf,  max step = 0.990
  tol_feas = 1.0e-8, tol_gap_abs = 1.0e-8, tol_gap_rel = 1.0e-8,
  static reg : on, ϵ1 = 1.0e-8, ϵ2 = 4.9e-32
  dynamic reg: on, ϵ = 1.0e-13, δ = 2.0e-7
  iter refine: on, reltol = 1.0e-13, abstol = 1.0e-12,
               max iter = 10, stop ratio = 5.0
  equilibrate: on, min_scale = 1.0e-4, max_scale = 1.0e4
               max iter = 10

iter    pcost        dcost       gap       pres      dres      k/t        μ       step      
---------------------------------------------------------------------------------------------
  0  +0.0000e+00  -4.2824e+01  4.28e+01  1.29e+00  5.24e-01  1.00e+00  1.00e+00   ------   
  1  +3.3643e+00  -2.7435e+01  9.15e+00  6.11e-01  1.58e-01  2.25e+00  2.87e-01  7.45e-01  
  2  +2.1775e+00  -9.2390e+00  5.24e+00  1.59e-01  4.35e-02  1.03e+00  7.89e-02  8.06e-01  
  3  +1.8464e-01  -1.6841e+00  1.87e+00  2.10e-02  6.15e-03  1.60e-01  1.19e-02  8.59e-01  
  4  -2.3063e-01  -8.0150e-01  5.71e-01  5.86e-03  1.67e-03  4.86e-02  3.46e-03  7.84e-01  
  5  -3.7020e-01  -5.0332e-01  1.33e-01  1.33e-03  3.79e-04  1.16e-02  8.00e-04  7.84e-01  
  6  -4.0907e-01  -4.3915e-01  3.01e-02  2.98e-04  8.42e-05  2.66e-03  1.80e-04  7.84e-01  
  7  -4.2013e-01  -4.2099e-01  8.61e-04  8.57e-06  2.41e-06  8.29e-05  5.16e-06  9.80e-01  
  8  -4.2044e-01  -4.2047e-01  2.48e-05  2.47e-07  6.93e-08  2.47e-06  1.49e-07  9.80e-01  
  9  -4.2045e-01  -4.2045e-01  1.51e-06  1.50e-08  4.22e-09  1.53e-07  9.06e-09  9.58e-01  
 10  -4.2045e-01  -4.2045e-01  1.46e-07  1.46e-09  4.08e-10  1.48e-08  8.77e-10  9.07e-01  
 11  -4.2045e-01  -4.2045e-01  2.18e-08  2.18e-10  6.11e-11  2.22e-09  1.31e-10  8.51e-01  
 12  -4.2045e-01  -4.2045e-01  2.18e-08  2.18e-10  6.11e-11  2.22e-09  1.31e-10  0.00e+00  
thread '<unnamed>' panicked at external/crate__clarabel-0.6.0/src/solver/core/cones/psdtrianglecone.rs:292:9:
not implemented: Mixed PSD and Exponential/Power cones are not yet supported
stack backtrace:
   0: rust_begin_unwind
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
   1: core::panicking::panic_fmt
             at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
   2: <clarabel::solver::core::cones::psdtrianglecone::PSDTriangleCone<T> as clarabel::solver::core::cones::Cone<T>>::compute_barrier
             at external/crate__clarabel-0.6.0/src/solver/core/cones/psdtrianglecone.rs:292:9
   3: <clarabel::solver::core::cones::supportedcone::SupportedCone<T> as clarabel::solver::core::cones::Cone<T>>::compute_barrier
             at external/crate__clarabel-0.6.0/src/solver/core/cones/mod.rs:42:1
   4: <clarabel::solver::core::cones::compositecone::CompositeCone<T> as clarabel::solver::core::cones::Cone<T>>::compute_barrier
             at external/crate__clarabel-0.6.0/src/solver/core/cones/compositecone.rs:363:24
   5: <clarabel::solver::implementations::default::variables::DefaultVariables<T> as clarabel::solver::core::traits::Variables<T>>::barrier
             at external/crate__clarabel-0.6.0/src/solver/implementations/default/variables.rs:213:20
   6: <clarabel::solver::core::solver::Solver<D,V,R,K,C,I,SO,SE> as clarabel::solver::core::solver::internal::IPSolverInternals<T,D,V,R,K,C,I,SO,SE>>::backtrack_step_to_barrier
             at external/crate__clarabel-0.6.0/src/solver/core/solver.rs:483:31
   7: <clarabel::solver::core::solver::Solver<D,V,R,K,C,I,SO,SE> as clarabel::solver::core::solver::internal::IPSolverInternals<T,D,V,R,K,C,I,SO,SE>>::get_step_length
             at external/crate__clarabel-0.6.0/src/solver/core/solver.rs:473:22
   8: <clarabel::solver::core::solver::Solver<D,V,R,K,C,I,SO,SE> as clarabel::solver::core::solver::IPSolver<T,D,V,R,K,C,I,SO,SE>>::solve
             at external/crate__clarabel-0.6.0/src/solver/core/solver.rs:330:18
   9: clarabel_cpp_rust_wrapper::solver::implementations::default::solver::_internal_DefaultSolver_solve
             at external/clarabel_cpp_internal/rust_wrapper/src/solver/implementations/default/solver.rs:109:5
  10: clarabel_DefaultSolver_f64_solve
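
To spot the zero-length step in a log like the one above without reading it by eye, a minimal sketch follows. It assumes the verbose iteration table keeps the nine-column layout shown here (iter, pcost, dcost, gap, pres, dres, k/t, μ, step). The function name is hypothetical and this is not part of Drake or Clarabel.

```python
# Rows copied from the failing trace above; iteration 0 prints "------" for step.
SAMPLE = """\
  0  +0.0000e+00  -4.2824e+01  4.28e+01  1.29e+00  5.24e-01  1.00e+00  1.00e+00   ------
 10  -4.2045e-01  -4.2045e-01  1.46e-07  1.46e-09  4.08e-10  1.48e-08  8.77e-10  9.07e-01
 11  -4.2045e-01  -4.2045e-01  2.18e-08  2.18e-10  6.11e-11  2.22e-09  1.31e-10  8.51e-01
 12  -4.2045e-01  -4.2045e-01  2.18e-08  2.18e-10  6.11e-11  2.22e-09  1.31e-10  0.00e+00
"""


def find_zero_step_iterations(log_text):
    """Return the iteration numbers whose step column is exactly zero."""
    zero_iters = []
    for line in log_text.splitlines():
        parts = line.split()
        # An iteration row has an integer index followed by eight fields.
        if len(parts) != 9 or not parts[0].isdigit():
            continue
        try:
            step = float(parts[8])
        except ValueError:
            continue  # e.g. the "------" placeholder on iteration 0
        if step == 0.0:
            zero_iters.append(int(parts[0]))
    return zero_iters


print(find_zero_step_iterations(SAMPLE))
```

Run against the full failing trace, this flags only iteration 12, the row immediately before the panic.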

Here's the output with Netlib BLAS (no panic):

[2024-01-23 02:18:53.708] [console] [debug] solvers::Solve will use Clarabel
-------------------------------------------------------------
           Clarabel.rs v0.6.0  -  Clever Acronym              

                   (c) Paul Goulart                          
                University of Oxford, 2022                   
-------------------------------------------------------------

problem:
  variables     = 20
  constraints   = 47
  nnz(P)        = 0
  nnz(A)        = 62
  cones (total) = 12
    :        Zero = 1,  numel = 10
    : Nonnegative = 1,  numel = 0
    : SecondOrder = 7,  numel = (3,3,3,3,...,3)
    : Exponential = 2,  numel = (3,3)
    : PSDTriangle = 1,  numel = 10

settings:
  linear algebra: direct / qdldl, precision: 64 bit
  max iter = 200, time limit = Inf,  max step = 0.990
  tol_feas = 1.0e-8, tol_gap_abs = 1.0e-8, tol_gap_rel = 1.0e-8,
  static reg : on, ϵ1 = 1.0e-8, ϵ2 = 4.9e-32
  dynamic reg: on, ϵ = 1.0e-13, δ = 2.0e-7
  iter refine: on, reltol = 1.0e-13, abstol = 1.0e-12,
               max iter = 10, stop ratio = 5.0
  equilibrate: on, min_scale = 1.0e-4, max_scale = 1.0e4
               max iter = 10

iter    pcost        dcost       gap       pres      dres      k/t        μ       step      
---------------------------------------------------------------------------------------------
  0  +0.0000e+00  -4.2824e+01  4.28e+01  1.29e+00  5.24e-01  1.00e+00  1.00e+00   ------   
  1  +3.3643e+00  -2.7435e+01  9.15e+00  6.11e-01  1.58e-01  2.25e+00  2.87e-01  7.45e-01  
  2  +2.1775e+00  -9.2391e+00  5.24e+00  1.59e-01  4.35e-02  1.03e+00  7.89e-02  8.06e-01  
  3  +1.8465e-01  -1.6842e+00  1.87e+00  2.10e-02  6.15e-03  1.60e-01  1.19e-02  8.59e-01  
  4  -2.3066e-01  -8.0154e-01  5.71e-01  5.86e-03  1.67e-03  4.86e-02  3.46e-03  7.84e-01  
  5  -3.7021e-01  -5.0333e-01  1.33e-01  1.33e-03  3.79e-04  1.16e-02  8.00e-04  7.84e-01  
  6  -4.0908e-01  -4.3916e-01  3.01e-02  2.98e-04  8.42e-05  2.66e-03  1.80e-04  7.84e-01  
  7  -4.2014e-01  -4.2100e-01  8.61e-04  8.57e-06  2.41e-06  8.29e-05  5.16e-06  9.80e-01  
  8  -4.2045e-01  -4.2048e-01  2.48e-05  2.47e-07  6.93e-08  2.47e-06  1.49e-07  9.80e-01  
  9  -4.2046e-01  -4.2046e-01  1.51e-06  1.51e-08  4.22e-09  1.53e-07  9.07e-09  9.58e-01  
 10  -4.2046e-01  -4.2046e-01  4.18e-07  4.18e-09  1.17e-09  4.25e-08  2.52e-09  7.25e-01  
 11  -4.2046e-01  -4.2046e-01  3.13e-08  3.13e-10  8.77e-11  3.19e-09  1.89e-10  9.25e-01  
 12  -4.2046e-01  -4.2046e-01  1.36e-08  1.37e-10  3.92e-11  1.43e-09  8.33e-11  9.80e-01  
 13  -4.2046e-01  -4.2046e-01  2.97e-09  2.99e-11  8.57e-12  3.12e-10  1.83e-11  7.84e-01  
---------------------------------------------------------------------------------------------
Terminated with status = Solved
solve time = 7.190811ms

ggould-tri commented 9 months ago

Again: