argmin-rs / argmin

Numerical optimization in pure Rust
http://argmin-rs.org
Apache License 2.0
973 stars 76 forks source link

Benchmark L-BFGS #10

Open rth opened 5 years ago

rth commented 5 years ago

Thanks for this very nice package!

On the general topic of benchmarks, I would be quite interested to know how the L-BFGS implementation in argmin would compare to,

There are Python wrappers for these, in PyLBFGS and scipy.optimize. So using pyargmin it should be relatively easy to compare them (albeit with the Python overhead).

L-BGFS is very commonly used in machine learning (also see the "Rust needs BFGS" blog by the author of the bfgs crate). I'm mostly interested in seeing of it's possible to get a comparable or faster implementation in Rust than the one in scipy.optimize, that would be more readable (and support both f64 and f32).

Opening this issue mostly to report progress on it. If you have any suggestions (e.g. on the benchmark cases to use) I would be interested as well.

stefan-k commented 5 years ago

Such a benchmark would be really interesting. I haven't compared against these implementations at all, so besides the performance differences, it may be possible that they do not produce the exact same results. In terms of performance, I suspect (or better, hope) that most time is spent on linear algebra. I think flamegraph could be a very useful tool for this. I will see if I can find problems for the benchmark (which is something I wanted to do for a long time anyway).

nilgoyette commented 4 years ago

I tested this today. I'm using the original Fortran implementation v3. My application runs in

I really wanted to test argmin-rs because I want to get rid of Fortran in my perfect Rust codebase :D I changed only the L-BFGS part, from the original version to the argmin-rs version. My application now runs in 1m09s. I'm aware that this is not a benchmark at all! Still, now I don't need to code one as I have a ballpark of what I would gain (or lose). Of course, this slowdown is unacceptable, so I won't use this version. On the plus side though

This makes me wonder what tricks they used in the original version. It's obviously optimized by a group of genius mathematicians :P If I test the C port, I'll tell update this page.

rth commented 4 years ago

Thanks for the feedback @nilgoyette ! I also did some benchmarks recently between pyargmin and scipy for L-BFGS-B on the logistic regression problem, am also getting comparable results (i.e. pyargmin is ~4x slower). The WIP comparison notebook (in Python) can be found here.

I think there are two directions to explore to improve this situation,

rth commented 4 years ago

To have a starting point for this discussion about optimization, here are results of,

cargo flamegraph --example lbfgs --features ndarrayl

with a modified lbfgs example to increase problem dimensionality (and run longer),

```diff diff --git a/Cargo.toml b/Cargo.toml index e4c176fd..9cce01f1 100644 --- a/Cargo.toml +++ b/Cargo.toml @@ -54,3 +54,6 @@ visualizer = ["gnuplot"] [badges] travis-ci = { repository = "argmin-rs/argmin", branch = "master" } maintenance = { status = "actively-developed" } + +[profile.release] +debug = true diff --git a/examples/lbfgs.rs b/examples/lbfgs.rs index c22fa30f..5ca625c9 100644 --- a/examples/lbfgs.rs +++ b/examples/lbfgs.rs @@ -42,19 +42,272 @@ fn run() -> Result<(), Error> { let cost = Rosenbrock { a: 1.0, b: 100.0 }; // Define initial parameter vector - let init_param: Array1 = array![-1.2, 1.0]; - // let init_param: Array1 = array![-1.2, 1.0, -10.0, 2.0, 3.0, 2.0, 4.0, 10.0]; + // Output of + // >>> import numpy as np + //>>> np.random.RandomState(0).rand(1000) + let init_param: Array1 = array![ + 5.48813504e-01, 7.15189366e-01, 6.02763376e-01, 5.44883183e-01, + 4.23654799e-01, 6.45894113e-01, 4.37587211e-01, 8.91773001e-01, + 9.63662761e-01, 3.83441519e-01, 7.91725038e-01, 5.28894920e-01, + 5.68044561e-01, 9.25596638e-01, 7.10360582e-02, 8.71292997e-02, + 2.02183974e-02, 8.32619846e-01, 7.78156751e-01, 8.70012148e-01, + 9.78618342e-01, 7.99158564e-01, 4.61479362e-01, 7.80529176e-01, + 1.18274426e-01, 6.39921021e-01, 1.43353287e-01, 9.44668917e-01, + 5.21848322e-01, 4.14661940e-01, 2.64555612e-01, 7.74233689e-01, + 4.56150332e-01, 5.68433949e-01, 1.87898004e-02, 6.17635497e-01, + 6.12095723e-01, 6.16933997e-01, 9.43748079e-01, 6.81820299e-01, + 3.59507901e-01, 4.37031954e-01, 6.97631196e-01, 6.02254716e-02, + 6.66766715e-01, 6.70637870e-01, 2.10382561e-01, 1.28926298e-01, + 3.15428351e-01, 3.63710771e-01, 5.70196770e-01, 4.38601513e-01, + 9.88373838e-01, 1.02044811e-01, 2.08876756e-01, 1.61309518e-01, + 6.53108325e-01, 2.53291603e-01, 4.66310773e-01, 2.44425592e-01, + 1.58969584e-01, 1.10375141e-01, 6.56329589e-01, 1.38182951e-01, + 1.96582362e-01, 3.68725171e-01, 8.20993230e-01, 9.71012758e-02, + 8.37944907e-01, 9.60984079e-02, 9.76459465e-01, 4.68651202e-01, + 9.76761088e-01, 6.04845520e-01, 7.39263579e-01, 3.91877923e-02, + 2.82806963e-01, 1.20196561e-01, 2.96140198e-01, 1.18727719e-01, + 3.17983179e-01, 4.14262995e-01, 6.41474963e-02, 6.92472119e-01, + 5.66601454e-01, 2.65389491e-01, 5.23248053e-01, 9.39405108e-02, + 5.75946496e-01, 9.29296198e-01, 3.18568952e-01, 6.67410380e-01, + 1.31797862e-01, 7.16327204e-01, 2.89406093e-01, 1.83191362e-01, + 5.86512935e-01, 2.01075462e-02, 8.28940029e-01, 4.69547619e-03, + 6.77816537e-01, 2.70007973e-01, 7.35194022e-01, 9.62188545e-01, + 2.48753144e-01, 5.76157334e-01, 5.92041931e-01, 5.72251906e-01, + 2.23081633e-01, 9.52749012e-01, 4.47125379e-01, 8.46408672e-01, + 6.99479275e-01, 2.97436951e-01, 8.13797820e-01, 3.96505741e-01, + 8.81103197e-01, 5.81272873e-01, 8.81735362e-01, 6.92531590e-01, + 7.25254280e-01, 5.01324382e-01, 9.56083635e-01, 6.43990199e-01, + 4.23855049e-01, 6.06393214e-01, 1.91931983e-02, 3.01574817e-01, + 6.60173537e-01, 2.90077607e-01, 6.18015429e-01, 4.28768701e-01, + 1.35474064e-01, 2.98282326e-01, 5.69964911e-01, 5.90872761e-01, + 5.74325249e-01, 6.53200820e-01, 6.52103270e-01, 4.31418435e-01, + 8.96546596e-01, 3.67561870e-01, 4.35864925e-01, 8.91923355e-01, + 8.06193989e-01, 7.03888584e-01, 1.00226887e-01, 9.19482614e-01, + 7.14241300e-01, 9.98847007e-01, 1.49448305e-01, 8.68126057e-01, + 1.62492935e-01, 6.15559564e-01, 1.23819983e-01, 8.48008229e-01, + 8.07318959e-01, 5.69100739e-01, 4.07183297e-01, 6.91669955e-02, + 6.97428773e-01, 4.53542683e-01, 7.22055599e-01, 8.66382326e-01, + 9.75521505e-01, 8.55803342e-01, 1.17140842e-02, 3.59978064e-01, + 7.29990562e-01, 1.71629677e-01, 5.21036606e-01, 5.43379883e-02, + 1.99996525e-01, 1.85217945e-02, 7.93697703e-01, 2.23924688e-01, + 3.45351681e-01, 9.28081293e-01, 7.04414402e-01, 3.18389295e-02, + 1.64694156e-01, 6.21478401e-01, 5.77228589e-01, 2.37892821e-01, + 9.34213998e-01, 6.13965956e-01, 5.35632803e-01, 5.89909976e-01, + 7.30122030e-01, 3.11944995e-01, 3.98221062e-01, 2.09843749e-01, + 1.86193006e-01, 9.44372390e-01, 7.39550795e-01, 4.90458809e-01, + 2.27414628e-01, 2.54356482e-01, 5.80291603e-02, 4.34416626e-01, + 3.11795882e-01, 6.96343489e-01, 3.77751839e-01, 1.79603678e-01, + 2.46787284e-02, 6.72496315e-02, 6.79392773e-01, 4.53696845e-01, + 5.36579211e-01, 8.96671293e-01, 9.90338947e-01, 2.16896984e-01, + 6.63078203e-01, 2.63322377e-01, 2.06509995e-02, 7.58378654e-01, + 3.20017151e-01, 3.83463894e-01, 5.88317114e-01, 8.31048455e-01, + 6.28981844e-01, 8.72650655e-01, 2.73542035e-01, 7.98046834e-01, + 1.85635944e-01, 9.52791657e-01, 6.87488276e-01, 2.15507677e-01, + 9.47370590e-01, 7.30855807e-01, 2.53941643e-01, 2.13311977e-01, + 5.18200714e-01, 2.56627181e-02, 2.07470075e-01, 4.24685469e-01, + 3.74169980e-01, 4.63575424e-01, 2.77628706e-01, 5.86784346e-01, + 8.63855606e-01, 1.17531856e-01, 5.17379107e-01, 1.32068106e-01, + 7.16859681e-01, 3.96059703e-01, 5.65421312e-01, 1.83279836e-01, + 1.44847759e-01, 4.88056281e-01, 3.55612738e-01, 9.40431945e-01, + 7.65325254e-01, 7.48663620e-01, 9.03719740e-01, 8.34224354e-02, + 5.52192470e-01, 5.84476069e-01, 9.61936379e-01, 2.92147527e-01, + 2.40828780e-01, 1.00293942e-01, 1.64296296e-02, 9.29529317e-01, + 6.69916547e-01, 7.85152912e-01, 2.81730106e-01, 5.86410166e-01, + 6.39552661e-02, 4.85627596e-01, 9.77495140e-01, 8.76505245e-01, + 3.38158952e-01, 9.61570155e-01, 2.31701626e-01, 9.49318822e-01, + 9.41377705e-01, 7.99202587e-01, 6.30447937e-01, 8.74287967e-01, + 2.93020285e-01, 8.48943555e-01, 6.17876692e-01, 1.32368578e-02, + 3.47233518e-01, 1.48140861e-01, 9.81829390e-01, 4.78370307e-01, + 4.97391365e-01, 6.39472516e-01, 3.68584606e-01, 1.36900272e-01, + 8.22117733e-01, 1.89847912e-01, 5.11318983e-01, 2.24317029e-01, + 9.78444845e-02, 8.62191517e-01, 9.72919489e-01, 9.60834658e-01, + 9.06555499e-01, 7.74047333e-01, 3.33145152e-01, 8.11013900e-02, + 4.07241171e-01, 2.32234142e-01, 1.32487635e-01, 5.34271818e-02, + 7.25594364e-01, 1.14274586e-02, 7.70580749e-01, 1.46946645e-01, + 7.95220826e-02, 8.96030342e-02, 6.72047807e-01, 2.45367210e-01, + 4.20539467e-01, 5.57368791e-01, 8.60551174e-01, 7.27044263e-01, + 2.70327905e-01, 1.31482799e-01, 5.53743204e-02, 3.01598634e-01, + 2.62118149e-01, 4.56140567e-01, 6.83281336e-01, 6.95625446e-01, + 2.83518847e-01, 3.79926956e-01, 1.81150962e-01, 7.88545512e-01, + 5.68480764e-02, 6.96997242e-01, 7.78695396e-01, 7.77407562e-01, + 2.59422564e-01, 3.73813138e-01, 5.87599635e-01, 2.72821902e-01, + 3.70852799e-01, 1.97054280e-01, 4.59855884e-01, 4.46123013e-02, + 7.99795885e-01, 7.69564470e-02, 5.18835149e-01, 3.06810100e-01, + 5.77542949e-01, 9.59433341e-01, 6.45570244e-01, 3.53624358e-02, + 4.30402440e-01, 5.10016852e-01, 5.36177495e-01, 6.81392511e-01, + 2.77596098e-01, 1.28860565e-01, 3.92675677e-01, 9.56405723e-01, + 1.87130892e-01, 9.03983955e-01, 5.43805950e-01, 4.56911422e-01, + 8.82041410e-01, 4.58603962e-01, 7.24167637e-01, 3.99025322e-01, + 9.04044393e-01, 6.90025020e-01, 6.99622054e-01, 3.27720402e-01, + 7.56778643e-01, 6.36061055e-01, 2.40020273e-01, 1.60538822e-01, + 7.96391475e-01, 9.59166603e-01, 4.58138827e-01, 5.90984165e-01, + 8.57722644e-01, 4.57223453e-01, 9.51874477e-01, 5.75751162e-01, + 8.20767121e-01, 9.08843718e-01, 8.15523819e-01, 1.59414463e-01, + 6.28898439e-01, 3.98434259e-01, 6.27129520e-02, 4.24032252e-01, + 2.58684067e-01, 8.49038308e-01, 3.33046265e-02, 9.58982722e-01, + 3.55368848e-01, 3.56706890e-01, 1.63285027e-02, 1.85232325e-01, + 4.01259501e-01, 9.29291417e-01, 9.96149302e-02, 9.45301533e-01, + 8.69488531e-01, 4.54162397e-01, 3.26700882e-01, 2.32744129e-01, + 6.14464706e-01, 3.30745915e-02, 1.56060644e-02, 4.28795722e-01, + 6.80740740e-02, 2.51940988e-01, 2.21160915e-01, 2.53191194e-01, + 1.31055231e-01, 1.20362229e-02, 1.15484297e-01, 6.18480260e-01, + 9.74256213e-01, 9.90345002e-01, 4.09054095e-01, 1.62954426e-01, + 6.38761757e-01, 4.90305347e-01, 9.89409777e-01, 6.53042072e-02, + 7.83234438e-01, 2.88398497e-01, 2.41418620e-01, 6.62504572e-01, + 2.46063185e-01, 6.65859118e-01, 5.17308517e-01, 4.24088988e-01, + 5.54687809e-01, 2.87051520e-01, 7.06574706e-01, 4.14856869e-01, + 3.60545560e-01, 8.28656915e-01, 9.24966912e-01, 4.60073109e-02, + 2.32626993e-01, 3.48519369e-01, 8.14966479e-01, 9.85491428e-01, + 9.68971705e-01, 9.04948346e-01, 2.96556265e-01, 9.92011243e-01, + 2.49420041e-01, 1.05906155e-01, 9.50952611e-01, 2.33420255e-01, + 6.89768265e-01, 5.83563590e-02, 7.30709099e-01, 8.81720212e-01, + 2.72436895e-01, 3.79056896e-01, 3.74296183e-01, 7.48788258e-01, + 2.37807243e-01, 1.71853099e-01, 4.49291649e-01, 3.04468407e-01, + 8.39189122e-01, 2.37741826e-01, 5.02389457e-01, 9.42583600e-01, + 6.33997698e-01, 8.67289405e-01, 9.40209689e-01, 7.50764862e-01, + 6.99575060e-01, 9.67965567e-01, 9.94400790e-01, 4.51821683e-01, + 7.08697782e-02, 2.92794031e-01, 1.52354706e-01, 4.17486375e-01, + 1.31289328e-01, 6.04117804e-01, 3.82808059e-01, 8.95385884e-01, + 9.67794672e-01, 5.46884902e-01, 2.74823570e-01, 5.92230419e-01, + 8.96761158e-01, 4.06733346e-01, 5.52078277e-01, 2.71652768e-01, + 4.55444149e-01, 4.01713535e-01, 2.48413465e-01, 5.05866384e-01, + 3.10380826e-01, 3.73034864e-01, 5.24970442e-01, 7.50595023e-01, + 3.33507466e-01, 9.24158767e-01, 8.62318547e-01, 4.86902960e-02, + 2.53642524e-01, 4.46135513e-01, 1.04627889e-01, 3.48475989e-01, + 7.40097526e-01, 6.80514481e-01, 6.22384429e-01, 7.10528403e-01, + 2.04923687e-01, 3.41698115e-01, 6.76242482e-01, 8.79234763e-01, + 5.43678054e-01, 2.82699651e-01, 3.02352580e-02, 7.10336829e-01, + 7.88410351e-03, 3.72679070e-01, 5.30537215e-01, 9.22111462e-01, + 8.94945450e-02, 4.05942322e-01, 2.43131997e-02, 3.42610984e-01, + 6.22231059e-01, 2.79067948e-01, 2.09749950e-01, 1.15703233e-01, + 5.77140244e-01, 6.95270006e-01, 6.71957141e-01, 9.48861021e-01, + 2.70321389e-03, 6.47196654e-01, 6.00392237e-01, 5.88739610e-01, + 9.62770320e-01, 1.68716734e-02, 6.96482431e-01, 8.13678650e-01, + 5.09807197e-01, 3.33964870e-01, 7.90840163e-01, 9.72429256e-02, + 4.42035638e-01, 5.19952375e-01, 6.93956411e-01, 9.08857320e-02, + 2.27759502e-01, 4.10301563e-01, 6.23294673e-01, 8.86960781e-01, + 6.18826168e-01, 1.33461471e-01, 9.80580133e-01, 8.71785735e-01, + 5.02720761e-01, 9.22347982e-01, 5.41380794e-01, 9.23306068e-01, + 8.29897369e-01, 9.68286410e-01, 9.19782811e-01, 3.60338174e-02, + 1.74772004e-01, 3.89134677e-01, 9.52142697e-01, 3.00028919e-01, + 1.60467644e-01, 8.86304666e-01, 4.46394415e-01, 9.07875594e-01, + 1.60230466e-01, 6.61117512e-01, 4.40263753e-01, 7.64867690e-02, + 6.96463145e-01, 2.47398756e-01, 3.96155226e-02, 5.99442982e-02, + 6.10785371e-02, 9.07732957e-01, 7.39883918e-01, 8.98062357e-01, + 6.72582311e-01, 5.28939929e-01, 3.04446364e-01, 9.97962251e-01, + 3.62189059e-01, 4.70648949e-01, 3.78245175e-01, 9.79526929e-01, + 1.74658385e-01, 3.27988001e-01, 6.80348666e-01, 6.32076183e-02, + 6.07249374e-01, 4.77646503e-01, 2.83999977e-01, 2.38413281e-01, + 5.14512743e-01, 3.67927581e-01, 4.56519891e-01, 3.37477382e-01, + 9.70493694e-01, 1.33439432e-01, 9.68039532e-02, 3.43391729e-01, + 5.91026901e-01, 6.59176472e-01, 3.97256747e-01, 9.99277994e-01, + 3.51892996e-01, 7.21406668e-01, 6.37582695e-01, 8.13053863e-01, + 9.76225663e-01, 8.89793656e-01, 7.64561974e-01, 6.98248478e-01, + 3.35498170e-01, 1.47685578e-01, 6.26360031e-02, 2.41901704e-01, + 4.32281481e-01, 5.21996274e-01, 7.73083554e-01, 9.58740923e-01, + 1.17320480e-01, 1.07004140e-01, 5.89694723e-01, 7.45398074e-01, + 8.48150380e-01, 9.35832080e-01, 9.83426242e-01, 3.99801692e-01, + 3.80335184e-01, 1.47808677e-01, 6.84934439e-01, 6.56761958e-01, + 8.62062596e-01, 9.72579948e-02, 4.97776908e-01, 5.81081930e-01, + 2.41557040e-01, 1.69025406e-01, 8.59580836e-01, 5.85349222e-02, + 4.70620904e-01, 1.15834001e-01, 4.57058761e-01, 9.79962326e-01, + 4.23706353e-01, 8.57124918e-01, 1.17315564e-01, 2.71252077e-01, + 4.03792741e-01, 3.99812140e-01, 6.71383479e-01, 3.44718127e-01, + 7.13766868e-01, 6.39186899e-01, 3.99161145e-01, 4.31760128e-01, + 6.14527700e-01, 7.00421901e-02, 8.22406738e-01, 6.53421161e-01, + 7.26342464e-01, 5.36923001e-01, 1.10477111e-01, 4.05035613e-01, + 4.05373583e-01, 3.21042990e-01, 2.99503249e-02, 7.37254243e-01, + 1.09784458e-01, 6.06308133e-01, 7.03217496e-01, 6.34786323e-01, + 9.59142252e-01, 1.03298155e-01, 8.67167159e-01, 2.91902348e-02, + 5.34916855e-01, 4.04243618e-01, 5.24183860e-01, 3.65099877e-01, + 1.90566915e-01, 1.91228974e-02, 5.18149814e-01, 8.42776863e-01, + 3.73215956e-01, 2.22863818e-01, 8.05320035e-02, 8.53109231e-02, + 2.21396446e-01, 1.00014061e-01, 2.65039698e-01, 6.61494621e-02, + 6.56048672e-02, 8.56276180e-01, 1.62120261e-01, 5.59682406e-01, + 7.73455544e-01, 4.56409565e-01, 1.53368878e-01, 1.99596142e-01, + 4.32984206e-01, 5.28234089e-01, 3.49440292e-01, 7.81479600e-01, + 7.51021649e-01, 9.27211807e-01, 2.89525490e-02, 8.95691291e-01, + 3.92568788e-01, 8.78372495e-01, 6.90784776e-01, 9.87348757e-01, + 7.59282452e-01, 3.64544626e-01, 5.01063173e-01, 3.76389155e-01, + 3.64911836e-01, 2.60904499e-01, 4.95970295e-01, 6.81739945e-01, + 2.77340271e-01, 5.24379811e-01, 1.17380294e-01, 1.59845287e-01, + 4.68063547e-02, 9.70731443e-01, 3.86035151e-03, 1.78579968e-01, + 6.12866753e-01, 8.13695989e-02, 8.81896503e-01, 7.19620158e-01, + 9.66389971e-01, 5.07635547e-01, 3.00403683e-01, 5.49500573e-01, + 9.30818717e-01, 5.20761437e-01, 2.67207032e-01, 8.77398789e-01, + 3.71918749e-01, 1.38335000e-03, 2.47685022e-01, 3.18233509e-01, + 8.58777468e-01, 4.58503167e-01, 4.44587288e-01, 3.36102266e-01, + 8.80678123e-01, 9.45026777e-01, 9.91890329e-01, 3.76741267e-01, + 9.66147446e-01, 7.91879570e-01, 6.75689148e-01, 2.44889479e-01, + 2.16457261e-01, 1.66047825e-01, 9.22756610e-01, 2.94076662e-01, + 4.53094245e-01, 4.93957834e-01, 7.78171595e-01, 8.44234962e-01, + 1.39072701e-01, 4.26904360e-01, 8.42854888e-01, 8.18033306e-01, + 1.02413758e-01, 1.56383349e-01, 3.04198692e-01, 7.53590691e-02, + 4.24663003e-01, 1.07617705e-01, 5.68217594e-01, 2.46556940e-01, + 5.96433065e-01, 1.17525643e-01, 9.75883868e-01, 9.32561204e-01, + 3.91796939e-01, 2.42178594e-01, 2.50398213e-01, 4.83393535e-01, + 3.99928019e-02, 6.39705106e-01, 4.08302908e-01, 3.77406573e-01, + 8.09364971e-01, 7.09035460e-01, 9.54333815e-01, 3.51936240e-01, + 8.97542765e-01, 7.69967186e-01, 3.57424652e-01, 6.21665436e-01, + 2.88569958e-01, 8.74399917e-01, 1.12427317e-01, 2.12434361e-01, + 1.83033292e-01, 4.03026002e-01, 7.45232960e-01, 5.26907449e-01, + 4.87676324e-01, 5.45964897e-04, 4.25401725e-01, 6.35537748e-02, + 2.08253252e-01, 9.32393939e-01, 2.15398204e-01, 8.58337639e-01, + 8.02893372e-01, 1.59146237e-01, 6.05711957e-01, 1.15661872e-01, + 7.27888158e-01, 6.37462277e-01, 8.11938562e-01, 4.79384549e-01, + 9.14863088e-01, 4.93489468e-02, 2.92888565e-01, 7.15052597e-01, + 4.18109212e-01, 1.72951354e-01, 1.07210745e-01, 8.17339111e-01, + 4.73142978e-01, 8.82283672e-01, 7.33289134e-01, 4.09726206e-01, + 3.73511014e-01, 5.15638347e-01, 8.89059953e-01, 7.37278580e-01, + 5.15296427e-03, 6.94157851e-01, 9.19507407e-01, 7.10455760e-01, + 1.77005782e-01, 4.83518127e-01, 1.40316018e-01, 3.58995278e-01, + 9.37117042e-01, 9.23305308e-01, 2.82836852e-01, 3.39631044e-01, + 6.00212868e-01, 9.63197295e-01, 1.47801334e-01, 2.56916644e-01, + 8.73556827e-01, 4.91892232e-01, 8.98961092e-01, 1.85517898e-01, + 5.32668587e-01, 3.26269633e-01, 3.16542560e-01, 4.46876964e-01, + 4.33077449e-01, 3.57346880e-01, 9.14970770e-01, 7.31744185e-01, + 7.27546991e-01, 2.89913450e-01, 5.77709424e-01, 7.79179433e-01, + 7.95590369e-01, 3.44530461e-01, 7.70872757e-01, 7.35893897e-01, + 1.41506486e-01, 8.65945469e-01, 4.41321470e-01, 4.86410449e-01, + 4.48369179e-01, 5.67846001e-01, 6.21169247e-01, 4.98179566e-01, + 8.66788543e-01, 6.27734756e-01, 4.01427949e-01, 4.16691757e-01, + 8.10838615e-01, 3.48191943e-01, 2.11454796e-01, 5.93831880e-02, + 8.76026848e-01, 9.18546451e-01, 1.20120182e-01, 3.34473741e-01, + 1.75372070e-01, 1.15898469e-01, 8.99866743e-01, 5.68772591e-02, + 9.80485663e-01, 9.64508607e-02, 8.63470649e-01, 5.66506107e-01, + 3.67917488e-01, 3.42342377e-01, 7.57364143e-01, 3.14573295e-01, + 6.57318917e-01, 5.17326084e-01, 4.84965645e-01, 9.01162171e-01, + 5.54645059e-01, 8.26861603e-01, 7.25573534e-01, 3.85572461e-02, + 7.73110053e-01, 2.16870250e-01, 9.03149647e-01, 4.29241906e-02, + 3.33072034e-01, 9.97329472e-02, 4.75589117e-01, 8.20022436e-01, + 2.98187360e-01, 1.50934897e-01, 3.30267036e-01, 8.13880142e-01, + 1.40383958e-01, 2.27362449e-01, 6.88519645e-02, 7.05710044e-01, + 3.95233244e-01, 3.10839977e-01, 7.18626390e-01, 3.35977542e-01, + 7.27771273e-01, 8.15199395e-01, 2.17662843e-01, 9.73818697e-01, + 1.62357948e-01, 2.90840907e-01, 1.79795291e-01, 3.45505656e-01, + 4.80060888e-01, 5.22175869e-01, 8.53606042e-01, 8.89447909e-01, + 2.20103861e-01, 6.22894032e-01, 1.11496057e-01, 4.58969860e-01, + 3.22333538e-01, 3.16500745e-01, 4.82584242e-01, 7.29827636e-01, + 6.91826588e-02, 8.79173338e-01, 7.34813775e-01, 1.76499389e-01, + 9.39160909e-01, 5.06312224e-01, 9.99808578e-01, 1.97259474e-01, + 5.34908198e-01, 2.90248043e-01, 3.04173557e-01, 5.91065381e-01, + 9.21719067e-01, 8.05263856e-01, 7.23941399e-01, 5.59173782e-01, + 9.22298504e-01, 4.92361407e-01, 8.73832178e-01, 8.33981644e-01, + 2.13835347e-01, 7.71225463e-01, 1.21711569e-02, 3.22829538e-01, + 2.29567445e-01, 5.06862958e-01, 7.36853162e-01, 9.76763674e-02, + 5.14922202e-01, 9.38412022e-01, 2.28646551e-01, 6.77141144e-01]; + // set up a line search let linesearch = MoreThuenteLineSearch::new().c(1e-4, 0.9)?; // Set up solver - let solver = LBFGS::new(linesearch, 7); + let solver = LBFGS::new(linesearch, 10); // Run solver let res = Executor::new(cost, solver, init_param) // .add_observer(ArgminSlogLogger::term(), ObserverMode::Always) - .max_iters(100) + .max_iters(300) .run()?; // Wait a second (lets the logger flush everything before printing again) ```

The output is,

ArgminResult:
    param (best):  [0.9998660430280772, 0.9997711162728186, 1.0000183343714812, 0.9999545581212057, 1.0001581312529746, ..., 1.0020650224246153, 1.0026319569449569, 1.005436729578769, 1.008673050260663, 1.017108153862398], shape=[1000], strides=[1], layout=C | F (0x3), const ndim=1
    cost (best):   764.7226773367395
    iters (best):  299
    iters (best):  299
    iters (total): 300
    termination: Maximum number of iterations reached
    time:        697.088308ms

[ perf record: Woken up 2 times to write data ]
[ perf record: Captured and wrote 1,652 MB perf.data (200 samples) ]
writing flamegraph to "flamegraph.svg"

and the resulting flamegraph can be found below, flamegraph svg

full svg here.

I'm not used to do much profiling of Rust application, but it does looks like it's spending a lot of the time in clone and in managing openblas threads. The latter is somewhat surprising as I have also tried setting OMP_NUM_THREADS=1 environmental variable with no effect. Profiling run on 12 core CPU.

rth commented 4 years ago

Though, maybe I'm doing something wrong with flamegraph. If I run the same example with cargo run it takes 1.70s which is a reasonable run time for profiling. However with cargo flamegraph the same 100 iterations of the solver are done in only 25.428ms according to the time printed to stdout. If true the above results are likely not meaningful (or something is wrong with the timer).

Edit: yes, the --release option was missing from cargo run, so I need a bigger example. I updated the figures above with an example that takes longer to run (700ms now), but generally we still might need a more realistic problem than Rosenbrock.

stefan-k commented 4 years ago

This is a lot slower than I expected.

code optimization: as far as I understood form @stefan-k comment above, no one has spent much too time optimizing LBFGS in argmin, so even a little benchmarking & tuning effort might yield reasonable results. In particular, one clear advantage over Fortran we have is the ability to use BLAS.

This is true, unfortunately performance optimization has so far not been one of the main concerns (even though this is the most fun part). When looking at the LBFGS implementation, I believe one of the main problems might be the clones. In particular, the operator is cloned every time the line search is executed (in OpWrapper::new_from_op(&op)). It should be possible to solve this without clones. This may then also remove the requirement on operators having to implement Clone.

I have to admit I'm not used to flamegraphs: Do I see this correctly that a lot of time is spent on computing the gradient?

nilgoyette commented 4 years ago

parameters: we need to make sure that we run benchmarks exactly with the same parameters (including line search) as the Fortran version for the results to be comparable.

Yes, this was one of the problem I had. My original code was

let mut lbfgsb = Lbfgsb::new(x0, f, None);
lbfgsb.max_iteration(100);
lbfgsb.set_matric_correction(10);
lbfgsb.set_termination_tolerance(1e-7);
lbfgsb.set_tolerance(1e-05);

I think that matric_correction is the second parameter in LBFGS::new and max_iteration is easy to find. However, I couldn't find termination_tolerance and tolerance in argmin. Also, I used MoreThuenteLineSearch with "random" parameters as I don't know what they are using in the original Fortran version.

dbischof90 commented 1 year ago

Mea culpa for bringing this back from the dead - I'd be interested whether any progress has been made here?

nilgoyette commented 1 year ago

None that I'm aware of. I seem to recall that one of the released version made the whole crate faster, but I can't find any proof of that!

In all case, even if someone works on that problem, I doubt that a general optimization crate can be faster than a specialized Fortran L-BFGS library.

stefan-k commented 1 year ago

I seem to recall that one of the released version made the whole crate faster, but I can't find any proof of that!

Probably this change: #112 . My initial design of observers was... well, stupid. Before this PR, quite a lot of time was spent on observers, even if no observer was used (some details are here: #111). I made further improvements to the observer design later on, but I think those had less of an impact performance-wise.

I doubt that a general optimization crate can be faster than a specialized Fortran L-BFGS library.

That's probably correct, but benchmarks could still be insightful.

I'd be interested whether any progress has been made here?

Unfortunately no. I don't think I'll be able to commit time to this in the near future. I'd appreciate it if someone with a bit of benchmarking experience could help out with this one.

jonboh commented 1 year ago

Hello @stefan-k, have you considered adding criterion.rs style benchmarks in order to track the crate's algorithm performance? I've quickly turned the backtracking example into one and run it on the commit prior to https://github.com/argmin-rs/argmin/pull/112 and the current main branch. This isn't strictly related to benchmarking L-BFGS in comparison to the Fortran library but could ease evaluating changes that aim to improve performance, and also would prevent degrading it in future releases. As you can see in the changes in my fork, the changes necessary are really minimal:

With those changes we can generate reports with a couple of commands

git checkout 6deb382_benchmark
cargo bench --bench backtracking -- --save-baseline 6deb382
git checkout main
cargo bench --bench backtracking -- --baseline 6deb382
xdg-open ./target/criterion/report/index.html

image image (blue corresponds to main and red to the commit prior to #112 )

Let me know your thoughts, if you think it would be a worthwhile addition I think I could make a PR this week to add these benchmarks. I think this would ease working on the performance of the LBFGS algorithm (and others). It would give a good target in which to run the profiles.

stefan-k commented 1 year ago

Hi @jonboh,

thanks a lot for this comment and the work you have put into it! I have heard of criterion.rs but haven't looked into it in detail. This looks amazing and I think this would be an extremely useful addition and a very good basis for future performance improvements. I would highly appreciate it if you could issue a PR :)