JuliaManifolds / Manopt.jl

πŸ”οΈManopt. jl – Optimization on Manifolds in Julia
http://manoptjl.org

CMA-ES #373

Closed mateuszbaran closed 4 months ago

mateuszbaran commented 5 months ago

The Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm, adapted to the Riemannian setting.

TODO:

TODO beyond this PR:
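For context, a minimal usage sketch of the solver added here; the cost and manifold are illustrative, and the exact keyword set is an assumption based on the usual Manopt.jl calling conventions:

```julia
using Manopt, Manifolds

# Minimize a simple cost on the 2-sphere; the minimizer is the north pole.
M = Sphere(2)
f(M, p) = 1 - p[3]

# cma_es follows the usual Manopt convention (manifold, cost, start point);
# the stopping_criterion keyword shown here is an assumption.
p_opt = cma_es(M, f, rand(M); stopping_criterion=StopAfterIteration(500))
```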

codecov[bot] commented 5 months ago

Codecov Report

All modified and coverable lines are covered by tests βœ…

Project coverage is 99.74%. Comparing base (1ea2ac6) to head (8c39dca).

Additional details and impacted files

```diff
@@            Coverage Diff             @@
##           master     #373      +/-   ##
==========================================
+ Coverage   99.73%   99.74%   +0.01%
==========================================
  Files          73       74       +1
  Lines        6876     7167     +291
==========================================
+ Hits         6858     7149     +291
  Misses         18       18
```


kellertuer commented 5 months ago

Until now I have just one minor remark: for all algorithms I tried to find descriptive names. CPPA is cyclic_proximal_point, and ALM and EPM also enjoy their long names... But on the other hand, I do see that spelling out the five-letter acronym here would make for a really long function name. Still, this would be the first shortened one, right? Maybe we can think about a tradeoff?

mateuszbaran commented 5 months ago

Yes, I see. Unfortunately the five-word name seems a bit too long, and I don't have a better idea.

kellertuer commented 5 months ago

I totally understand this. That is also why I tried to phrase it as carefully as possible. While for documentation I am fine with mathematical variable letters, in code (unless it is a function-internal variable) I do prefer more descriptive names that depend a bit less on β€œthe user already knows”. But in this case I also do not yet see how we could have something a bit longer than cma_es that helps beginners without confusing experts (who do understand what cma_es is).

mateuszbaran commented 5 months ago

I think those NoEffect stopping criteria aren't particularly important, and we already have a lot of stopping criteria code here. @kellertuer, it would be nice to export all of them, but I don't want to pollute the namespace too much. At least TolX and TolFun are things a user may want to adjust, though they would likely want to keep the other criteria too. Stagnation and EqualFunValues are somewhat generic, that is, they can in theory be applied to any evolutionary optimization algorithm, like the particle swarm we already have here – at least if we adjusted the particle swarm to keep track of some population heuristics. Not something I'd like to do in this PR, but it could be an idea for the future.

kellertuer commented 5 months ago

At least TolX and TolFun

Do you mean stopping criteria like StopWhenChangeLess and StopWhenFunctionDecreaseLess (the second does not yet exist, I think)? Sure, those could have tolerances a user can also set directly via a keyword, if you feel that is reasonable. For 1–2 StoppingCriteria in the default I usually did not pass keywords to them, but if there are 4–5 it might be a good idea to accept keyword parameters for the stopping criterion tolerances – see the sketch below.
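A minimal sketch of that keyword pattern, using Manopt's `|` combination of stopping criteria; the keyword name `tol_x` and the solver name are purely illustrative:

```julia
using Manopt, Manifolds

# Hypothetical solver signature: a tolerance exposed as a keyword and
# forwarded into the default stopping criterion. `tol_x` and `my_solver`
# are illustrative names, not the final API.
function my_solver(M, f, p=rand(M);
    tol_x=1e-12,
    stopping_criterion=StopAfterIteration(1000) | StopWhenChangeLess(tol_x),
)
    # ... solver body, evaluating `stopping_criterion` each iteration ...
end
```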

mateuszbaran commented 5 months ago

Hm, could maybe some of the stopping criteria be combined into one? Also, they currently have a bit cryptic names, but I can check them out later when they have a bit more description. For example, EqualFunValuesCondition without a reference to CMA-ES (that is, out of context) sounds super random and may even be mistaken to fit other solvers? I tried to use StopWhenX names for stopping criteria; then exporting all of them is also not that bad, and the code stays relatively readable.

I used names from appendix B.3 of Hansen's paper. We could surely have better names but they are fairly complex criteria that don't have nice StopWhenX names. StopWhenSigmaTimesMaximumEigenvalueOfCovarianceMatrixGreaterThan?

Do you mean stopping criteria like StopWhenChangeLess and StopWhenFunctionDecreaseLess (the second does not yet exist, I think)?

No, I mean those from Hansen's paper. TolX is specific to CMA-ES and TolFun is specific to evolutionary algorithms.

mateuszbaran commented 5 months ago

For 1–2 StoppingCriteria in the default I usually did not pass keywords to them, but if there are 4–5 it might be a good idea to accept keyword parameters for the stopping criterion tolerances.

Sure, that makes sense.

kellertuer commented 5 months ago

I used names from appendix B.3 of Hansen's paper. We could surely have better names but they are fairly complex criteria that don't have nice StopWhenX names. StopWhenSigmaTimesMaximumEigenvalueOfCovarianceMatrixGreaterThan?

No. That is too short, we would need something a bit longer I think ;)

We could maybe look for an interpretable name? What does it mean that sigma times the largest eigenvalue of the covariance matrix exceeds some value?

Do you mean stopping criteria like StopWhenChangeLess and StopWhenFunctionDecreaseLess (the second does not yet exist, I think)?

No, I mean those from Hansen's paper. TolX is specific to CMA-ES and TolFun is specific to evolutionary algorithms.

Ah, ok. We should then check that paper for more details and (as above) look for interpretable, not-too-long names. I was not able to work the last few days (caught a cold, but getting better already), so tomorrow I will first have to catch up on teaching preparations, but besides that I can surely help look for nice names (as you might have noticed, I like doing that anyway).

kellertuer commented 5 months ago

Here are a few ideas for naming:

The overview paper, though Euclidean, is great, but we could also mention https://ieeexplore.ieee.org/document/5299260. Also, the bib entry can be improved by removing the DOI and changing the URL to eprint/eprinttype, but I can do that sometime later as well. Of course there is also Dreisigmeyer (already in the bib), https://optimization-online.org/wp-content/uploads/2017/03/5916.pdf – I would still have to check which is the right reference.

mateuszbaran commented 5 months ago
  • NoEffectAxis, NoEffectCoord – in the appendix he is a bit brief about what m actually does, but since this method is basically a mean-based one, we could even use StopWhenMeanX names for these two instead of population-based ones? I would have to read the paper more thoroughly for a more detailed proposal here.

I'd skip those two for this PR actually. It seems that they are just for terminating a bit earlier than other conditions would when the search stagnates.

  • EqualFunValues – for me this is a stronger statement than TolFun, so basically super-concentration?

Yes, this is correct.

  • Stagnation – maybe StopWhenCostStagnates?

I think we usually say that the algorithm stagnates, not the cost. Maybe StopWhenEvolutionStagnates?

We could maybe look for an interpretable name? What does it mean that sigma times the largest eigenvalue of the covariance matrix exceeds some value?

  • TolXUp – if we include a warning about a possibly too small sigma, this could also be called StopWhenPopulationDiverges?

That's a good idea.

  • TolFun – this is similar, but with a longer observation phase than a StopWhenCostDecreaseLess? Or even StopWhenCostLess? Hansen speaks about a range of function values, so one could say StopWhenPopulationCostConcentrated?

I like StopWhenPopulationCostConcentrated.

  • TolX – I think this is basically StopWhenPopulationConcentrated?

TolX is a bit stronger because it also demands that the rank-1 update of the covariance matrix is small, so it's more β€œthe population is concentrated and unlikely to get de-concentrated”.

The overview paper, though Euclidean, is great, but we could also mention https://ieeexplore.ieee.org/document/5299260. Also, the bib entry can be improved by removing the DOI and changing the URL to eprint/eprinttype, but I can do that sometime later as well. Of course there is also Dreisigmeyer (already in the bib), https://optimization-online.org/wp-content/uploads/2017/03/5916.pdf – I would still have to check which is the right reference.

Thanks for the link to Colutto's paper, I must have missed it. They seem to skip the parallel transport of the covariance matrix and base their work on an older variant of the Euclidean CMA-ES, but otherwise it's more or less the same thing. Dreisigmeyer's paper doesn't mention CMA-ES, but it's also about direct optimization on manifolds, so maybe it could be mentioned somewhere.
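To make the naming discussion concrete, here is a simplified sketch of how a criterion in the spirit of Hansen's TolFun could look in Manopt's functor-based stopping criterion interface; the struct name follows the proposal above, while the `best_cost` accessor and the window length are hypothetical:

```julia
using Manopt

# Hypothetical criterion: stop when the range of recent best cost values
# falls below a tolerance. `best_cost` is an illustrative accessor, not
# Manopt API; reset handling (i == 0) is omitted for brevity.
mutable struct StopWhenPopulationCostConcentrated{F<:Real} <: StoppingCriterion
    tol::F
    best_costs::Vector{F}  # sliding window of recent best costs
    window::Int
    at_iteration::Int
end
StopWhenPopulationCostConcentrated(tol::F; window::Int=10) where {F<:Real} =
    StopWhenPopulationCostConcentrated{F}(tol, F[], window, 0)

function (c::StopWhenPopulationCostConcentrated)(
    ::AbstractManoptProblem, s::AbstractManoptSolverState, i::Int
)
    push!(c.best_costs, best_cost(s))  # hypothetical accessor on the state
    length(c.best_costs) > c.window && popfirst!(c.best_costs)
    if length(c.best_costs) == c.window &&
            maximum(c.best_costs) - minimum(c.best_costs) < c.tol
        c.at_iteration = i
        return true
    end
    return false
end
```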

kellertuer commented 5 months ago

I think we usually say that the algorithm stagnates, not the cost. Maybe StopWhenEvolutionStagnates?

That is also fine with me; I just personally thought it could also be a very flat area, where the evolution continues but the cost stagnates.

So both EqualFunValues and its weaker cousin TolX still need a good name that is stronger than β€œpopulation concentrated”, hm. I do not yet have a good name here, but I like the others so far.

As for the Colutto paper, I did not yet check it too closely, but it might still be fair to mention it. I haven't had the time to compare it much with Dreisigmeyer, but my student will work on his mesh version next semester. Though I slightly missed the paper in my last post – I meant to mention https://optimization-online.org/wp-content/uploads/2007/08/1742.pdf – and LTMADS is a project for said student (but sure, that is more grid-based).

mateuszbaran commented 5 months ago

I've checked performance. For non-trivial examples it is bound by either the objective computation or the eigendecomposition, so I've reworked the code a bit to make sure only one eigendecomposition is made per iteration. A standard trick in Euclidean CMA-ES is updating the decomposition every few iterations rather than every single one, but to make that work here we'd still need fast transport of the decomposition. It could be a fun follow-up project, but for this PR I think it's fast enough.
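A minimal sketch of the one-decomposition-per-iteration pattern described above, assuming the covariance is stored as a dense symmetric matrix in coordinates of some basis:

```julia
using LinearAlgebra

# Compute the eigendecomposition of the covariance once per iteration and
# reuse it both for sampling and for eigenvalue-based stopping criteria.
function covariance_factors(C::AbstractMatrix{<:Real})
    eig = eigen(Symmetric(C))         # single decomposition per iteration
    D = sqrt.(max.(eig.values, 0.0))  # standard deviations along the eigen-axes
    B = eig.vectors
    return B, D
end

# Usage sketch: y ~ N(0, C) can then be sampled as B * (D .* randn(n)),
# and Οƒ * maximum(D) is reusable in, e.g., a TolXUp-style check.
B, D = covariance_factors([2.0 0.5; 0.5 1.0])
y = B * (D .* randn(2))
```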

mateuszbaran commented 5 months ago

@kellertuer, the latest failure is due to the convex bundle method – maybe something you'd like to take a look at?

β”Œ Warning: WolfePowellLinesearch
β”‚   caller = ip:0x0
β”” @ Core :-1
β”Œ Warning: The Lagrange multiplier is positive.
β”‚ At iteration #11 the negative of the Lagrange multiplier, -ΞΎ, became negative.
β”‚ 
β”‚ Consider increasing either the `diameter` keyword argument, or changing
β”‚ one of the parameters involved in the estimation of the sectional curvature, such as
β”‚ `k_max`, or `Ο±` in the `convex_bundle_method` call.
β”” @ Manopt ~/work/Manopt.jl/Manopt.jl/src/solvers/convex_bundle_method.jl:587
[the same Lagrange multiplier warning repeats for iterations #12 through #20]
A simple median run: Test Failed at /Users/runner/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:178
  Expression: distance(M, q2, m) < 0.01
   Evaluated: 0.013226760924468948 < 0.01

Stacktrace:
 [1] macro expansion
   @ ~/hostedtoolcache/julia/1.9.4/x64/share/julia/stdlib/v1.9/Test/src/Test.jl:478 [inlined]
 [2] macro expansion
   @ ~/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:178 [inlined]
 [3] macro expansion
   @ ~/hostedtoolcache/julia/1.9.4/x64/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [4] macro expansion
   @ ~/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:150 [inlined]
 [5] macro expansion
   @ ~/hostedtoolcache/julia/1.9.4/x64/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [6] top-level scope
   @ ~/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:6
WARNING: using JuMP.VectorConstraint in module Main conflicts with an existing identifier.
kellertuer commented 5 months ago

Increase the tolerance; we are currently reworking that algorithm a bit anyway, since those warnings and errors appeared more often than we expected.
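Concretely, that would be a one-line change in the failing test; the new bound is just an illustrative choice:

```julia
# test/solvers/test_convex_bundle_method.jl, around line 178
# (0.02 is an illustrative relaxed bound, not a value fixed in this thread):
@test distance(M, q2, m) < 0.02
```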

mateuszbaran commented 5 months ago

Increase the tolerance; we are currently reworking that algorithm a bit anyway, since those warnings and errors appeared more often than we expected.

Sure, that's good to know.

mateuszbaran commented 5 months ago

CMA-ES seems to handle this problem (https://github.com/JuliaManifolds/ManoptExamples.jl/issues/13) fairly well, and the performance looks competitive compared to Evolutionary.jl, so I'd say this can be reviewed now.

mateuszbaran commented 4 months ago

I think I've addressed all your points.

kellertuer commented 4 months ago

I would still prefer a nicer constructor for the state that (a) has the manifold as first argument to fill defaults properly and (b) initialises all internal/copied things. The idea is that cma_es can call this one, and the code itself is then easier to read.

The current constructor does have M first, but the retraction and vector transport, for example, can be given nice defaults (and become keyword arguments); so can the basis and rng, maybe even stop. A sketch of what I mean follows below.
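A sketch of that constructor pattern, filling defaults from the manifold as the other solver states do; the field list is abbreviated and the exact keyword set is an assumption:

```julia
using Manopt, ManifoldsBase, Random

# Hypothetical manifold-first constructor with defaults derived from M;
# the body is elided, and the keywords shown are an illustration only.
function CMAESState(
    M::AbstractManifold,
    p=rand(M);
    retraction_method=default_retraction_method(M, typeof(p)),
    vector_transport_method=default_vector_transport_method(M, typeof(p)),
    basis=DefaultOrthonormalBasis(),
    rng=Random.default_rng(),
    stopping_criterion=StopAfterIteration(500),
)
    # ... initialize the mean, step size Οƒ, covariance, and internal buffers
    # here, so that cma_es only has to forward user-facing options ...
end
```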

The rest looks fine so far, I think.

mateuszbaran commented 4 months ago

I've added defaults for some arguments, but many of them could potentially be changed if someone wants to experiment. There is quite a lot of logic in cma_es! because it reflects Hansen's paper, and providing them as defaults in the state constructor would be a bit messy.

kellertuer commented 4 months ago

Yes, I saw that part of the logic. You now changed at least the manifold part to have nice defaults, which is in line with the other solvers. That's nice.

mateuszbaran commented 4 months ago

A tutorial could definitely be useful for the more involved solvers; this one is actually fairly straightforward to use.

I think we can wait to register a new Manopt version until #376 is merged.

kellertuer commented 4 months ago

Still, a small tutorial for most solvers might be nice – maybe also just one tutorial for all β€œderivative-free” ones, since their calls are really similar.

And sure, regarding registration, let's wait for that other PR.