Closed mateuszbaran closed 4 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 99.74%. Comparing base (1ea2ac6) to head (8c39dca).
So far I have just one minor remark: for all algorithms I tried to find speaking names. CPPA is `cyclic_proximal_point`; ALM and EPM also enjoy their long names... but on the other hand I do see that the five-letter acronym here would really be a bit long if written out as a function name. Still, this would be the first shortened one, right? Maybe we can think about a tradeoff?
Yes, I see. Unfortunately the five-word name seems a bit too long, and I don't have a better idea.
I totally understand this. That is also why I tried to phrase it as carefully as possible.
While for documentation I am fine with math variable letters, in code (unless it is a function-internal variable) I do prefer somewhat more descriptive names that depend a bit less on "the user already knows". But for this case I indeed do not yet see how we could do something a bit longer than `cma_es` that helps starters without confusing experts (who do understand what `cma_es` is).
I think those `NoEffect*` stopping criteria aren't particularly important, and we already have so much stopping-criterion code here. @kellertuer, it would be nice to export all of them, but I don't want to pollute the namespace too much. At least `TolX` and `TolFun` are things a user may want to adjust, but they would likely want to keep the other criteria too. `Stagnation` and `EqualFunValues` are somewhat generic, that is, they can in theory be applied to any evolutionary optimization algorithm, like the particle swarm we already have here, at least if we adjusted the particle swarm to keep track of some population heuristics. Not something I'd like to do in this PR, but it could be an idea for the future.
> At least `TolX` and `TolFun`
Do you mean stopping criteria like `StopWhenChangeLess` and `StopWhenFunctionDecreaseLess` (the second does not yet exist, I think)? Sure, those could have tolerances a user can also set via a keyword directly, if you feel that is reasonable. For 1-2 stopping criteria in the default I usually did not pass keywords to them, but if there are 4-5 it might be a good idea to accept keyword parameters to set stopping criterion tolerances.
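For illustration, the keyword-tolerance idea could look roughly like this. This is a minimal self-contained sketch with toy types; none of these names are the actual Manopt.jl API, they just show criteria that carry a tolerance and compose with `|`:

```julia
# Toy stopping criteria (illustrative only, not the Manopt.jl implementation)
abstract type StoppingCriterion end

struct StopWhenChangeLess <: StoppingCriterion
    tol::Float64
end
(c::StopWhenChangeLess)(state) = state.change < c.tol

struct StopWhenFunctionDecreaseLess <: StoppingCriterion
    tol::Float64
end
(c::StopWhenFunctionDecreaseLess)(state) = state.decrease < c.tol

struct StopWhenAny <: StoppingCriterion
    criteria::Vector{StoppingCriterion}
end
(c::StopWhenAny)(state) = any(sc(state) for sc in c.criteria)

# `|` builds the disjunction of two criteria
Base.:|(a::StoppingCriterion, b::StoppingCriterion) = StopWhenAny([a, b])

# a solver could accept the tolerances as keywords and build the default stop
default_stop(; tol_x = 1e-8, tol_fun = 1e-9) =
    StopWhenChangeLess(tol_x) | StopWhenFunctionDecreaseLess(tol_fun)

default_stop()((change = 1e-10, decrease = 1.0))  # true: the change criterion fires
```

A user could then tighten or loosen individual tolerances via `cma_es(M, f; tol_x = 1e-12)`-style keywords without rebuilding the whole criterion by hand.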
Hm, could maybe some of the stopping criteria be combined into one? Also, currently they have somewhat cryptic names, but I can check them out later when they have a bit more description. For example, `EqualFunValuesCondition` without a reference to CMA-ES (that is, out of context) sounds super random and might even be mistaken as fitting other solvers? I tried to use `StopWhenX` names for stopping criteria; then exporting all of them is also not that bad, and the code stays relatively readable.
I used names from appendix B.3 of Hansen's paper. We could surely have better names, but they are fairly complex criteria that don't have nice `StopWhenX` names. `StopWhenSigmaTimesMaximumEigenvalueOfCovarianceMatrixGreaterThan`?
> do you mean stopping criteria like `StopWhenChangeLess` and `StopWhenFunctionDecreaseLess` (second does not yet exist I think)?
No, I mean those from Hansen's paper. `TolX` is specific to CMA-ES and `TolFun` is specific to evolutionary algorithms.
> For 1-2 stopping criteria in the default I usually did not pass keywords to them, but if there are 4-5 it might be a good idea to accept keyword parameters to set stopping criterion tolerances.
Sure, that makes sense.
> I used names from appendix B.3 of Hansen's paper. We could surely have better names but they are fairly complex criteria that don't have nice `StopWhenX` names. `StopWhenSigmaTimesMaximumEigenvalueOfCovarianceMatrixGreaterThan`?
No. That is too short, we would need something a bit longer I think ;)
We could maybe look for an interpretive name? What does it mean that sigma times the largest eigenvalue of the covariance matrix exceeds some value?
> > do you mean stopping criteria like `StopWhenChangeLess` and `StopWhenFunctionDecreaseLess` (second does not yet exist I think)?
>
> No, I mean those from Hansen's paper. `TolX` is specific to CMA-ES and `TolFun` is specific to evolutionary algorithms.
Ah, ok. We then should check that paper for more details and (as above) for interpretive, not-too-long names. I was not able to work the last few days (caught a cold, but getting better already), so I will first have to catch up on teaching preparations tomorrow, but besides that I can surely help looking for nice names (as you might have noticed, I like to do that anyway).
Here are a few ideas for naming:

- `NoEffectAxis`, `NoEffectCoord`: in the appendix he is a bit brief about what `m` actually does, but since this method is basically a mean-based one, we could instead of population even use `StopWhenMeanX` names for these two? I would have to read the paper more thoroughly for a more detailed proposal here.
- `EqualFunValues`: for me this is a stronger statement than `TolFun`, so basically super-concentration?
- `Stagnation`: maybe `StopWhenCostStagnates`?
- `TolXUp`: when including a warning about a possibly too small sigma, this could also be called `StopWhenPopulationDiverges`?
- `TolFun`: this is similar, but with a longer observation phase than a `StopWhenCostDecreaseLess`? Or even `StopWhenCostLess`? Hansen speaks about the range of function values, so one could say `StopWhenPopulationCostConcentrated`?
- `TolX`: I think this is basically `StopWhenPopulationConcentrated`?

The overview paper, though Euclidean, is great, but we could also mention https://ieeexplore.ieee.org/document/5299260. Also, the bib entry can be improved to remove the DOI and change the URL to eprint/eprinttype, but I can do that sometime later as well. Of course also Dreisigmeyer (already in the bib) https://optimization-online.org/wp-content/uploads/2017/03/5916.pdf; I would have to check the right reference still.
> `NoEffectAxis`, `NoEffectCoord`: in the appendix he is a bit brief about what `m` actually does, but since this method is basically a mean-based one, we could instead of population even use `StopWhenMeanX` names for these two? I would have to read the paper more thoroughly for a more detailed proposal here.
I'd skip those two for this PR actually. It seems that they are just for terminating a bit earlier than other conditions would when the search stagnates.
> `EqualFunValues`: for me this is a stronger statement than `TolFun`, so basically super-concentration?
Yes, this is correct.
> `Stagnation`: maybe `StopWhenCostStagnates`?
I think we usually say that the algorithm stagnates, not the cost. Maybe `StopWhenEvolutionStagnates`?
> We could maybe look for an interpretive name? What does it mean that sigma times the largest eigenvalue of the covariance matrix exceeds some value?
>
> `TolXUp`: when including a warning about a possibly too small sigma, this could also be called `StopWhenPopulationDiverges`?
That's a good idea.
> `TolFun`: this is similar, but with a longer observation phase than a `StopWhenCostDecreaseLess`? Or even `StopWhenCostLess`? Hansen speaks about the range of function values, so one could say `StopWhenPopulationCostConcentrated`?
I like `StopWhenPopulationCostConcentrated`.
> `TolX`: I think this is basically `StopWhenPopulationConcentrated`?
`TolX` is a bit stronger because it also demands that the rank-1 update of the covariance matrix is small, so it's more "population is concentrated and unlikely to get de-concentrated".
> The overview paper, though Euclidean, is great, but we could also mention https://ieeexplore.ieee.org/document/5299260. Also, the bib entry can be improved to remove the DOI and change the URL to eprint/eprinttype, but I can do that sometime later as well. Of course also Dreisigmeyer (already in the bib) https://optimization-online.org/wp-content/uploads/2017/03/5916.pdf; I would have to check the right reference still.
Thanks for the link to Colutto's paper, I must have missed it. They seem to skip the parallel transport of the covariance matrix and base their work on an older variant of the Euclidean CMA-ES, but otherwise it's more or less the same thing. Dreisigmeyer's paper doesn't mention CMA-ES, but it's also about direct optimization on manifolds, so maybe it could be mentioned somewhere.
> I think we usually say that the algorithm stagnates, not the cost. Maybe `StopWhenEvolutionStagnates`?
That is also fine with me; I just personally thought it could also be a very flat area, where the evolution continues but the cost stagnates.
So both `EqualFunValues` and its weaker cousin `TolX` still need a good name that is stronger than "population concentrated", hm. I do not yet have a good name here, but I like the others so far.
For the Colutto paper, I did not yet check it too closely, but it might still be fair to mention it. I haven't had the time to check too much versus Dreisigmeyer, but my student will work on his mesh version next semester; though I missed the paper slightly, in my last post I meant to mention https://optimization-online.org/wp-content/uploads/2007/08/1742.pdf, and LTMADS is a project for said student (but sure, that is more grid-based).
I've checked performance. For non-trivial examples it's bound by either objective evaluation or eigendecomposition, so I've reworked the code a bit to make sure only one eigendecomposition is performed per iteration. A standard trick in Euclidean CMA-ES is updating the decomposition every few iterations rather than every single one, but to make it work here we'd still need fast decomposition transport. It could be a fun follow-up project, but for this PR I think it's fast enough.
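The single-decomposition idea can be sketched as follows. This is a hedged illustration with hypothetical helper names, not the PR's actual internals: the covariance is decomposed once per iteration, and the cached factors are reused both for sampling and for eigenvalue-based stopping checks.

```julia
using LinearAlgebra

# Decompose the covariance once per iteration; reuse the factors everywhere.
# (Hypothetical helper names, only illustrating the caching pattern.)
function cma_iteration_factors(C::AbstractMatrix)
    F = eigen(Symmetric(Matrix(C)))   # single O(n^3) decomposition
    D = sqrt.(max.(F.values, 0))      # standard deviations along the principal axes
    B = F.vectors                     # principal axes (orthonormal columns)
    return B, D
end

# Sampling reuses the cached factors: y = m + σ * B * diag(D) * z, z ~ N(0, I)
function sample_candidate(rng, m, σ, B, D)
    z = randn(rng, length(m))
    return m .+ σ .* (B * (D .* z))
end

# The same factors feed a TolX-style eigenvalue check with no second eigen call
max_axis_length(σ, D) = σ * maximum(D)
```

Batching the decomposition over several iterations (as in the Euclidean trick mentioned above) would additionally require transporting `B` and `D` along the mean's movement on the manifold, which is what makes it nontrivial here.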
@kellertuer the latest failure is due to convex bundle method, maybe something you'd like to take a look at?
```
┌ Warning: WolfePowellLinesearch
│ caller = ip:0x0
└ @ Core :-1
┌ Warning: The Lagrange multiplier is positive.
│ At iteration #11 the negative of the Lagrange multiplier, -ξ, became negative.
│
│ Consider increasing either the `diameter` keyword argument, or changing
│ one of the parameters involved in the estimation of the sectional curvature, such as
│ `k_max`, or `ϱ` in the `convex_bundle_method` call.
└ @ Manopt ~/work/Manopt.jl/Manopt.jl/src/solvers/convex_bundle_method.jl:587
```

(the same warning repeats for iterations #12 through #20)

```
A simple median run: Test Failed at /Users/runner/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:178
  Expression: distance(M, q2, m) < 0.01
   Evaluated: 0.013226760924468948 < 0.01
Stacktrace:
 [1] macro expansion
   @ ~/hostedtoolcache/julia/1.9.4/x64/share/julia/stdlib/v1.9/Test/src/Test.jl:478 [inlined]
 [2] macro expansion
   @ ~/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:178 [inlined]
 [3] macro expansion
   @ ~/hostedtoolcache/julia/1.9.4/x64/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [4] macro expansion
   @ ~/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:150 [inlined]
 [5] macro expansion
   @ ~/hostedtoolcache/julia/1.9.4/x64/share/julia/stdlib/v1.9/Test/src/Test.jl:1498 [inlined]
 [6] top-level scope
   @ ~/work/Manopt.jl/Manopt.jl/test/solvers/test_convex_bundle_method.jl:6
WARNING: using JuMP.VectorConstraint in module Main conflicts with an existing identifier.
```
Increase the tolerance; we are reworking that algorithm a bit currently anyway, since those warnings and errors appeared more often than we thought.
> Increase the tolerance; we are reworking that algorithm a bit currently anyway, since those warnings and errors appeared more often than we thought.
Sure, that's good to know.
CMA-ES seems to handle this problem: https://github.com/JuliaManifolds/ManoptExamples.jl/issues/13 fairly well, and the performance looks competitive compared to Evolutionary.jl so I'd say this can be reviewed now.
I think I've addressed all your points.
I would still prefer a nicer constructor for the state, with (a) the manifold as first argument (to fill defaults properly) and (b) initialisation of all internal/copy things. The idea is that `cma_es` can call this one, and the code itself is then easier to read.
The current constructor does have `M` first, but retraction and vector transports, for example, can be set to nice defaults (and become kwargs); so can the basis and rng, maybe even stop.
The rest looks fine so far, I think.
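The requested constructor pattern could be sketched like this; toy types and hypothetical field names, not the actual state type, just to show the manifold coming first so manifold-dependent defaults can be filled in, with everything else a keyword:

```julia
struct ToyManifold end  # stand-in for an AbstractManifold

# manifold-dependent defaults (illustrative)
default_retraction(::ToyManifold) = :exponential
default_vector_transport(::ToyManifold) = :parallel

struct ToyCMAESState
    retraction::Symbol
    vector_transport::Symbol
    population_size::Int
end

# manifold first; everything else falls back to a manifold-derived default
function ToyCMAESState(
    M::ToyManifold;
    retraction = default_retraction(M),
    vector_transport = default_vector_transport(M),
    population_size = 12,
)
    return ToyCMAESState(retraction, vector_transport, population_size)
end

s = ToyCMAESState(ToyManifold())                         # all defaults
s2 = ToyCMAESState(ToyManifold(); population_size = 20)  # override one keyword
```

With this shape, `cma_es` only has to forward its keywords to the state constructor, keeping the solver entry point short.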
I've added defaults for some arguments, but many of them could potentially be changed if someone wants to experiment. There is quite a lot of logic in `cma_es!` because it reflects Hansen's paper, and providing them as defaults in the state constructor would be a bit messy.
Yes, I saw that part of the logic. You now changed at least the manifold part to have nice defaults, which is in line with the other solvers. That's nice.
A tutorial could definitely be useful for the more involved solvers, this one is actually fairly straightforward to use.
I think we can wait with registering a new Manopt version until #376 is merged.
Still, a small tutorial for most solvers might be nice; maybe also just one tutorial for all "derivative-free" ones, since their calls are really similar.
And sure, regarding registration, let's wait for that other PR.
The Covariance Matrix Adaptation Evolution Strategy algorithm adapted to the Riemannian setting.

TODO: stopping criteria (`NoEffectAxis`, `NoEffectCoord`, `EqualFunValues`, `Stagnation`, `TolXUp`, `TolFun`, `TolX`).

TODO beyond this PR: `spd_matrix_transport_to` is useful more generally.