Closed kellertuer closed 2 months ago
This is conceptually a quite interesting PR. I just spent a bit of time sketching the new idea of a vectorial objective. I think it would reduce code duplication (between inequality constraint access, equality constraint access and all of Levenberg–Marquardt) quite a bit.
Since I am not 100% familiar with LM, a bit of feedback on the ideas sketched here would be nice @mateuszbaran: https://manoptjl.org/previews/PR386/plans/objective/#Manopt.VectorialGradientObjective This is of course only sketched and, for example, in the rendering I messed up the 1./2./3. in the 3 types of representations for the single-cost-function implementation. But I hope those 3 (1., 1., and 1. ;)) capture the existing ones and the new one the issue refers to?
If this sounds good, I could start implementing all that. Note that the basis in LM is now even stored in the type, so it is only there if the type requires it (the CoefficientType)
This is a nice direction of improvement. I don't quite get the difference between ComponentVectorialType and PowerManifoldVectorialType – the second type essentially covers the first one as a special case?
Also, you should be careful to note that VectorialGradientObjective doesn't represent multi-objective optimization but optimization of $g(f(p))$ for some $g\colon \mathbb{R}^n \to \mathbb{R}$, where we might want to encode g in VectorialGradientObjective. For example it is sum of squares for LM, mean/sum for stochastic gradient descent or some other function for robustified nonlinear least squares (see #332). It could also represent multi-objective optimization but we don't currently have any such solvers in Manopt.
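To make the composition concrete, here is a minimal plain-Julia sketch (all names are made up for illustration; no Manopt types involved) of one vector-valued f being scalarized by different outer functions g:

```julia
# Hypothetical illustration: a single vector-valued f mapping into R^2,
# combined with different outer functions g, giving g(f(p)) as the cost.
f(p) = [p[1] - 1.0, p[2] + 2.0]

g_lm(v) = sum(abs2, v)            # sum of squares, as in Levenberg–Marquardt
g_mean(v) = sum(v) / length(v)    # mean, as in stochastic-gradient-style costs

cost_lm(p) = g_lm(f(p))           # g(f(p)) as described above
cost_mean(p) = g_mean(f(p))
```

The vectorial part (f and its component gradients) stays the same in all cases; only the outer g differs between solvers.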
Note that the basis in LM is now even stored in the type, so it is only there if the type requires it (the CoefficientType)
That sounds like a good idea.
Component is the old one, so basically nested vectors; I kept that because I think the power manifold one might (more often) require an actual power manifold while the old vector variant did not.
And sure, one idea would be to have this also inside a VectorOptimisation problem later, but maybe we should then rethink the name, if you feel that might be confusing.
I would maybe think that a MultiObjective would combine the idea here with the function g, that is, store both internally.
Component is the old one, so basically nested vectors; I kept that because I think the power manifold one might (more often) require an actual power manifold while the old vector variant did not.
I'm not sure if keeping that separation is actually useful. NestedPowerRepresentation is the same thing. We could have some specializations to avoid explicit construction of the power manifold if needed.
I would maybe think that a MultiObjective would combine the idea here with the function g, that is, store both internally.
Multi-objective optimization doesn't have the function g, it's the single-objective optimization that needs it. VectorialGradientObjective doesn't currently specify a unique optimization problem due to g being unknown.
I never want to specify said g in this PR since the goal is really to only represent elements that map into $\mathbb{R}^n$, like the equality or inequality constraints or the vectorial function in LM. At least the first two never have said g.
But wrapping this in a new objective that provides g would be the way to go, I think.
Ok, doing just the power manifold thing should be fine as well and we can omit the component one.
I never want to specify said g in this PR since the goal is really to only represent elements that map into $\mathbb{R}^n$, like the equality or inequality constraints or the vectorial function in LM. At least the first two never have said g.
But wrapping this in a new objective that provides g would be the way to go, I think.
That's fine but then maybe let's use a name without objective in it?
Interesting idea. For a bit of background: this started when a student of mine said it might be nice to have a Hessian with the cost in the constrained objective.
So I encapsulated that, and instead of saving f and `grad_f` in the objective, it now (in this PR) stores an objective.
So the constrained objective is now an objective plus constraints.
So I would be fine with the idea that the constraints g and h are objectives (though vectorial) themselves. I also do not have enough experience in vectorial optimization to know whether they sometimes would really just have a vectorial cost. Maybe the g you used to get a number is something that is just parametrised in an objective and not a concrete function? Then VectorialObjective would indeed be fine and describe well what we have – a function (and derivative information) that maps into a vector space.
Well, but I am also fine giving it another name, just that I struggle a bit with a good name for now. Do you have ideas for a name?
Maybe just VectorialGradientFunction and VectorialFunction?
I also do not have enough experience in vectorial optimization to know whether they sometimes would really just have a vectorial cost.
Yes, that's what multi-objective optimization deals with. The goal is to explore the Pareto front. It is, in a way, equivalent to exploring the impact of g on the result of single-objective optimization of the composite function $g(f(p))$.
maybe the g you used to get a number is something that is just parametrised in an objective and not a concrete function?
I don't understand, why wouldn't it be a concrete function?
I don't understand, why wouldn't it be a concrete function?
Maybe some vector optimisation area I do not know? I do not know much.
But then the vectorial objective we have here is fine for vector optimisation as well, just that a vector objective needs a vectorial objective plus g, like the constrained objective needs an objective and one or two vectorial objectives.
So I still neither see what would be wrong with the vectorial objective nor do I have any other good name here.
Maybe some vector optimisation area I do not know? I do not know much.
Yes, it is fine for multi-objective/vector-valued optimization but to me using it directly for anything else is confusing. VectorialGradientObjective sounds like something I'd only (or primarily) be using for multi-objective optimization. For single-valued optimization, VectorialGradientObjective is not a complete objective.
But then the vectorial objective we have here is fine for vector optimisation as well, just that a vector objective needs a vectorial objective plus g
Using both names (vector objective and vectorial objective) for different things sounds confusing. Maybe one of these things could be named SplitObjective for example?
Though not yet used, once we go for vector optimisation I want to keep VectorObjective for that, I feel.
Since we already discussed this is in most cases (even for LM) just a part of the objective, we could call the type here VectorFunction? The only thing I do not like in this name is that it actually also contains the vector function's gradient ;)
edit: SplitObjective sounds too vague for me.
OK, then what about VectorGradientFunction? It would be fine I think.
Sounds good. Will work on that tomorrow then. Thanks for the feedback and the discussions :)
I did the renaming and will now start to write the access functions (which will simplify the 3 existing access functions they replace quite a bit).
I noticed that I am now not sure whether VectorGradientFunction should be <:AbstractManifoldObjective or not. It behaves in many aspects like such a type, but usually requires one argument more (to access the entries) or returns a vector of things instead of just a thing. So maybe it should not even be an objective in type?
A final thing to maybe consider: for the power manifold approach one sometimes needs the power manifold to access the elements of the (power manifold's) tangent vector.
For now I just added the power representation type to the new PowerManifoldVectorialType. That would, however, mean one would often generate the power manifold just to access a component. I do not have a better idea; storing the power manifold in the objective would be against the idea of splitting the objective (or here, part of the objective) and the manifold.
I noticed that I am now not sure whether VectorGradientFunction should be <:AbstractManifoldObjective or not. It behaves in many aspects like such a type, but usually requires one argument more (to access the entries) or returns a vector of things instead of just a thing. So maybe it should not even be an objective in type?
It could be an objective for a multi-objective optimization problem but I don't think we want to design that feature in this PR, so maybe let's not make it <:AbstractManifoldObjective for now.
For now I just added the power representation type to the new PowerManifoldVectorialType. That would, however, mean one would often generate the power manifold just to access a component. I do not have a better idea; storing the power manifold in the objective would be against the idea of splitting the objective (or here, part of the objective) and the manifold.
I will think about it.
I agree on the first.
For the second I have 2 ideas, one of them being PowerManifold(M, vgf.dimension); this keeps the distinction but always generates a power manifold. I am more tending to the first case, since I think the memory/time spent on creating that is not too bad – and the second would break quite a bit with the current model ideas.
I illustrated my problem a bit with the get_gradients and get_gradients! functions.
I think the single gradient access functions are a bit easier; the Jacobian function might again need to create the power manifold (to access vector elements in order to get their coordinates).
I see, I will try to fix it today or tomorrow.
I think I just found out what my main confusion was.
First of all: sure, if you see that wrapping the vector of functions makes it more type stable, then let‘s do that.
The main problem I had, and why I did not get it to work, is that we now basically store the range of our functions, especially grad_g, in our type.
But there is also the type the user might expect/want, and that‘s what confused me in get_gradients: I could not specify which type it returns.
A solution, for sure, is that we have to carefully revise some of the code to be more agnostic to which representation on the power manifold is used. But that also means that the power manifold has to be available somewhere to be agnostic here (that is, to call X[N,1] in get_gradients, where N is the power manifold). Then the question is where/whether to store that.
I would prefer to not store it in the objective, since until now the objective is meant to be independent of the manifold (though defined using it).
So in short (my breakfast thoughts): since we now have different power manifolds appearing, where to store that without breaking the current model of Manopt.jl? That is, I do not want to store it in the objective.
I think having a few places with N = PowerManifold(...) when needed would be fine.
I have a solution. I can sketch it briefly in the following but will provide more details when implementing and documenting it in the following days.
Origin of the idea: the domain (M) is stored in the problem. The range is not stored but implicitly assumed.
So: store the range for the gradients in the problem as well (a new type of problem), use the same assumption as before for the DefaultProblem, and make the range of the gradient a(n optional) positional argument of the corresponding functions.
Sure, one can hence provide a “wrong range”, but one can do the same with the manifold for an objective as well.
I think I like this new idea of a more precise problem (and nice fallbacks for the DefaultProblem to the old forms).
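A rough sketch of how such an optional positional range argument could look; the signature, the `dimension` field, and the default are assumptions for illustration, not the final API:

```julia
# Hypothetical sketch: the range is an optional positional argument after p,
# defaulting to the nested representation; a problem type storing a range
# would pass it, the DefaultProblem would rely on the default.
function get_gradients(M::AbstractManifold, vgf::VectorGradientFunction, p,
        range=NestedPowerRepresentation())
    n = vgf.dimension                 # assumed field: number of components
    N = PowerManifold(M, range, n)    # the range decides the representation
    # ... evaluate all component gradients as a tangent vector on N ...
end
```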
But overall I like the idea I had today and will continue to rework the code to that.
Hi! Sorry for a delay, I will try to find some time tomorrow or the day after to work on it.
I think I have a solution I can work through. I would just need some feedback whether that approach is useful and sounds good.
Here is more or less the interface I'd imagine for ALM:
function (
    LG::AugmentedLagrangianGrad{
        <:ConstrainedManifoldObjective{InplaceEvaluation,<:VectorConstraint}
    }
)(
    M::AbstractManifold, X, p
)
    m = length(LG.co.g)
    n = length(LG.co.h)
    get_gradient!(M, X, LG.co, p)
    MPm = PowerManifold(M, m)
    YPm = zero_vector(MPm, p)
    gps = get_inequality_constraint(M, LG.co, p, :)
    needed_indices = gps .+ LG.μ ./ LG.ρ .> 0
    get_grad_inequality_constraint!(MPm, YPm, LG.co, p, needed_indices)
    for i in 1:m
        # evaluate in place
        if needed_indices[i]
            X .+= (gps[i] * LG.ρ + LG.μ[i]) .* YPm[MPm, i]
        end
    end
    hps = get_equality_constraint(M, LG.co, p, :)
    MPn = PowerManifold(M, n)
    YPn = zero_vector(MPn, p)
    get_grad_equality_constraint!(MPn, YPn, LG.co, p, :)
    for j in 1:n
        # evaluate in place
        X .+= (hps[j] * LG.ρ + LG.λ[j]) * YPn[MPn, j]
    end
    return X
end
Note that YPm, YPn, gps, and hps would be stored in AugmentedLagrangianGrad to avoid unnecessary allocations, and ConstrainedManifoldObjective would have to store the array representation type.
Currently the test example for ALM in the tests is a weird corner case where the gradient is constant and, moreover, it appears to be the Euclidean gradient instead of the Riemannian one?
Determining how to most efficiently evaluate a bunch of gradients would be deferred to get_grad_inequality_constraint!(MPm, YPm, LG.co, p, needed_indices), which gets the info about what is needed through needed_indices. The user would then write something like
function my_grad_inequality_constraint!(MPm, YPm, p, needed_indices)
    if needed_indices[1]
        YPm[MPm, 1] = some_value
    end
    if needed_indices[2]
        YPm[MPm, 2] = some_other_value
    end
    return YPm
end
Note that if those gradients are very cheap to compute (like in the case of nonnegative PCA) it may be even slower to evaluate them selectively instead of all of them due to branch prediction issues, CPU cache architecture and vector instructions.
Note that YPm, YPn, gps, and hps would be stored in AugmentedLagrangianGrad to avoid unnecessary allocations, and ConstrainedManifoldObjective would have to store the array representation type.
But then you could never change that representation, and you implicitly assume that get_equality_constraint( [...], :) (which currently has its own name with an s at the end) always returns array power manifold tangent vectors.
That would (a) be breaking and (b) restrict usage to only exactly one representation where points can be represented in a single array (fixed rank would, for example, be excluded).
I agree that for both cases – (a) a single function for all gradients and (b) a function for every gradient of a component – there are surely cases where either of them is (far) more efficient than the other. That is also why I want to support both. But I also want to support the nested case further and not remove it.
I do like the : idea; that could deprecate the constraints function.
Ah, and most of the hassle I went through in the last rewrite (and all thinking last week) was to avoid having to regenerate the power manifold on every function call, hence there are now the range= parameters, which by themselves generate them when you do not pass an existing one. They also provide the exact difference that both nested and array power manifolds (or their tangent spaces, to be precise) are possible.
But then you could never change that representation, and you implicitly assume that get_equality_constraint( [...], :) (which currently has its own name with an s at the end) always returns array power manifold tangent vectors. That would (a) be breaking and (b) restrict usage to only exactly one representation where points can be represented in a single array (fixed rank would, for example, be excluded).
No, it doesn't have to be array power representation:
function (
    LG::AugmentedLagrangianGrad{
        <:ConstrainedManifoldObjective{InplaceEvaluation,<:VectorConstraint}
    }
)(
    M::AbstractManifold, X, p
)
    m = length(LG.co.g)
    n = length(LG.co.h)
    get_gradient!(M, X, LG.co, p)
    MPm = PowerManifold(M, LG.power_representation, m)
    YPm = zero_vector(MPm, p)
    gps = get_inequality_constraint(M, LG.co, p, :)
    needed_indices = gps .+ LG.μ ./ LG.ρ .> 0
    get_grad_inequality_constraint!(MPm, YPm, LG.co, p, needed_indices)
    for i in 1:m
        # evaluate in place
        if needed_indices[i]
            X .+= (gps[i] * LG.ρ + LG.μ[i]) .* YPm[MPm, i]
        end
    end
    hps = get_equality_constraint(M, LG.co, p, :)
    MPn = PowerManifold(M, LG.power_representation, n)
    YPn = zero_vector(MPn, p)
    get_grad_equality_constraint!(MPn, YPn, LG.co, p, :)
    for j in 1:n
        # evaluate in place
        X .+= (hps[j] * LG.ρ + LG.λ[j]) * YPn[MPn, j]
    end
    return X
end
or you can use LG.something.range_something instead of MPm and MPn.
Ah, and most of the hassle I went through in the last rewrite (and all thinking last week) was to avoid having to regenerate the power manifold on every function call, hence there are now the range= parameters, which by themselves generate them when you do not pass an existing one. They also provide the exact difference that both nested and array power manifolds (or their tangent spaces, to be precise) are possible.
OK, I just didn't see how to get them in ALM so I made them on the spot. The main parts of my idea are using get_inequality_constraint(M, LG.co, p, :), get_grad_inequality_constraint!(MPm, YPm, LG.co, p, needed_indices), and letting the user specify multiple constraints in a single function. If the range is somewhere inside AugmentedLagrangianGrad, it can surely be just extracted from there.
Hm, that would still allocate a power manifold in every call? Sure, storing just the power representation is maybe ok, but I would prefer (similar to the manifold not being part of the objective) that this is also not part of the objective.
The trick would be that vector functions have a range argument (positional and optional) after p; the new ConstrainedProblem would set that, the DefaultProblem would not, and the default would be the old power manifold used (nested).
Sure, : and needed_indices sound like a good extension of the current one. We could even deprecate the _constraints functions (and for now call the : variant for that until we remove it). That sounds very reasonable.
Hm, that would still allocate a power manifold in every call? Sure, storing just the power representation is maybe ok, but I would prefer (similar to the manifold not being part of the objective) that this is also not part of the objective.
PowerManifold is an immutable struct so it would be on the stack (=fast to create, not counted towards allocations). Just storing the power representation would be enough.
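A small sketch of that idea (assuming Manifolds.jl is loaded; the choice of `Sphere(2)` and the variable names are just illustrative):

```julia
# Sketch: only the representation is stored; the power manifold itself is
# rebuilt on demand. Since PowerManifold is an immutable struct, this
# construction lands on the stack and does not count towards allocations.
using Manifolds

M = Sphere(2)
repr = NestedPowerRepresentation()  # what the objective/problem would store
n = 3                               # number of constraints
N = PowerManifold(M, repr, n)       # cheap to recreate in every call
```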
The trick would be that vector functions have a range argument (positional and optional) after p; the new ConstrainedProblem would set that, the DefaultProblem would not, and the default would be the old power manifold used (nested).
Nested by default is OK for me as long as there is a reasonable way to override it.
Sure, : and needed_indices sound like a good extension of the current one. We could even deprecate the _constraints functions (and for now call the : variant for that until we remove it). That sounds very reasonable.
:+1:
Nested by default is OK for me as long as there is a reasonable way to override it.
We could at some point discuss the default, for now that would be necessary to stay nonbreaking. We could discuss that when the next breaking change is due.
For now the idea would be: you could do a ConstrainedProblem(M, obj, PowerManifold...); and sure, in there we could store just the representation, I would not mind – maybe that is even better / more flexible.
Compared to that, a DefaultProblem(M, obj) would trigger the default.
And for the high-level interface I was thinking of a keyword argument for that.
The idea would be similar to the domain (M) not being stored in the objective; the range should not be either. For M, my idea in this is that if the objective is agnostic enough of the manifold, one could just exchange the manifold to run the optimization on another manifold – maybe also just the same manifold with another metric.
The same I would like to keep for the range as well – hence storing it also in the problem; this is also nicer since we would only store it once and not (for example) also in the cost or such. Storing it multiple times might only lead to inconsistencies.
I see, I think it would be best then to just store the representation type instead of the complete power manifold.
Nice, thanks for that idea. So I will improve that and work on the rest then, also on the new idea with an index range and the :.
I indeed have a short question on your code idea above. For
needed_indices = gps .+ LG.μ ./ LG.ρ .> 0
get_grad_inequality_constraint!(MPm, YPm, LG.co, p, needed_indices)
to work, I am a bit lost about which functions I have to implement. I first thought I just need
That would mean 18 or more further functions. So I wanted to check before I get into a dispatchageddon here...
I think AbstractVector{Bool} dispatch is enough for get_grad_inequality_constraint. You can always get a single constraint by setting only one element of the vector to true, or all of them by setting all elements to true. So other dispatches would only be an optimization, and quite likely they are not worth it.
Colon would be nice for get_grad_equality_constraint.
get a single constraint by setting only one element
that sounds easy in description, complicated in practice, since for the allocating thingy that means different things to allocate.
Colon would be nice for get_grad_equality_constraint.
formerly I distinguished gradientS for all gradients and gradient with an additional index for one – but sure, with Colon these can nicely be combined.
I refactored that partly already, and also the range is no longer the power manifold but just its representation – which is anyway necessary if we now have different “sizes” to return depending on I.
So I checked and I am currently not sure how many different cases that would mean to implement. I currently fear it is really 6 function dispatches for
And what each of these would need in allocations – that one I really have not yet figured out. If you hand me a bit array with a single 1 – do you want a single tangent vector or a vector of tangent vectors with just one element? I would maybe even tend to the second, since that seems more consistent. Otherwise the place calling this function would have to do a lot of ifs.
For getting a single tangent vector, use an index j (these already exist).
But I will think about that. For now I can at least also continue (though more likely tomorrow) with adopting single access and full access in the other areas of Manopt (“higher up” and in the algorithms).
I rewrote the access and in get_gradient(M, vgf, p, j, range) the j can now be

- a BitVector of the same length as the number of constraints
- an AbstractVector{<:Integer}
- a UnitRange{<:Integer}
- a Colon
- an Integer

It was a bit tricky to allocate (vectors of) tangent vector(s) here, but I think I managed to solve this. Of course only the last returns a single tangent vector, all others a (possibly length-1) vector of tangent vectors. This way this is nice and consistent.
This should now be consistent with power manifolds and the j in X[pM, j].
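For reference, a compact sketch of what that dispatch surface could look like (illustrative signatures only, not the exact implementation in the PR):

```julia
# Hypothetical dispatch layout for the index argument j (names illustrative):
get_gradient(M, vgf, p, i::Integer)                    # one tangent vector
get_gradient(M, vgf, p, I::BitVector)                  # vector of tangent vectors
get_gradient(M, vgf, p, I::AbstractVector{<:Integer})  # vector of tangent vectors
get_gradient(M, vgf, p, I::UnitRange{<:Integer})       # vector of tangent vectors
get_gradient(M, vgf, p, ::Colon)                       # all gradients
```

This mirrors the indexing behaviour of X[pM, j] on power manifolds, so only the Integer case is special.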
I currently cannot render the docs, since we need the next version of ManifoldsBase (with fill) first. But I will now slowly continue to rewrite the existing code to internally use the new vector gradient function for constraints – this should halve the amount of code in quite a few places.
Uff, this is quite some rework, but I like both the code reduction from having the new vectorial function as well as the reduction we will see with the : notation.
But from the vector_objective functions I now managed to rewrite the getters and setters for the constrained objectives. So the main next step is to fix the sub_objectives of ALM and EPM as well as these methods themselves.
edit: A main headache is now to check that : still returns an array and that this works consistently, e.g. for embedded objectives, but for today I'll stop (I have to wait for ManifoldsBase to test here anyway).
Ah, that went faster than expected. The only errors left in the tests are from the following:
In the old scheme, if you provided a point for the single-function-thingy, I could check the length of the vector. This is maybe a bit more complicated in the array-representation case, but if we can get that back, tests would already pass again (and only checking/reworking LevenbergMarquardt is left in the rework)
that sounds easy in description, complicated in practice, since for the allocating thingy that means different things to allocate.
I wouldn't spend too much time optimizing the allocating variant.
And what each of these would need in allocations – that one I really have not yet figured out. If you hand me a bit array with a single 1 – do you want a single tangent vector or a vector of tangent vectors with just one element? I would maybe even tend to the second, since that seems more consistent. Otherwise the place calling this function would have to do a lot of ifs. For getting a single tangent vector, use an index j (these already exist).
Yes, I think special-casing the single-constraint allocating variant doesn't make much sense.
So I checked and I am currently not sure how many different cases that would mean to implement. I currently fear it is really 6 function dispatches for
Let's start from what ALM and EPM need. It's either all (:) or something BitVector-ish. So we don't need other access patterns, I think.
In the old scheme, if you provided a point for the single-function-thingy, I could check the length of the vector. This is maybe a bit more complicated in the array-representation case, but if we can get that back, tests would already pass again (and only checking/reworking LevenbergMarquardt is left in the rework)
I will have to check that carefully.
By now both : and BitVectors should work fine; I hope I found a good way to realise them. And an array of integers works as well. All of these return vectors of tangent vectors, or equivalently an element of the tangent space of the corresponding power manifold otherwise.
I am just not yet sure the allocations are all optimal. But sure, reworking EPM/ALM to these nicer access methods as you sketched is the next step
I just fixed the first few things in ALM/EPM so that they can be used as before. But this is all super tricky and there seem to be a million failing tests left. As I wrote already, I fear this might really take some time to get right and working. Ok, it's about 100 tests failing, but to fix just 3(!) took me 2 hours. Easy to extrapolate.
There are for example a lot of ConstraintObjective constructor calls where the automatic “How many constraints are there?” does not yet work as automatically as before.
So the one thing I am not so sure about by now is: “just add another representation here” is a lot, a lot, a lot of work – is that worth it?
Hm, I will take a look.
Don‘t get me wrong, I think the general idea is good and if it works it allows for quite some flexibility. But I feel the code is now more clever than me, so for every error it takes me a long time to narrow it down. So something might still be off in this idea.
I've fixed all EPM tests and some ALM tests. The current ALM failure is due to a different constructor signature – we probably need a convenience constructor for ConstrainedManifoldObjective to fix that?
I already tried to add the “guess the number of constraints” thing, but sure, if you have even more ideas for convenience, that would be great :)
OK, I've fixed ALM then.
By the way, is this intentional:
function get_inequality_constraint(
    M::AbstractManifold, co::ConstrainedManifoldObjective, p, j
)
    return get_cost(M, co.inequality_constraints, p, j)
end
?
Yes, that is the case, since co.inequality_constraints is a VectorGradientFunction vgf, which has a cost (vector) and a gradient (vector of them). So evaluating one of the constraints is evaluating the cost of said vgf.
edit: One large advantage of this is that we implement all the access to this only once (not once for the equality and once for the inequality constraints – in the end maybe even the same for the Jacobian).
This is a start to rework constraints, make them a bit more flexible, and address/resolve #185.
All 3 parts, the function $f$, the inequality constraints $g$ and the equality constraints $h$, are now stored in their own objectives internally. That way, they can be provided more flexibly – for example with a Hessian for one of them.
Besides that, there are more details available on the co-domain of these; especially in the case of a single function, the gradient can now also be specified to map into the tangent space of the power manifold.
One open point still to check is how the internal functions can be adapted to this in a hopefully non-breaking way. A main challenge (or where I stopped today) is, when a function that returns gradients (or a gradient of functions) is stored in an objective – and it hence might be cached – how to best model element access. This might also be a nice improvement for (or a unification with) the LevenbergMarquardt type objective. One could unify that to an objective of type VectorObjective maybe. Comments on this idea welcome.
🛣️ Roadmap

- 👨💻 unification with the LeastSquaresObjective (VectorObjective?)
- range, e.g. for the ALM and EPM gradients
- DefaultManifoldProblem passes in a range of nothing to trigger the default
- get_gradients to get_gradient with Colon
- ConstrainedManifoldObjective such that g and grad_g are internally stored as the new VectorialGradientFunction, and h and grad_h are internally stored as a new VectorialGradientFunction