Fused scaled retraction

mateuszbaran commented 1 year ago

Manopt is currently full of expressions like retract!(M, xNew, x, s * η, retr). For quickly evaluating objective functions the fact we have to allocate s * η can be a significant slowdown. I've just tried optimizing Rosenbrock using CG:


using Revise
using Manopt, Manifolds

const M = Euclidean(2)
const p = [1.0, 100.0]

rosenbrock(M, x) = (p[1] - x[1])^2 + p[2] * (x[2] - x[1]^2)^2
x0 = [0.0, 1.0]

function rosenbrock_grad!(M, storage, x)
    # the first part can be computed using AD tools
    storage[1] = -2.0 * (p[1] - x[1]) - 4.0 * p[2] * (x[2] - x[1]^2) * x[1]
    storage[2] = 2.0 * p[2] * (x[2] - x[1]^2)
    # projection is needed because Riemannian optimizers expect
    # Riemannian gradients instead of Euclidean ones.
    project!(M, storage, x, storage)
end

x_opt = conjugate_gradient_descent(
        M,
        rosenbrock,
        rosenbrock_grad!,
        x0;
        evaluation=InplaceEvaluation(),
        stepsize=ArmijoLinesearch(M),
        coefficient=FletcherReevesCoefficient(),
        stopping_criterion=StopAfterIteration(15),
    )

and about half of the time spent (after optimizing storage, but that's a separate story) is in doing these scaled retractions. My suggestion is rather simple and nearly non-breaking: let's move to defining retract!(M::AbstractManifold, q, p, X, t::Real) for each manifold instead of retract!(M::AbstractManifold, q, p, X) by default to fuse this scaling. After that, Manopt would just change to retract!(M, xNew, x, η, s, retr).

What do you think @kellertuer ?

kellertuer commented 1 year ago

That sounds like a lot of rewriting, since it affects all retractions - but it also seems reasonable since we have the same for exp/exp! and those should be probably checked as well then.

So I think it is a very good idea to do that, just also one that is a lot of work.

mateuszbaran commented 1 year ago

It doesn't look to me like that much of work, just a bit tedious. And it would be nice to make Manopt more competitive in terms of performance :slightly_smiling_face: .

kellertuer commented 1 year ago

Sure, since I feel the interfaces are quite mature by now, of problem and state but also all sub-stuff – Making it more performant is definetly something very cool and might be a small paper somewhere when we not only compare manifold functions but man opt functionality.

kellertuer commented 1 year ago

I just discussed something with a student when I noticed

https://github.com/JuliaManifolds/Manopt.jl/blob/3f4172d57d5d516314e8feb6ef06c0f067ee74ce/src/solvers/gradient_descent.jl#L185

that is -step which is a number so the step size, is actually what you propose here,

or: We already have the fusion defined, though currently by default it does the same as with exp, that is scale and call the normal one

https://github.com/JuliaManifolds/ManifoldsBase.jl/blob/65e926325d40aaf1106cf792e2be1d8b0cfd51d4/src/retractions.jl#L801-L810

so what we would have to do is to get to the level3 with this function in Base and only then do the default fuse (unless someone defines this.

mateuszbaran commented 1 year ago

Yes, we already have the scaled variant in the API.

so what we would have to do is to get to the level3 with this function in Base and only then do the default fuse (unless someone defines this.

I don't quite understand what you mean here. What we need is to have

function retract!(
    M::AbstractManifold,
    q,
    p,
    X,
    method::AbstractRetractionMethod = default_retraction_method(M, typeof(p)),
)
    return retract!(M, q, p, X, one(X), method)
end

instead of

function retract!(
    M::AbstractManifold,
    q,
    p,
    X,
    t::Real,
    method::AbstractRetractionMethod = default_retraction_method(M, typeof(p)),
)
    return retract!(M, q, p, t * X, method)
end

And then for each manifold implement the variant with t instead of the one without t.

We probably also need to t::Number instead of t::Real to allow complex numbers and quaternions.

kellertuer commented 1 year ago

Yes, I like your result, but it might be a bit to complicated to enforce anyone to implement the fused-one (the not with t).

So my idea would be (in ManifoldsBase) to introduce the functions as you say (with t::Numbersure).

But then we should make sure, that it is still fine to just implement the non-fused one (maybe to get started, but also to be backwards compatible), that is maybe a bit of boilerplate code, but I think necessary. We just have to check for a good order probably

You currently propose to switch order of preference (currently, the fused-one falls back to the non-fused one, you want to switch that).

But then you want to change level 2, that is the the _retract (_retract!) functions to all have a t ? Or do you want to have both on this level? (maybe the one with t is enough and keeps it simple).

But then on level 3 (that is the retract_caley / retract_caley!) ones should fuse-by-default (that is the variant with t should fall back to the one without? Otherwise we loose backwards compatibility. since Level 2 is always with t this should be fine?

To summarise, the new workflow is (just for in-place, the other ones when to allocate and step over should stay where they are

retract!(M, q, p, X, m) is “on highest level” and defaults to a one(eltype(X))(probably?) into step 2
retract!(M, q, p, X, t, m) passes to _retract!(M, q, p, X, t, m)
all the _retract! functions are now with t
the specific variants like retract_caley!(M, q, p, X, t, m) are now the defaults we overwrite, but if not overwritten they
default to retract_caley!(M, q, p, t*X, m) which is mainly for backwards-compatibility and ease of start for new users implementing a retraction

Is that dispatch-workflow what you have in mind?

mateuszbaran commented 1 year ago

Hm, I haven't thought that much about interactions with decorators and levels. Your new workflow looks fine with one caveat. If we were to do the same change for exp, this would result in an infinite cycle that would give awkward-looking errors (exp(NewManifold(), p, X) -> exp(NewManifold(), p, X, 1.0) -> exp(NewManifold(), p, 1.0 * X)).

kellertuer commented 1 year ago

\o/ \o/ (party crowd) for infinite cycles!

I have not yet thought about exp in this scenario, my main idea was to have the fused-one first but falling back to the non-fused one which is currently usually implemented.

So exp currently has the upper two levels in my workflow switched? Maybe we want to enable the lower-level-fused exp variant as well and move exp around as well?

mateuszbaran commented 1 year ago

exp and log currently mostly ignore the levels unfortunately.

kellertuer commented 1 year ago

We until now kept them a little special, yes; that does not mean we can switch the order of the non-fused and the fused one for exp.

mateuszbaran commented 1 year ago

I have a small design issue with storage, which is (after patching fused retractions in) about half of the time spent running the example from first post. Ideally, previous iterate and gradient storage would be allocated once, stored in a type-stable field and modified by copyto!. This is currently not possible without major changes, either to the storage structure or places that use storage. Which of these two things would you prefer to change to make this faster?

kellertuer commented 1 year ago

Uff, in that generality without understanding (a) which major changes and (b) its implications, I can not answer this.

but you are right, for best of cases, the stored iterate and gradient (or any other stored value) should only be allocated once and be stored in a type-stable field.

I had hoped that my storage already is able to do that – but as I maybe wrote a few times – this is a design from a very early stage of me learning Julia (probably back in 2016) – so maybe that is not the case. So can you elaborate a bit on which changes are necessary? Since you mentioned somewhen earlier that my storage might not be that efficient, I maybe have a slight tendency to improve the storage structure and keep the places that use the storage as is?

mateuszbaran commented 1 year ago

So can you elaborate a bit on which changes are necessary? Since you mentioned somewhen earlier that my storage might not be that efficient, I maybe have a slight tendency to improve the storage structure and keep the places that use the storage as is?

OK, so the first point is that it is not possible to construct for example efficient FletcherReevesCoefficient without passing to it at least types of objects in which point and tangent vector are stored. Ideally a point and a tangent vector would be available as a template for allocation but it can be reasonably worked around. That's what I attempted at first but then I noticed that it results in a very complex design that can be massively simplified by just having internal storage in FletcherReevesCoefficient or a custom StoreStateAction, which would be basically equally easy to use. Maybe a new AbstractStateAction would be nicer.

Then there is a matter of constructing FletcherReevesCoefficient (or its storage). Ideally I'd like to have a constructor of the form FletcherReevesCoefficient(M, p0) to construct storage. Would that be OK?

kellertuer commented 1 year ago

For your first part – sorry, I do not follow. I just don't understand what you mean. But for the question on whether to store the point/vector explicitly in FRC – I would prefer not to do that but really to use a storage. For example I have no idea what you propose with a new AbstractStateAction, since I am just not able to follow the arguments before. It seems you mean something very specific but write it in an abstract way that I can not follow.

Sure a storage should be able to get an initial point/vector to set up (and initialise) the storage, but I think that should either be doable or is even done already to some extend? I can check in the evening.

For the constructor – that one I though about a bit already and there is both a way to define that now and a way I want to move to. To avoid that you have to always set the manifold (e.g. when it is used for the default retraction or something) I wanted to move to f(M::AbstractManifold=DefaultManifold(2)) as first arguments especially in these inside functions like the coefficients. Then, a lot of these functions already have a initial_vector=rand(M; vecetor_at=p) or similar keyword argument, we could use initial_point=rand(M) similarly as a keyword. And sure both of these keywords can be passed to the storage to create their internal structures.

kellertuer commented 1 year ago

Oh, maybe the init function I had in mind does not yet exist – but sure a constructor for the storage that initialises some fields would be very useful, indeed.

JuliaManifolds / Manifolds.jl

Fused scaled retraction #569