trust_regions loses normalization on Sphere

fph commented 11 months ago

After running sufficiently many iterations of trust_regions on the complex sphere, the result no longer has norm 1. Here is a minimal example:

using Random, Manifolds, Manopt, Zygote

# we set up a simple function to keep the optimizer busy.

n = 100
Random.seed!(0)
A = sprandn(n, n, 0.2); A = A + A'
M = Manifolds.Sphere(n-1, ℂ)
x0 = project(Manifolds.Sphere(size(A,1) - 1, ℂ), randn(Complex{eltype(A)}, size(A, 1)))

# this has no minimum since A is not posdef, but we don't care.
f(M, x) = real(x'*A*x)
g(M, x) = project(M, x, gradient(x->f(M,x), x)[1])

x = trust_regions(M, f, g, x0)
@show norm(x)                                        

x = quasi_Newton(M, f, g, x0)
@show norm(x)

x = conjugate_gradient_descent(M, f, g, x0)
@show norm(x)

Returns

norm(x) = 1.0038126803255494  # trust_regions
norm(x) = 1.0000000000000002  # quasi_Newton
norm(x) = 1.0  # conjugate_gradient_descent

On more complicated examples, the norm gets even farther away from 1.

kellertuer commented 11 months ago

Hi, the super short answer:

trust_regions: That is a known instability problem for some costs on some manifolds (Rayleigh on the sphere is one such combination) and there is the project!= keyword is there for help
quasi_Newton is probably still fine the error you see super small, since
```
julia> 1-1.0000000000000002
-2.220446049250313e-16
```
so it might even just be the dimension of the sphere causing an error in the computation of the norm, since this is of order of machine precision.

Background

There is actually 2 things interacting here

1) What do we do when we get an q=exp(M, p, X) exponential map (or similarly a retraction, but for the sphere its exp) with a non-tangent vector X– so here if X is not orthogonal to p? We do not want to error, that would be too strict – we could try to fix the X but that might be costly and on some manifolds not easy / uniquely to do. So we leave that behaviour undefined. In practice – for the sphere – using non-tangent X simply yields that q is no longer on the sphere. This might happen already for slight deviations dot(p,X) (maybe of size 1e-10?). if we now have exp(M, q, Y) for non-sphere q we have the same problem. What should we do? We leave it as is and do not check inputs here (since also that is costly on high-dimensional spheres or other manifolds). What happens then is, that we actually stay on the sphere of the radius of q. But keep in mind that exp actually is not defined for such q originally, so there is not even a right or wrong.

2) trust region has a subsolver, that solves the trust region sub problem (at least approximately). The result might – in a few cases – be slightly inaccurate even to the extend that the result is not a tangent vector. Sometimes also just of order e1-10. Together with point 1) on the sphere these sadly accumulate.

The fix could be to always project the sub solver result (back) one the tangent space. The disadvantage is, that that might be costly on some manifolds or sometimes even not be implemented or defined. One example is the fixed rank matrices, where UMTVector has 3 fields and for a projection we would first have to embed this in the set of matrices,....for high-dimensional manifolds – super costly. If we would do that always – some might (rightfully) complain, that everything is slow.

But we have therfore the optional project! keyword, which by default is a copyto! (i.e. it copies the result from the subsolver back to the main solver). setting that to project! activates projection (inplace, so with as less as possible allocations)

Updated Example

Hm I could not directly run your example since with your packages (at least on my Julia) sprandn was not defined, but maybe you forgot to add SparseArrays?

using Random, Manifolds, Manopt, Zygote, SparseArrays

# we set up a simple function to keep the optimizer busy.

n = 100
Random.seed!(0)
A = sprandn(n, n, 0.2); A = A + A'
M = Manifolds.Sphere(n-1, ℂ)
x0 = project(Manifolds.Sphere(size(A,1) - 1, ℂ), randn(Complex{eltype(A)}, size(A, 1)))

# this has no minimum since A is not posdef, but we don't care.
f(M, x) = real(x'*A*x)
g(M, x) = project(M, x, gradient(x->f(M,x), x)[1]) # you could also check out riemannian_gradient

x_old = trust_regions(M, f, g, x0)

x = trust_regions(M, f, g, x0; (project!)=project!)

then we have

julia> norm(x)
1.0000000000000027

so that is in the area of 1e-15 which is again close to machine precision

and we even see

julia> f(M,x_old)
-13.29339027908081

julia> norm(M, x_old, g(M,x_old))
0.183160719637751

julia> f(M,x)
-13.225587693798204

julia> norm(M, x, g(M,x))
2.921455811428008e-13

Not only is the new cost lower (ok the old point is also not in the domain of the “shortened version of Rayleigh” (no division by the norm); but also that the new one does have a small gradient.

Slightly related

I started reworking trust regions to finally allow other sub solvers – but that is still a bit of work (and needs the next breaking version of ManifoldsBase.jl, 0.15) see https://github.com/JuliaManifolds/Manopt.jl/pull/294, since we discussed that earlier. So this might still take some time, but will be available some time.

One more thing

Here, with projection the algorithm is also much faster (ok because it stops with a small gradient not due to max iterations ;))

julia> @btime x = trust_regions($M, $f, $g, $x0; (project!)=$project!);
  5.454 ms (12737 allocations: 41.69 MiB)

julia> @btime x_old = trust_regions($M, $f, $g, $x0);
  102.932 ms (219532 allocations: 636.69 MiB)

fph commented 11 months ago

Thanks for the very detailed answer! I missed that option in the docs, sorry.

I confirm also my 'true' function of interest looks a bit like a Rayleigh quotient.

And I confirm I forgot to include SparseArrays in my example, sorry.

kellertuer commented 11 months ago

Great to here that that solved it :)

I agree that by default not having the project (and hence these cases) might be a point of discussion, but it is a choice between speed and availability on more manifolds (like fixed rank) versus stability/inaccuracy for a few others.

If you feel we could improve the docs here, let me know.

fph commented 11 months ago

If you feel we could improve the docs here, let me know.

The project! parameter was already there, so it's on me for not seeing it. The docs are fine, but since you are explicitly asking for suggestions for improvement I can put in one. You write

That is a known instability problem for some costs on some manifolds (Rayleigh on the sphere is one such combination) and there is the project!= keyword is there for help

and this problem seems to be specific to the trust_regions solver. So I can suggest adding a remark that this might be a concern. The page https://manoptjl.org/stable/solvers/trust_regions/ already has a "Remarks" section, a note would fit well in there.

I could submit a PR myself if you wish, but of course I am far from being an expert since I did not even know about this issue until last week. :)

Many thanks for your work on this package!

kellertuer commented 11 months ago

I am rework ing trust regions here https://github.com/JuliaManifolds/Manopt.jl/pull/294 anyways, to have a really exchangeable subsolver (the original code is one of the first solvers we had before we had the sub solver-framework developed), for that I have to wait for ManifoldsBase.jl 0.15 (to have a generic tangent space available).

But I will add such a comment for sure, no problem :)

Thanks for the feedback!

JuliaManifolds / Manopt.jl