JuliaManifolds / Manopt.jl

šŸ”ļøManopt. jl ā€“ Optimization on Manifolds in Julia
http://manoptjl.org
Other
314 stars 40 forks source link

quasi_Newton has issue with negative memory size. #340

Closed salbert83 closed 8 months ago

salbert83 commented 8 months ago

The attached notebook fails when I set memory_size to -1. I am running Julia 1.9.4 with the following package versions:

IJulia v1.24.2 ManifoldDiff v0.3.10 Manifolds v0.9.11 Manopt v0.4.45 Zygote v0.6.68

I had to save the notebook as a .txt file so Github would allow me to attach.

QuasiNewtonQuestion.txt

kellertuer commented 8 months ago

Hi,

I extracted the following code

using LinearAlgebra, Manifolds, ManifoldDiff, Manopt, Random

m, n  = 2, 8
H = randn(ComplexF64, m , n)
HtH = H'*H
P = randn(ComplexF64, n, m)
f(M, P) = real(logdet(I(m) + P' * HtH * P))

euclidean_āˆ‡f(M, P) = 2HtH*P / (I(m) + P' * HtH * P)

manif = Manifolds.ArraySphere(n, m, field = ā„‚)
āˆ‡f(M, P) = ManifoldDiff.riemannian_gradient(M, P, euclidean_āˆ‡f(get_embedding(M), embed(M, P)))

# Works fine for a positive memory size, does not otherwise
#s = m*m
s = -1
res = @time Manopt.quasi_Newton(manif, f, āˆ‡f, P, memory_size = s, stopping_criterion = StopAfterIteration(2000) | StopWhenGradientNormLess(1e-6))

and can confirm, that this crashes if the memory is set to -1 but does not otherwise. The reason is quite simple.

For a nonnegative memory_size we store a set of tangent vectors (named the same as in Nocedal & Wright) $y_k$, $s_k$ where $k$ runs from 1 to memory_size and from these the compute the descent direction. For most problems I would recommend using this method, if you feel it is not precise enough, choose a high value (up to manifold dimension)

If you set memory_size to -1 we do not use the aforementioned method but store the (iteratively updated) Hessian approximation in a matrix. On manifolds this has the small disadvantage, that matrices that describe operations in the tangent space are always depending on a basis of the tangent space. Indeed that is the same in the Euclidean case, but the set of unit vectors is a very default choice we often do not think about. But here we need a basis, and to be precise, this line

https://github.com/JuliaManifolds/Manopt.jl/blob/da840548d591cafc683488bb3893544f97fe5083/src/plans/quasi_newton_plan.jl#L405

does exactly that, it transforms the tangent vector into coordinates, steps into the matrix multiplication (or solving a linear system in other cases) and then reconstructs the vector.

Now the main problem is that on the manifold you want to work with, no one has yet implemented the DefaultOrthonormalBasis() we set as a default basis (unless the manifold declares a favourite default itself). One reason might be that the ArraySphere just came along as ā€œAh, we could do that as well, let's implement all main functionsā€ but no one implemented the bases. In other cases it might also be that certain functions (on other manifolds) do not have a nice/efficient implementation.

So this is a problem for Manifolds.jl. I am a bit surprised, since I would have thought that this function

https://github.com/JuliaManifolds/Manifolds.jl/blob/81a43d434a1ed6f515c00dd9814fd45d0d99ae2e/src/manifolds/Sphere.jl#L238-L247

would be the fit and also be dispatched to, so it might also be a bug in that functions dispatch indeed. Please open an issue over there.

edit: Ah, I see why the linked function does not hit, that one is currently only implemented for the real (abstract) spheres. Yours is complex. On complex spheres one has the small challenge every now and then whether to take a complex basis with real coefficients (twice as many basis vectors, but nice real coefficients) or real basis with complex coefficients (less basis vectors but also complex coefficients).

mateuszbaran commented 8 months ago

If you have the formula for real coordinate basis of tangent spaces to the complex sphere, I can add it to Manifolds.jl. That shouldn't be too hard to derive, you can probably just start from the function written by Ronny and split X to real and imaginary parts.

kellertuer commented 8 months ago

Yes, probably adapting that and getting to a complex-basis-with-real-coefficients vector for complex abstract spheres would be the way to go. For now I just do not have a good idea how to compute that.

mateuszbaran commented 8 months ago

BTW, I'd suggest trying L-BFGS with limited memory on Manopt 0.4.46 (will be released very soon), with Hager-Zhang linesearch from LineSearches.jl. After fixes it has similar performance to Optim.jl on a couple of test problems.

kellertuer commented 8 months ago

Note that the L here means memory_size > 0. A negative number would also do (inverse) BFGS as well, but storing a full matrix. But yeah we are currently working on even getting these algorithms faster (ok, the ā€œweā€ is definitely mainly Mateusz).

kellertuer commented 8 months ago

I started a first sketch, but I have to still figure out the details, mainly in allocation (along the way=) of the actual coefficient vector. If you have a nicer formula, feel free to contribute.

And ā€“Ā thanks ā€“Ā this was a nice thing to ponder about during my hike today, and also just about 30 minutes to heck this sketch. I'll leave this open though we now have a place over on Manifolds.jl for this, in case there is general feedback on the current comments here.

salbert83 commented 8 months ago

BTW, I'd suggest trying L-BFGS with limited memory on Manopt 0.4.46 (will be released very soon), with Hager-Zhang linesearch from LineSearches.jl. After fixes it has similar performance to Optim.jl on a couple of test problems.

Yes, it does work quite well. I just raised the issue because, based on the documentation, I wasn't expecting a crash. Thanks to everyone for addressing so quickly and thoughtfully!

kellertuer commented 8 months ago

Well, usually one should not expect things to crash ;)

But we did start a section with each solver called ā€œTechnical Detailsā€ that lists which functions one should implement for a manifold for this solver to work. This is meant for example when you implement your own manifold, that you know the minimal requirements. In your case it could also have helped to see ā€œah for the matrix variants I need a basisā€. I will check whether that is in there and correctly phrased then.

Concerning the open PR on doing the basis you are missing ā€“Ā there are a few minor technical difficulties still, so it might take a while until that is covered, we are but only a very small team working on these packages involved here (originally 3, currently more realistically 2).

kellertuer commented 8 months ago

I will close this for now. My idea to compute an ONB for your case was not successful, so to be honest I just have no clue how to provide a good ONB in your case for the manifold you have.

I checked the docs and under Technical Derails (as mentioned in my last post) the get_coordinates requirement is mentioned as the last point https://manoptjl.org/stable/solvers/quasi_Newton/#sec-qn-technical-details. This function is exactly what a complex sphere (any of them, also the array spherre) are missing, but that is an issue for Manifolds.jl and not here.

If you feel the docs could be improved, feel free to comment on this issue and reopening it.