antoine-levitt opened this issue 7 years ago
Any chance of support for positive semi-definite matrices?
According to the docs, Manopt has this, but their code is GPL-3 licensed, so I think it's best if we avoid links to their code base in this thread, as that would probably preclude the reader from contributing.
Coincidentally, @whuang08 pointed out to me just yesterday that a retraction for the manifold of SPD matrices is R_eta(X) = X + eta + 0.5 eta X^{-1} eta, which breaks the current API (which only has a retract(x) function, whereas this would need a retract(x, eta)). That should not be too bad to adapt, though. The retraction seems fishy to me because it is singular at zero eigenvalues, but apparently it's useful. I don't have any experience with SPD matrices; do you have a use case/paper/reference implementation?
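For concreteness, a rough sketch of what such a two-argument retraction could look like (retract_spd is just an illustrative name, not part of Optim's API):
using LinearAlgebra
# Sketch of the retraction R_eta(X) = X + eta + 0.5 eta X^{-1} eta on the SPD
# manifold, written as the hypothetical two-argument retract(x, eta) discussed above.
function retract_spd(X::AbstractMatrix, η::AbstractMatrix)
    Y = X + η + 0.5 * η * (X \ η)   # X \ η solves X*Z = η without forming X^{-1}
    return (Y + Y') / 2             # re-symmetrize against round-off
end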
Ah, good point about the licence; I'll avoid looking at ROPTLIB as well, then. It is, however, described in papers.
It probably doesn't answer your question directly (it's been a while since I've worked with manifolds), but the easiest option is usually to parametrise SPD matrices via a Cholesky factor which is constrained to have non-negative diagonals.
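To illustrate the idea (with made-up names, assuming one simply optimises over the n(n+1)/2 free entries of the factor):
using LinearAlgebra
# Sketch: parametrise X by a lower-triangular factor L and reconstruct X = L*L',
# which is positive semi-definite by construction (definite when diag(L) has no zeros).
function theta_to_psd(θ::AbstractVector, n::Integer)
    L = zeros(eltype(θ), n, n)
    k = 1
    for j in 1:n, i in j:n    # fill the lower triangle columnwise
        L[i, j] = θ[k]
        k += 1
    end
    return L * L'
end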
That looks substantially different from using a manifold-type algorithm, and is probably best handled by a solver with inequality constraints (JuMP & friends).
Ah, okay. Thanks.
Hmm, my problem is that I have quite a complicated objective function, which doesn't seem to be supported by JuMP. I guess I can write the reparametrisation myself, but it would be nice if there were some tools for this.
Two references for SPD matrices are: https://www.math.fsu.edu/~whuang2/papers/RMKMSPDM.htm and https://arxiv.org/abs/1507.02772. Of course, there are more references, since computation on SPD matrices is an active topic as far as I know (many people discussed this when I was at a linear algebra conference).
Best,
Wen
I guess it depends on whether you expect your minimum to hit the boundary (matrices with zero eigenvalues) or not. If you do, then I imagine it's hard to avoid an inequality-constrained solver. If you don't, then I guess manifold algorithms are efficient.
Just putting this here for reference: I benchmarked finding the first few eigenvalues of a large random matrix in Optim and in manopt. The result is that the implementations are comparable and need about the same number of iterations/matvecs. Code for manopt:
% Generate random problem data.
% rng(0);
n = 1000;
m = 10;
A = randn(n);
A = .5*(A+A.');
x0 = randn(n,m);
% Create the problem structure.
manifold = stiefelfactory(n,m);
problem.M = manifold;
% Define the problem cost function and its Euclidean gradient.
problem.cost = @(x) -trace(x'*(A*x));
problem.egrad = @(x) -2*A*x;
% Solve.
[x, xcost, info, options] = rlbfgs(problem, x0);
% Display some statistics.
figure;
semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration number');
ylabel('Norm of the gradient of f');
and Optim:
using Optim
# srand(0)
const n = 1000
const m = 10
M = randn(n,n)
const A = (M+M')/2
f(x) = -vecdot(x,A*x)
g(x) = -2*A*x
g!(stor,x) = copy!(stor,g(x))
x0 = randn(n,m)
manif = Optim.Stiefel()
Optim.optimize(f, g!, x0, Optim.LBFGS(m=30, manifold=manif, linesearch=Optim.HagerZhang()), Optim.Options(show_trace=true, allow_f_increases=true, g_tol=1e-6))
Cool, good to have some examples like this.
I tried swapping out HagerZhang for MoreThuente. The former gives
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.15317647539137902,0.42376052107832024, ...]
* Minimizer: [-0.04295817425645339,-0.018684703787118326, ...]
* Minimum: -4.311236e+02
* Iterations: 136
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 6.80e-08
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = -1.45e-14 |f(x)|
* |g(x)| ≤ 1.0e-06: true
|g(x)| = 9.52e-07
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 410
* Gradient Calls: 411
while MoreThuente gives
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.05351779586314921,-0.7229077835364466, ...]
* Minimizer: [-0.04590592560787643,-0.03998550452894138, ...]
* Minimum: -4.301612e+02
* Iterations: 137
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 6.96e-08
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = -2.51e-14 |f(x)|
* |g(x)| ≤ 1.0e-06: true
|g(x)| = 8.44e-07
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 147
* Gradient Calls: 148
Interesting to see the significant reduction in objective calls here. I know this is known - HZ needs to be conservative - but if MT is suited to the example at hand, there's a pretty neat gain to be had in terms of calls to f(x)/g(x).
(Using A_mul_B! in g! can also decrease runtime by 25%, but that obviously has nothing to do with how well the optimizer works :))
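For reference, the in-place version would look roughly like this (it reuses the const A from the benchmark above; mul! is the current spelling of Julia 0.6's A_mul_B!):
using LinearAlgebra
# Write -2*A*x straight into the preallocated buffer instead of allocating a new array.
function g!(stor, x)
    mul!(stor, A, x)   # A_mul_B!(stor, A, x) on Julia 0.6
    stor .*= -2
    return stor
end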
I tried swapping out HagerZhang for MoreThuente.
Hmm, and BackTracking stagnates
* Algorithm: L-BFGS
* Starting Point: [0.5494801640402754,1.2759422464582777, ...]
* Minimizer: [-0.003418669419144482,0.00966606972837302, ...]
* Minimum: -3.174551e+02
* Iterations: 8
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 1.05e-09
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
|f(x) - f(x')| = 0.00e+00 |f(x)|
* |g(x)| ≤ 1.0e-06: false
|g(x)| = 8.45e+00
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 52
* Gradient Calls: 10
@anriseth yes, I reported this on gitter. This is strange, because I used backtracking on similar problems before and it worked very well. I reduced it further here: https://github.com/JuliaNLSolvers/Optim.jl/issues/626
@pkofod yes, LBFGS usually produces pretty good initial steps, so HZ is counterproductive in requiring more evals per iteration. In my previous tests backtracking was a much better default choice; @anriseth suggested making backtracking the default for LBFGS and I agree: HagerZhang should really only be the default for CG, where it is more important to have a good stepsize.
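For anyone trying this, switching line searches in the benchmark above is just a matter of the linesearch keyword (BackTracking, like HagerZhang, is one of the LineSearches.jl types exposed by Optim):
Optim.optimize(f, g!, x0,
               Optim.LBFGS(m=30, manifold=manif, linesearch=Optim.BackTracking()),
               Optim.Options(allow_f_increases=true, g_tol=1e-6))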
I am particularly interested in the last two items in the list. For the custom metric, I implemented the Löwdin transform, which symmetrically orthogonalizes a non-orthogonal basis, such that the SVDs etc. that assume an orthogonal basis work. The results are then transformed back to the non-orthogonal basis.
Please see my implementation here: https://github.com/JuliaAtoms/SCF.jl/blob/b25be8df29efb21ce573a4b9041709ee78573353/src/manifolds.jl I am well aware that it is not particularly efficient and that there may be a mathematically more beautiful approach, but it works (TM).
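For context, the core of the symmetric (Löwdin) orthogonalization is roughly the following (a bare-bones sketch assuming a dense symmetric positive-definite overlap matrix S; the linked implementation also handles the UniformScaling case and caches the transforms):
using LinearAlgebra
# Löwdin orthogonalization: with S = V*Diagonal(λ)*V', the forward transform S^{1/2}
# maps coefficients into an orthogonal basis and S^{-1/2} maps the results back.
function lowdin_transforms(S::AbstractMatrix)
    λ, V = eigen(Symmetric(S))
    forward  = V * Diagonal(sqrt.(λ)) * V'
    backward = V * Diagonal(1 ./ sqrt.(λ)) * V'
    return forward, backward
end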
Would there be any interest in including these extended manifolds into Optim.jl? Any improvement suggestions?
For sure, please do put it in there. The longer-term plan is to split those things off into Manifolds.jl, see https://github.com/JuliaNLSolvers/Manifolds.jl/issues/35, but right now it can live here.
I think you can avoid doing the Löwdin transform explicitly and just eigendecompose overlap matrices. Will take a closer look when I'm at a computer.
Well, I implemented the Löwdin transform by eigendecomposing the overlap matrix (when it is not simply a UniformScaling) and caching the forward and backward transforms. I was rather wondering whether one could implement something via the SVD, but it will anyway end up being dense transform matrices (with the exception of diagonal overlap matrices, of course).
I mean computing the overlap as X'SX, factoring that, and using that transform to modify X. That's more efficient when the first dimension of X is large, especially when S can be applied in a matrix-free way.
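Something along these lines (a sketch, assuming the constraint is X'*S*X = I; s_orthonormalize is an illustrative name):
using LinearAlgebra
# Enforce X'*S*X = I by factoring only the small m×m overlap X'*S*X; S never needs
# to be inverted or square-rooted explicitly and only has to act on X.
function s_orthonormalize(X::AbstractMatrix, S)
    M = X' * (S * X)                               # m×m with m ≪ n
    λ, V = eigen(Symmetric(Matrix(M)))
    return X * (V * Diagonal(1 ./ sqrt.(λ)) * V')  # X ← X * M^{-1/2}
end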
See https://github.com/JuliaNLSolvers/Optim.jl/pull/435