JuliaNLSolvers / Optim.jl

Optimization functions for Julia

Improvements to Manifolds #448

Open · antoine-levitt opened this issue 7 years ago

antoine-levitt commented 7 years ago

See https://github.com/JuliaNLSolvers/Optim.jl/pull/435

simonbyrne commented 6 years ago

Any chance of support for positive semi-definite matrices?

pkofod commented 6 years ago

Any chance of support for positive semi-definite matrices?

According to the docs, Manopt has this, but their code is GPL-3 licensed, so I think it's best if we avoid links to their code base in this thread, as it would probably preclude the reader from contributing.

antoine-levitt commented 6 years ago

Coincidentally, @whuang08 pointed out to me just yesterday that a retraction for the manifold of SPD matrices is R_X(eta) = X + eta + 0.5*eta*X^{-1}*eta, which breaks the current API (which only has a retract(x) function, whereas this would need a retract(x, eta)). This should not be too bad to adapt, though. That retraction seems fishy to me because it's singular at zero eigenvalues, but apparently it's useful. I don't have any experience with SPD matrices; do you have a use case/paper/reference implementation?
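
For illustration, a minimal sketch of what such a two-argument retraction could look like in Julia (retract_spd is a hypothetical name, not part of the Optim.jl API); it just transcribes the formula above for a symmetric positive definite X and a symmetric tangent direction eta:

using LinearAlgebra

# Hedged sketch: second-order retraction on the SPD manifold, R_X(eta) = X + eta + 0.5*eta*X^{-1}*eta.
# X \ eta solves X*y = eta, so eta * (X \ eta) is eta*X^{-1}*eta.
retract_spd(X, eta) = X + eta + 0.5 * eta * (X \ eta)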

antoine-levitt commented 6 years ago

Ah, good point about the licence; I'll avoid looking at ROPTLIB as well, then. It is, however, described in papers.

simonbyrne commented 6 years ago

It probably doesn't answer your question directly (it's been a while since I've worked with manifolds), but the easiest option is usually to parametrise SPD matrices via a Cholesky factor which is constrained to have non-negative diagonals.
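
A hedged sketch of that reparametrisation idea (the names are illustrative, not an existing API): map an unconstrained parameter vector to a lower-triangular factor whose diagonal is made positive with exp, then form L*L':

# Hedged sketch: build an SPD matrix from a length-n(n+1)/2 unconstrained parameter vector θ.
function spd_from_params(θ::AbstractVector, n::Integer)
    L = zeros(eltype(θ), n, n)
    k = 1
    for j in 1:n, i in j:n
        L[i, j] = i == j ? exp(θ[k]) : θ[k]   # exp keeps the diagonal strictly positive
        k += 1
    end
    return L * L'   # symmetric positive definite by construction
end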

antoine-levitt commented 6 years ago

That looks substantially different from using a manifold-type algorithm, and is probably best treated by a solver with inequality constraints (JuMP & friends).

simonbyrne commented 6 years ago

Ah, okay. Thanks.

simonbyrne commented 6 years ago

Hmm, my problem is that I have quite a complicated objective function, which doesn't seem to be supported by JuMP. I guess I can write the reparametrisation myself, but it would be nice if there were some tools for this.

whuang08 commented 6 years ago

Two references for SPD matrices:

https://www.math.fsu.edu/~whuang2/papers/RMKMSPDM.htm
https://arxiv.org/abs/1507.02772

Of course, there are more references, since computation on SPD matrices is an active topic as far as I know. (Many people discussed this when I was at a linear algebra conference.)

Best,

Wen


antoine-levitt commented 6 years ago

I guess it depends on whether you expect your minimum to hit the boundary (matrices with zero eigenvalues) or not. If you do, then I imagine it's hard to avoid an inequality-constrained solver. If you don't, then I guess manifold algorithms are efficient.

antoine-levitt commented 6 years ago

Just putting this here for reference: I benchmarked finding the first few eigenvalues of a large random matrix in Optim and in Manopt. The result is that the two implementations are comparable and take about the same number of iterations/matvecs. Code for Manopt:

% Generate random problem data.
% rng(0);
n = 1000;
m = 10;
A = randn(n);
A = .5*(A+A.');
x0 = randn(n,m);

% Create the problem structure.
manifold = stiefelfactory(n,m);
problem.M = manifold;

% Define the problem cost function and its Euclidean gradient.
problem.cost  = @(x) -trace(x'*(A*x));
problem.egrad = @(x) -2*A*x;

% Solve.
[x, xcost, info, options] = rlbfgs(problem, x0);

% Display some statistics.
figure;
semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration number');
ylabel('Norm of the gradient of f');

and Optim:

using Optim
using LinearAlgebra

# Random.seed!(0)  # uncomment (with `using Random`) for reproducibility
const n = 1000
const m = 10
M = randn(n, n)
const A = (M + M') / 2            # random symmetric matrix
f(x) = -dot(x, A*x)               # -tr(x'*A*x)
g(x) = -2*A*x
g!(stor, x) = copyto!(stor, g(x)) # in-place gradient wrapper
x0 = randn(n, m)

manif = Optim.Stiefel()
Optim.optimize(f, g!, x0,
               Optim.LBFGS(m=30, manifold=manif, linesearch=Optim.HagerZhang()),
               Optim.Options(show_trace=true, allow_f_increases=true, g_tol=1e-6))
pkofod commented 6 years ago

Cool, good to have some examples like this.

I tried swapping out HagerZhang for MoreThuente. The former gives

Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.15317647539137902,0.42376052107832024, ...]
 * Minimizer: [-0.04295817425645339,-0.018684703787118326, ...]
 * Minimum: -4.311236e+02
 * Iterations: 136
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false 
     |x - x'| = 6.80e-08 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = -1.45e-14 |f(x)|
   * |g(x)| ≤ 1.0e-06: true 
     |g(x)| = 9.52e-07 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 410
 * Gradient Calls: 411

while MoreThuente gives

Results of Optimization Algorithm
 * Algorithm: L-BFGS
 * Starting Point: [0.05351779586314921,-0.7229077835364466, ...]
 * Minimizer: [-0.04590592560787643,-0.03998550452894138, ...]
 * Minimum: -4.301612e+02
 * Iterations: 137
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false 
     |x - x'| = 6.96e-08 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
     |f(x) - f(x')| = -2.51e-14 |f(x)|
   * |g(x)| ≤ 1.0e-06: true 
     |g(x)| = 8.44e-07 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 147
 * Gradient Calls: 148

Interesting to see the significant reduction in objective calls here. I know this is known (HZ needs to be conservative), but if MT is suited to the example at hand, there's a pretty neat gain to be found in terms of calls to f(x)/g(x).

(using A_mul_B! in g! can also decrease runtime by 25%, but that obviously has nothing to do with how well the optimizer works :))
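
For reference, a minimal sketch of such an in-place gradient on current Julia, where A_mul_B! has since been replaced by mul! from LinearAlgebra:

using LinearAlgebra

# Hedged sketch: write A*x directly into the preallocated storage, then scale,
# avoiding the temporary array allocated by g(x) = -2*A*x.
g!(stor, x) = (mul!(stor, A, x); stor .*= -2; stor)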

anriseth commented 6 years ago

I tried swapping out HagerZhang for MoreThuente.

Hmm, and BackTracking stagnates

 * Algorithm: L-BFGS
 * Starting Point: [0.5494801640402754,1.2759422464582777, ...]
 * Minimizer: [-0.003418669419144482,0.00966606972837302, ...]
 * Minimum: -3.174551e+02
 * Iterations: 8
 * Convergence: true
   * |x - x'| ≤ 0.0e+00: false 
     |x - x'| = 1.05e-09 
   * |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
     |f(x) - f(x')| = 0.00e+00 |f(x)|
   * |g(x)| ≤ 1.0e-06: false 
     |g(x)| = 8.45e+00 
   * Stopped by an increasing objective: false
   * Reached Maximum Number of Iterations: false
 * Objective Calls: 52
 * Gradient Calls: 10
antoine-levitt commented 6 years ago

@anriseth yes, I reported this on gitter. This is strange, because I used backtracking on similar problems before and it worked very well. I reduced it further here: https://github.com/JuliaNLSolvers/Optim.jl/issues/626

@pkofod yes, LBFGS usually produces pretty good initial steps, so HagerZhang is counterproductive in requiring more evaluations per iteration. In my previous tests backtracking was a much better default choice; @anriseth suggested making backtracking the default for LBFGS and I agree: HagerZhang should really only be the default for CG, where having a good stepsize matters more.
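
For completeness, the swap under discussion is just a keyword change in the benchmark above; assuming Optim re-exports the LineSearches.jl constructors (as with Optim.HagerZhang() earlier in the thread), it would look like:

# Hedged sketch: same call as in the benchmark above, with backtracking instead of Hager-Zhang.
Optim.optimize(f, g!, x0,
               Optim.LBFGS(m=30, manifold=manif, linesearch=Optim.BackTracking()),
               Optim.Options(show_trace=true, allow_f_increases=true, g_tol=1e-6))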

jagot commented 5 years ago

I am particularly interested in the last two items in the list. For the custom metric, I implemented the Löwdin transform that symmetrically orthogonalizes a non-orthogonal basis, so that the SVDs etc. that assume an orthogonal basis work. The results are then transformed back to the non-orthogonal basis.

Please see my implementation here: https://github.com/JuliaAtoms/SCF.jl/blob/b25be8df29efb21ce573a4b9041709ee78573353/src/manifolds.jl I am well aware that it is not particularly efficient and there may be a mathematically more beautiful approach, but it works (TM).

Would there be any interest in including these extended manifolds into Optim.jl? Any improvement suggestions?

antoine-levitt commented 5 years ago

For sure, please do put it in there. The longer-term plan is to split those things off into Manifolds.jl (see https://github.com/JuliaNLSolvers/Manifolds.jl/issues/35), but right now it can live here.

I think you can avoid doing the Löwdin transform explicitly and just eigendecompose overlap matrices. I'll take a closer look when I'm at a computer.

jagot commented 5 years ago

Well, I implemented the Löwdin transform by eigendecomposing the overlap matrix (when it is not simply a UniformScaling) and caching the forward and backward transforms. I was wondering whether something could be implemented via an SVD instead, but the transform matrices will end up being dense anyway (with the exception of diagonal overlap matrices, of course).
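
For concreteness, a hedged sketch (illustrative only, not the SCF.jl code) of the cached transforms described above: eigendecompose the overlap matrix S once and store S^{1/2} and S^{-1/2} as the forward and backward Löwdin transforms.

using LinearAlgebra

# Hedged sketch: cache the dense forward (S^{1/2}) and backward (S^{-1/2}) Löwdin
# transforms computed from a single eigendecomposition of the overlap matrix.
struct LoewdinTransform{T}
    forward::Matrix{T}   # S^{1/2}
    backward::Matrix{T}  # S^{-1/2}
end

function LoewdinTransform(S::AbstractMatrix)
    λ, U = eigen(Symmetric(Matrix(S)))
    LoewdinTransform(U * Diagonal(sqrt.(λ)) * U', U * Diagonal(1 ./ sqrt.(λ)) * U')
end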

antoine-levitt commented 5 years ago

I mean computing the overlap as X'SX, factoring that, and using that transform to modify X. That's more efficient when the first dimension of X is large, especially when S can be applied in a matrix-free way.
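
A hedged sketch of that idea (hypothetical helper, not an Optim.jl function): form the small m×m overlap M = X'SX and use its inverse square root to make the columns of X S-orthonormal, so S only ever has to act on X.

using LinearAlgebra

# Hedged sketch: S-orthonormalize the columns of a tall n×m matrix X by factoring
# only the small m×m overlap X'*(S*X); S can be any operator with S*X defined.
function orthonormalize_metric(X, S)
    M = X' * (S * X)                              # m×m overlap matrix
    λ, U = eigen(Symmetric(Matrix(M)))            # small eigendecomposition
    return X * (U * Diagonal(1 ./ sqrt.(λ)) * U') # X * M^{-1/2}, so that X'*S*X ≈ I
end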