antoine-levitt opened this issue 7 years ago
Any chance of support for positive semi-definite matrices?
According to the docs, Manopt has this, but their code is GPL-3 licensed, so I think it's best if we avoid links to their code base in this thread, as that would probably preclude the reader from contributing.
Coincidentally, @whuang08 pointed out to me just yesterday that a retraction for the manifold of SPD matrices is R_eta(X) = X + eta + 0.5 eta X^{-1} eta, which breaks the current API (which only has a retract(x) function, whereas this would need a retract(x, eta)). That should not be too bad to adapt, though. The retraction seems fishy to me because it is singular at zero eigenvalues, but apparently it's useful. I don't have any experience with SPD matrices; do you have a use case/paper/reference implementation?
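For concreteness, a rough sketch of what such a two-argument retraction could look like (retract_spd is just an illustrative name, not part of Optim's API):
using LinearAlgebra
# Sketch of the retraction R_eta(X) = X + eta + 0.5 eta X^{-1} eta on the SPD
# manifold, written as the hypothetical two-argument retract(x, eta) discussed above.
function retract_spd(X::AbstractMatrix, η::AbstractMatrix)
    Y = X + η + 0.5 * η * (X \ η)   # X \ η solves X*Z = η without forming X^{-1}
    return (Y + Y') / 2             # re-symmetrize against round-off
end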
Ah, good point about the licence; I'll avoid looking at ROPTLIB as well, then. It is, however, described in papers.
It probably doesn't answer your question directly (it's been a while since I've worked with manifolds), but the easiest option is usually to parametrise SPD matrices via a Cholesky factor which is constrained to have non-negative diagonals.
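To illustrate the idea (with made-up names, assuming one simply optimises over the n(n+1)/2 free entries of the factor):
using LinearAlgebra
# Sketch: parametrise X by a lower-triangular factor L and reconstruct X = L*L',
# which is positive semi-definite by construction (definite when diag(L) has no zeros).
function theta_to_psd(θ::AbstractVector, n::Integer)
    L = zeros(eltype(θ), n, n)
    k = 1
    for j in 1:n, i in j:n    # fill the lower triangle columnwise
        L[i, j] = θ[k]
        k += 1
    end
    return L * L'
end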
That looks substantially different from using a manifold-type algorithm, and is probably best handled by a solver with inequality constraints (JuMP & friends).
Ah, okay. Thanks.
Hmm, my problem is that I have quite a complicated objective function, which doesn't seem to be supported by JuMP. I guess I can write the reparametrisation myself, but it would be nice if there were some tools for this.
Two references for SPD matrices are: https://www.math.fsu.edu/~whuang2/papers/RMKMSPDM.htm and https://arxiv.org/abs/1507.02772. Of course, there are more references, since computation on SPD matrices is an active topic as far as I know (many people discussed this when I was at a linear algebra conference).
Best,
Wen
I guess it depends on whether you expect your minimum to hit the boundary (matrices with zero eigenvalues) or not. If you do, then I imagine it's hard to avoid an inequality-constrained solver. If you don't, then I guess manifold algorithms are efficient.
Just putting this here for reference: I benchmarked finding the first few eigenvalues of a large random matrix in Optim and in manopt. The result is that the implementations are comparable and need about the same number of iterations/matvecs. Code for manopt:
% Generate random problem data.
% rng(0);
n = 1000;
m = 10;
A = randn(n);
A = .5*(A+A.');
x0 = randn(n,m);
% Create the problem structure.
manifold = stiefelfactory(n,m);
problem.M = manifold;
% Define the problem cost function and its Euclidean gradient.
problem.cost = @(x) -trace(x'*(A*x));
problem.egrad = @(x) -2*A*x;
% Solve.
[x, xcost, info, options] = rlbfgs(problem, x0);
% Display some statistics.
figure;
semilogy([info.iter], [info.gradnorm], '.-');
xlabel('Iteration number');
ylabel('Norm of the gradient of f');
and Optim:
using Optim
# srand(0)
const n = 1000
const m = 10
M = randn(n,n)
const A = (M+M')/2
f(x) = -vecdot(x,A*x)
g(x) = -2*A*x
g!(stor,x) = copy!(stor,g(x))
x0 = randn(n,m)
manif = Optim.Stiefel()
Optim.optimize(f, g!, x0, Optim.LBFGS(m=30, manifold=manif, linesearch=Optim.HagerZhang()), Optim.Options(show_trace=true, allow_f_increases=true, g_tol=1e-6))
Cool, good to have some examples like this.
I tried swapping out HagerZhang for MoreThuente. The former gives
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.15317647539137902,0.42376052107832024, ...]
* Minimizer: [-0.04295817425645339,-0.018684703787118326, ...]
* Minimum: -4.311236e+02
* Iterations: 136
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 6.80e-08
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = -1.45e-14 |f(x)|
* |g(x)| ≤ 1.0e-06: true
|g(x)| = 9.52e-07
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 410
* Gradient Calls: 411
while MoreThuente gives
Results of Optimization Algorithm
* Algorithm: L-BFGS
* Starting Point: [0.05351779586314921,-0.7229077835364466, ...]
* Minimizer: [-0.04590592560787643,-0.03998550452894138, ...]
* Minimum: -4.301612e+02
* Iterations: 137
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 6.96e-08
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: false
|f(x) - f(x')| = -2.51e-14 |f(x)|
* |g(x)| ≤ 1.0e-06: true
|g(x)| = 8.44e-07
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 147
* Gradient Calls: 148
Interesting to see the significant reduction in objective calls here. I know this is known - HZ needs to be conservative - but if MT is suited to the example at hand, there's a pretty neat gain to be had in terms of calls to f(x)/g(x).
(Using A_mul_B! in g! can also decrease runtime by 25%, but that obviously has nothing to do with how well the optimizer works :))
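For reference, the in-place version would look roughly like this (it reuses the const A from the benchmark above; mul! is the current spelling of Julia 0.6's A_mul_B!):
using LinearAlgebra
# Write -2*A*x straight into the preallocated buffer instead of allocating a new array.
function g!(stor, x)
    mul!(stor, A, x)   # A_mul_B!(stor, A, x) on Julia 0.6
    stor .*= -2
    return stor
end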
I tried swapping out HagerZhang for MoreThuente.
Hmm, and BackTracking stagnates
* Algorithm: L-BFGS
* Starting Point: [0.5494801640402754,1.2759422464582777, ...]
* Minimizer: [-0.003418669419144482,0.00966606972837302, ...]
* Minimum: -3.174551e+02
* Iterations: 8
* Convergence: true
* |x - x'| ≤ 0.0e+00: false
|x - x'| = 1.05e-09
* |f(x) - f(x')| ≤ 0.0e+00 |f(x)|: true
|f(x) - f(x')| = 0.00e+00 |f(x)|
* |g(x)| ≤ 1.0e-06: false
|g(x)| = 8.45e+00
* Stopped by an increasing objective: false
* Reached Maximum Number of Iterations: false
* Objective Calls: 52
* Gradient Calls: 10
@anriseth yes, I reported this on gitter. This is strange, because I used backtracking on similar problems before and it worked very well. I reduced it further here: https://github.com/JuliaNLSolvers/Optim.jl/issues/626
@pkofod yes, LBFGS usually produces pretty good initial steps, so HZ is counterproductive in requiring more evals per iteration. In my previous tests backtracking was a much better default choice; @anriseth suggested making backtracking the default for LBFGS and I agree: HagerZhang should really only be the default for CG, where it is more important to have a good stepsize.
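For anyone trying this, switching line searches in the benchmark above is just a matter of the linesearch keyword (BackTracking, like HagerZhang, is one of the LineSearches.jl types exposed by Optim):
Optim.optimize(f, g!, x0,
               Optim.LBFGS(m=30, manifold=manif, linesearch=Optim.BackTracking()),
               Optim.Options(allow_f_increases=true, g_tol=1e-6))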
I am particularly interested in the last two items in the list. For the custom metric, I implemented the Löwdin transform, which symmetrically orthogonalizes a non-orthogonal basis, such that the SVDs etc. that assume an orthogonal basis work. The results are then transformed back to the non-orthogonal basis.
Please see my implementation here: https://github.com/JuliaAtoms/SCF.jl/blob/b25be8df29efb21ce573a4b9041709ee78573353/src/manifolds.jl I am well aware that it is not particularly efficient and that there may be a mathematically more beautiful approach, but it works (TM).
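For context, the core of the symmetric (Löwdin) orthogonalization is roughly the following (a bare-bones sketch assuming a dense symmetric positive-definite overlap matrix S; the linked implementation also handles the UniformScaling case and caches the transforms):
using LinearAlgebra
# Löwdin orthogonalization: with S = V*Diagonal(λ)*V', the forward transform S^{1/2}
# maps coefficients into an orthogonal basis and S^{-1/2} maps the results back.
function lowdin_transforms(S::AbstractMatrix)
    λ, V = eigen(Symmetric(S))
    forward  = V * Diagonal(sqrt.(λ)) * V'
    backward = V * Diagonal(1 ./ sqrt.(λ)) * V'
    return forward, backward
end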
Would there be any interest in including these extended manifolds into Optim.jl? Any improvement suggestions?
For sure, please do put it in there. The longer-term plan is to split those things off into Manifolds.jl, see https://github.com/JuliaNLSolvers/Manifolds.jl/issues/35, but right now it can live here.
I think you can avoid doing the Löwdin transform explicitly and just eigendecompose overlap matrices. Will take a closer look when I'm at a computer.
Well, I implemented the Löwdin transform by eigendecomposing the overlap matrix (when it is not simply a UniformScaling) and caching the forward and backward transforms. I was rather wondering whether one could implement something via the SVD, but it will anyway end up being dense transform matrices (with the exception of diagonal overlap matrices, of course).
I mean computing the overlap as X'SX, factoring that, and using that transform to modify X. That's more efficient when the first dimension of X is large, especially when S can be applied in a matrix-free way.
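Something along these lines (a sketch, assuming the constraint is X'*S*X = I; s_orthonormalize is an illustrative name):
using LinearAlgebra
# Enforce X'*S*X = I by factoring only the small m×m overlap X'*S*X; S never needs
# to be inverted or square-rooted explicitly and only has to act on X.
function s_orthonormalize(X::AbstractMatrix, S)
    M = X' * (S * X)                               # m×m with m ≪ n
    λ, V = eigen(Symmetric(Matrix(M)))
    return X * (V * Diagonal(1 ./ sqrt.(λ)) * V')  # X ← X * M^{-1/2}
end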
See https://github.com/JuliaNLSolvers/Optim.jl/pull/435