RainerHeintzmann opened this issue 3 years ago
It is nice to have some feedback on VMLMB.
From what you write, I am assuming that, to ensure the positivity of the variables, say x, you optimize your objective function via auxiliary variables, say u, such that x[i] = u[i]^2 (for all indices i). Using auxiliary variables is not always a good choice because it makes your problem non-convex regardless of the original objective function. At the bounds (where u[i] = 0), the gradient of the objective function with respect to the auxiliary variables is exactly zero, which may prevent moving away from the bounds. If you do not initialize your variables with zero, this is numerically unlikely to occur, but having a non-convex and more non-linear objective function is generally not harmless for optimization. To me, it is thus rather fortunate that the convergence is improved by the change of variables; this is certainly problem dependent. I am not questioning your results, but it is funny that I wrote VMLM-B (a long time ago) precisely to avoid using a nonlinear change of variables to impose the positivity of the variables ...
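The zero-gradient effect at the bound follows directly from the chain rule. A minimal sketch in Python/NumPy (the objective and target values are hypothetical; the thread's actual code is Julia):

```python
import numpy as np

# Toy smooth objective with positivity imposed via the substitution x = u^2.
# The target values are hypothetical; only the chain rule at the bound matters.
target = np.array([1.0, 2.0, 3.0])

def f(x):
    return np.sum((x - target) ** 2)

def grad_f(x):
    return 2.0 * (x - target)

# Chain rule for the substituted problem f(u^2):
# d f(u^2) / d u[i] = 2 * u[i] * (df/dx)[i]
def grad_f_of_u(u):
    return 2.0 * u * grad_f(u ** 2)

u = np.array([0.0, 1.0, 1.0])
print(grad_f(u ** 2)[0])   # -2.0: moving x[0] off the bound would lower f
print(grad_f_of_u(u)[0])   #  0.0: the transformed gradient cannot see that
```

So any component initialized exactly at u[i] = 0 is a stationary point of the transformed problem, regardless of the original gradient there.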
I have a few questions/remarks/suggestions:
My experience is that weird convergence behavior may be a hint that the computed gradient of the objective function is wrong. It is better to double-check ;-)
In your tests, did you try to push the convergence very far?
Unless it is top secret, how many variables do you have and what is the kind of objective function you want to minimize?
The number of objective function evaluations is usually more critical than the number of iterations because the cost of evaluating the objective function (and its gradient) usually dominates the computational burden.
20 memorized previous iterations is quite large for L-BFGS, notably considering that convergence may take only 16 iterations.
I do not know if L-BFGS-B ('B' is for bounded) is part of Optim.jl, but you may give it a try for bound constraints. VMLM-B is comparable to L-BFGS-B in terms of the number of iterations. VMLM-B has less overhead than L-BFGS-B, so it may be a bit faster.
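The gradient check suggested above can be automated with central finite differences. A minimal sketch in Python/NumPy (the thread's code is Julia, so the function names and tolerances here are illustrative):

```python
import numpy as np

def check_gradient(f, grad_f, x, eps=1e-6, rtol=1e-4):
    """Compare an analytic gradient against central finite differences."""
    g_analytic = grad_f(x)
    g_numeric = np.empty_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        g_numeric[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return np.allclose(g_analytic, g_numeric, rtol=rtol, atol=1e-8)

# Example: f(x) = sum(x^2) with a correct and a deliberately wrong gradient.
f = lambda x: np.sum(x ** 2)
good = lambda x: 2.0 * x
bad = lambda x: x          # missing factor of 2

x0 = np.array([0.5, -1.3, 2.0])
print(check_gradient(f, good, x0))  # True
print(check_gradient(f, bad, x0))   # False
```

For millions of variables, checking a random subset of components instead of all of them keeps this affordable.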
Thanks a lot for your detailed explanation! Yes, your assumptions are all correct. In the square case, we do not really care which solution it finally finds, but of course you are right about the potential zero gradient in between. However, that stationary point is unstable, so practically it does not matter as long as we do not initialize with zero. Looking at a range of different auxiliary functions (which I call "pre-forward-models"), one also sees a range of different convergence rates. You are right that I should double-check that the gradients are correct, yet I observe faster (not slower) convergence compared to the version without auxiliary functions, which indicates that the gradient is probably OK.
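For illustration, a few common positivity-enforcing parameterizations and their derivatives, sketched in Python/NumPy (the selection is illustrative and not necessarily what DeconvOptim.jl uses):

```python
import numpy as np

# Common positivity-enforcing parameterizations ("pre-forward-models"),
# each mapping an unconstrained u to x >= 0, together with the derivative
# dx/du needed for the chain rule.
parameterizations = {
    "square":   (lambda u: u ** 2,               lambda u: 2.0 * u),
    "exp":      (lambda u: np.exp(u),            lambda u: np.exp(u)),
    "softplus": (lambda u: np.log1p(np.exp(u)),  lambda u: 1.0 / (1.0 + np.exp(-u))),
}

u = np.linspace(-2.0, 2.0, 5)
for name, (fwd, dfwd) in parameterizations.items():
    assert np.all(fwd(u) >= 0.0)  # every map keeps x nonnegative
    # "square" has dx/du = 0 at u = 0 (the flat spot at the bound discussed
    # above), while "exp" and "softplus" have strictly positive derivatives.
    print(name, dfwd(0.0))
```

Each choice changes the curvature of the transformed problem, which is one plausible reason the observed convergence rates differ.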
The problem is not top secret at all. It's all in DeconvOptim.jl. The number of variables is in the millions and the objective function is, for example, the i-divergence of a linear forward problem (convolution of the unknown object with a known point spread function).
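For reference, a minimal sketch of such an objective in Python/NumPy (the actual DeconvOptim.jl code is Julia; the toy object, PSF, and epsilon-clamping here are illustrative). The I-divergence (generalized Kullback-Leibler divergence) between data d and model m is sum(d*log(d/m) - d + m):

```python
import numpy as np

def forward(obj, psf):
    """Linear forward model: circular convolution of the object with the PSF."""
    return np.real(np.fft.ifftn(np.fft.fftn(obj) * np.fft.fftn(psf)))

def i_divergence(data, model, eps=1e-12):
    """Csiszar I-divergence (generalized Kullback-Leibler) of model vs. data."""
    model = np.maximum(model, eps)  # clamp to avoid log(0)
    return np.sum(data * np.log((data + eps) / model) - data + model)

# Toy 1-D example: a known object blurred by a normalized kernel.
obj = np.array([0.0, 1.0, 4.0, 1.0, 0.0])
psf = np.array([0.25, 0.5, 0.25, 0.0, 0.0])
data = forward(obj, psf)

print(i_divergence(data, forward(obj, psf)))        # ~0 for a perfect model
print(i_divergence(data, forward(1.5 * obj, psf)))  # > 0 for a mismatch
```

The divergence vanishes exactly when the model reproduces the data, and its gradient with respect to the object is what the optimizer consumes.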
I think that for proper convergence we often need hundreds of iterations, no matter whether it's L-BFGS or VMLMB.
I have to double-check, but I think that L-BFGS-B is not in Optim.jl.
I played around a bit to test the vmlmb method for DeconvOptim.jl. It worked nicely, but I did not really find any advantage compared to the standard LBFGS implementation provided by Optim.jl. Interestingly, we implemented the positivity constraint using a square operation as an auxiliary function rather than bounding the optimizer. If I compare the two approaches, the standard LBFGS with the square auxiliary function converges much quicker than vmlmb together with "mem=20, lower=0". E.g. after 16 iterations LBFGS already has a lower loss value than vmlmb with the bound has after 26 iterations. Yet if I test it with the same auxiliary square function, the performance is quite similar. Any idea how this could be improved, or is the auxiliary function always a better choice?
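For anyone wanting to reproduce this kind of comparison outside Julia, here is a minimal sketch using SciPy's L-BFGS-B as the bounded solver (the toy problem and the square substitution are illustrative; "mem=20, lower=0" are vmlmb keywords, not SciPy options):

```python
import numpy as np
from scipy.optimize import minimize

# Toy nonnegative least squares: minimize ||A x - b||^2 subject to x >= 0.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
x_true = np.array([0.0, 0.5, 1.0, 0.0, 2.0])  # the optimum sits on the bound
b = A @ x_true

def loss(x):
    r = A @ x - b
    return r @ r

def grad(x):
    return 2.0 * A.T @ (A @ x - b)

# Approach 1: bound-constrained quasi-Newton (L-BFGS-B), the vmlmb-style route.
res_bound = minimize(loss, np.ones(5), jac=grad,
                     method="L-BFGS-B", bounds=[(0.0, None)] * 5)

# Approach 2: unconstrained L-BFGS on u with the square substitution x = u^2.
res_sq = minimize(lambda u: loss(u ** 2),
                  np.ones(5),
                  jac=lambda u: 2.0 * u * grad(u ** 2),  # chain rule
                  method="L-BFGS-B")
x_sq = res_sq.x ** 2

print(res_bound.fun, loss(x_sq))   # both near zero
print(res_bound.nit, res_sq.nit)   # iteration counts may differ
```

Comparing final loss at equal numbers of function evaluations (rather than iterations) is the fairer benchmark, per the remark above about evaluation cost.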