Closed knuesel closed 6 months ago
I don't know, it's hard to do something general here. If you think `link(sigma) == 20` is too large, maybe the right thing to do is to apply bounds? I know it's hard to know a priori sometimes, but... It's also hard for Optim to know whether it actually jumped to a local minimum. You mention one iteration, but what about two then? Or if you started extremely close to the optimum, you'd end up doing two iterations instead of one... Functions with flat regions are hard for local optimizers. Maybe the appropriate thing would be to start from a few different starting values?
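A multistart loop along those lines could be sketched like this (a hypothetical helper, not an existing Optim.jl API; the test objective and starting points are made up for illustration):

```julia
using Optim

# Hypothetical multistart sketch: run a local optimizer from several
# starting points and keep the run with the lowest objective value.
function multistart(f, starts)
    results = [optimize(f, copy(x0), LBFGS()) for x0 in starts]
    _, i = findmin(Optim.minimum.(results))
    return results[i]
end

# Example with a double-well objective that has two local minima:
# multistart(x -> (x[1]^2 - 1)^2, [[-2.0], [0.5], [3.0]])
```

This doesn't make any single run more robust, but it lowers the chance that a premature "convergence" in a flat region goes unnoticed.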
Isn't the solution here to just use `LBFGS(initial_stepnorm = 0.01)` or whatever else is small enough?
> Isn't the solution here to just use `LBFGS(initial_stepnorm = 0.01)` or whatever else is small enough?
Yes, but that option is only available for BFGS right now. But I agree, as I hope was clear above: I don't think a heuristic will be easy to find or generally applicable. A local gradient-based method will sometimes just have a hard time with functions shaped like the one above. There's a reason why so many of the canonical optimization test sets are valleys, or regions with steep edges combined with almost-flat regions.
Yes, in practice I solve it with a small `alphaguess` and by trying different starting values. I had the vague idea that "convergence in one iteration" might be a good heuristic for detecting when the initial step is too large, but I don't have much perspective on this. Feel free to close the issue.
I got a false convergence result minimizing `-logpdf(Normal(150, σ), 147.9)` with L-BFGS. My σ is defined on [0, 50], so Turing.jl maps it to (-∞, ∞) using a logit/sigmoid transformation through Bijectors.jl. See the initial issue at https://github.com/TuringLang/Turing.jl/issues/1597 for details. Here is a reduced example with σ in [0, 5]:
Output (plot not reproduced here):
The red markers show the initial value and the first (= last) iterate. As you can see, the large initial gradient (-24.4) sends the first step far into the flat region on the right, where the `|g(x)|` convergence condition is immediately satisfied.

Of course the user can fix this with a better initial value, or by decreasing the step size (e.g. with `alphaguess = 0.01`). However, it was not obvious in my case, since the coordinate transformation and algorithm selection were made automatically behind the scenes. So I guess it would be nice to make the algorithm more robust to this issue. Here's an idea: it seems unlikely that the optimization genuinely succeeds in one step, so if that happens, Optim could try again with `alphaguess` divided by 10 (repeating a few times before giving up). Or do you think such logic should rather be added to downstream packages like Turing.jl?
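A rough sketch of what that retry heuristic could look like, as a hypothetical user-side wrapper (not an existing Optim API; the helper name and retry policy are made up):

```julia
using Optim

# Hypothetical retry heuristic: if the optimizer "converges" after a
# single iteration, retry with alphaguess shrunk by 10×, up to
# `retries` times, and keep the best result found.
function optimize_with_retries(f, x0; alphaguess = 1.0, retries = 3)
    best = nothing
    for _ in 0:retries
        # InitialStatic sets a fixed initial step-length guess for the line search.
        method = LBFGS(alphaguess = Optim.LineSearches.InitialStatic(alpha = alphaguess))
        res = optimize(f, copy(x0), method)
        if best === nothing || Optim.minimum(res) < Optim.minimum(best)
            best = res
        end
        Optim.iterations(res) > 1 && break   # more than one iteration: trust it
        alphaguess /= 10                     # suspicious one-step "convergence"
    end
    return best
end
```

The same logic could live in a downstream package like Turing.jl instead; putting it in Optim would mean deciding on a default policy that may not suit all objectives.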