JuliaNLSolvers / Optim.jl

Optimization functions for Julia

meaning of f_calls_limit is misleading #1053

Open MariusDrulea opened 9 months ago

MariusDrulea commented 9 months ago

See the following code and its output. We count the number of calls made to the rosenbrock function. As you can see in the output, the optimize method reports only 53 calls of the function. However, our counter shows count_calls = [265], way above the f_calls_limit=100, so clearly not all function calls are reported by optimize. Perhaps optimize only counts the direct calls to the function and ignores the calls required to compute the gradient and the Hessian by finite differences. In my opinion, f_calls_limit should refer to all possible calls of the original function; otherwise we just get hidden behavior.

using Optim

count_calls = [0]  # one-element vector, so the counter can be mutated inside the function
function rosenbrock(x)
    count_calls[1] += 1  # count every evaluation of the objective
    return (1.0 - x[1])^2 + 100.0 * (x[2] - x[1]^2)^2
end
result = optimize(rosenbrock, zeros(2), BFGS(), Optim.Options(x_tol=1e-12, g_tol = 1e-08, f_calls_limit=100))
@show count_calls
rosenbrock (generic function with 1 method)

 * Status: success

 * Candidate solution
    Final objective value:     5.471433e-17

 * Found with
    Algorithm:     BFGS

 * Convergence measures
    |x - x'|               = 3.47e-07 ≰ 1.0e-12
    |x - x'|/|x'|          = 3.47e-07 ≰ 0.0e+00
    |f(x) - f(x')|         = 6.59e-14 ≰ 0.0e+00
    |f(x) - f(x')|/|f(x')| = 1.20e+03 ≰ 0.0e+00
    |g(x)|                 = 2.33e-09 ≤ 1.0e-08

 * Work counters
    Seconds run:   0  (vs limit Inf)
    Iterations:    16
    f(x) calls:    53
    ∇f(x) calls:   53

count_calls = [265]
MariusDrulea commented 9 months ago

Something I just noticed:

f(x) calls:  53
∇f(x) calls: 53

With central differences, each gradient evaluation needs 2 evaluations of f per parameter. There are 2 parameters, so each of the 53 gradient calls costs 2 params * 2 f(x) calls/param = 4 f(x) calls.

So in total we have 53 + 4*53 = 265, which is exactly the number of calls we measured above.

This opens up another possibility: can we specify what type of finite differences to use? For example, forward differences (f(x+e)-f(x))/e instead of central differences (f(x+e)-f(x-e))/2e. If there are multiple parameters (multiple direction vectors e), the f(x) term is shared across all directions in forward differences, and it is already known from the objective evaluation, so we save a lot of function calls. My real-world function is expensive to evaluate.
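
For illustration, here is a minimal sketch of the two schemes for an n-parameter function. This is not Optim.jl's internal code; the helper names and step sizes are just illustrative choices.

# Forward differences: f(x) is shared across directions, so a gradient costs
# n extra calls (n + 1 if f(x) is not reused from the objective evaluation).
function forward_diff_grad(f, x; h = sqrt(eps(Float64)))
    fx = f(x)
    g = similar(x)
    for i in eachindex(x)
        xp = copy(x); xp[i] += h
        g[i] = (f(xp) - fx) / h
    end
    return g
end

# Central differences: 2 calls per parameter, so a gradient costs 2n calls.
function central_diff_grad(f, x; h = cbrt(eps(Float64)))
    g = similar(x)
    for i in eachindex(x)
        xp = copy(x); xp[i] += h
        xm = copy(x); xm[i] -= h
        g[i] = (f(xp) - f(xm)) / (2h)
    end
    return g
end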

Edit: While looking over the Optim.jl and NLSolversBase.jl code I found the autodiff=:finiteforward option: result = optimize(rosenbrock, zeros(2), BFGS(), opts, autodiff=:finiteforward). Unfortunately :finiteforward behaves just like :finitecentral and does not reduce the number of function calls. So I believe that in the current implementation f(x) is unnecessarily computed multiple times. This could be optimized.
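
The observation can be checked with the counter from the first comment. This is just a sketch of the check, reusing the rosenbrock and count_calls defined above and the option name quoted in the edit; the resulting count is the commenter's report, not a documented guarantee.

count_calls[1] = 0  # reset the counter defined earlier
opts = Optim.Options(x_tol = 1e-12, g_tol = 1e-8, f_calls_limit = 100)
optimize(rosenbrock, zeros(2), BFGS(), opts, autodiff = :finiteforward)
@show count_calls   # reportedly still ~265, the same as with central differences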

johnmyleswhite commented 9 months ago

In my opinion, f_calls_limit should refer to all possible calls of the original function; otherwise we just get hidden behavior.

This is a very reasonable thing to want, but it's not clear it's technically feasible. Suppose you defined rosenbrock the way you did and then also defined rosenbrock_gradient by calling into your rosenbrock. How is Optim supposed to know this and produce counts consistent with your internal counter?

It is true that, in the special case where Optim has created the gradient itself, Optim could know how many calls are being made. But is it better for the interpretation of f_calls_limit to vary depending on whether Optim creates gradients rather than takes them in as black-box inputs?

More generally, I think the right way to optimize this stuff is for the caller to do the optimization rather than for Optim to be more clever -- this is why the Optim interface takes in things like fg! so you can compute many things in a single call.
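
For reference, a sketch of the fg! interface via Optim.only_fg!, which evaluates the objective and gradient together so that shared work is done once per call. The gradient formulas below are the standard analytic Rosenbrock derivatives, added here for completeness.

function fg!(F, G, x)
    # shared subexpressions, computed once per call
    a = 1.0 - x[1]
    b = x[2] - x[1]^2
    if G !== nothing
        G[1] = -2.0 * a - 400.0 * x[1] * b
        G[2] = 200.0 * b
    end
    if F !== nothing
        return a^2 + 100.0 * b^2
    end
    return nothing
end

result = optimize(Optim.only_fg!(fg!), zeros(2), BFGS())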

MariusDrulea commented 8 months ago

but is it better for the interpretation of f_calls_limit to vary depending on whether Optim creates gradients rather than takes them in as black-box inputs?

I think so, why not? If you provide the gradient, it is expected that the number of calls to the underlying function will be a lot smaller. It's another way to gauge the usefulness of providing a gradient function.

pkofod commented 8 months ago

I think this has been discussed quite a few times. You can argue either way. But you're right, it's not necessarily counting the number of times rosenbrock is called.

MariusDrulea commented 8 months ago

I think a better description of what the counter is (direct objective calls) and what it is not (it does not count objective evaluations made inside finite-difference gradients) would be sufficient: https://julianlsolvers.github.io/Optim.jl/stable/#user/config/#general-options.

MariusDrulea commented 8 months ago

I have added a better description in https://github.com/JuliaNLSolvers/Optim.jl/pull/1054. The PR also shows the minimizer at the end of the minimization. Let me know if this is also appropriate.

pkofod commented 6 months ago

I think they are two separate issues. I'm on board with merging your change to the description, but I'm not sure about showing the full minimizer.