JuliaNLSolvers / Optim.jl

Optimization functions for Julia

exit early when nan in grad or hessian #1046

Closed: mohamed82008 closed this 11 months ago

mohamed82008 commented 11 months ago

This PR checks that the gradient and Hessian are all finite, otherwise it breaks early with a warning.
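As a rough illustration of the idea (a minimal sketch with a made-up helper name, not Optim's actual source):

    # Illustrative sketch of the finiteness check this PR adds; not Optim's actual code.
    function all_derivatives_finite(g::AbstractVector, H::Union{AbstractMatrix,Nothing})
        all(isfinite, g) || return false   # NaN/Inf in the gradient
        H === nothing && return true       # first-order methods carry no Hessian
        return all(isfinite, H)            # NaN/Inf in the Hessian
    end

    # On failure, the optimizer warns and breaks out of its iteration loop early:
    if !all_derivatives_finite([1.0, NaN], nothing)
        @warn "Terminating optimization because the gradient or Hessian contains non-finite values."
    end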

pkofod commented 11 months ago

I agree that this is a good idea because the current behavior will keep iterating until the `iterations` limit is reached. I am wondering if it would make sense to simply restart the Hessian approximation when the method is quasi-Newton instead of failing, but that's for another PR.
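The restart idea could look roughly like this (the helper `maybe_reset_invH!` is hypothetical, not something in Optim or in this PR):

    using LinearAlgebra

    # Hypothetical sketch: if a quasi-Newton inverse-Hessian approximation picks up
    # non-finite entries, reset it to the identity so the next step is steepest descent.
    function maybe_reset_invH!(invH::AbstractMatrix)
        if !all(isfinite, invH)
            copyto!(invH, Matrix{eltype(invH)}(I, size(invH)...))
        end
        return invH
    end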

pkofod commented 11 months ago

Tests fail because `TwiceDifferentiableHV` does not have an `H` field, but you could check for NaNs in the Hessian-vector product. I know it should be a different counter, but that can be another PR.

mohamed82008 commented 11 months ago

I excluded TwiceDifferentiableHV from the check.
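The exclusion can be expressed by dispatch, roughly as below (a sketch assuming NLSolversBase-style accessors `gradient(d)` and `hessian(d)`; not the PR's actual diff):

    using NLSolversBase: TwiceDifferentiableHV, gradient, hessian

    # Sketch only: TwiceDifferentiableHV stores no Hessian matrix, so only its gradient is checked.
    derivatives_finite(d::TwiceDifferentiableHV) = all(isfinite, gradient(d))
    derivatives_finite(d) = all(isfinite, gradient(d)) && all(isfinite, hessian(d))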

codecov[bot] commented 11 months ago

Codecov Report

Merging #1046 (8757ec9) into master (fd93033) will increase coverage by 0.02%. The diff coverage is 100.00%.

Note: current head 8757ec9 differs from the pull request's most recent head 141aa1c. Consider uploading reports for commit 141aa1c to get more accurate results.

@@            Coverage Diff             @@
##           master    #1046      +/-   ##
==========================================
+ Coverage   85.26%   85.29%   +0.02%     
==========================================
  Files          43       43              
  Lines        3210     3216       +6     
==========================================
+ Hits         2737     2743       +6     
  Misses        473      473              
Files Changed                             Coverage Δ
src/multivariate/optimize/optimize.jl     91.42% <100.00%> (+0.80%)
MilesCranmer commented 11 months ago

Is there a way to hide this warning? PySR and SymbolicRegression.jl have started generating TONS of these warnings during the expression search (which is fine; they will just skip such an expression).

MilesCranmer commented 11 months ago

Does this interfere with the behavior of the `BackTracking` line search? If so, I think this is a breaking change and should be turned off by default, right? It seems to be interfering with PySR's and SymbolicRegression.jl's optimization loop.

MilesCranmer commented 11 months ago

Yeah I think this is a breaking change. See LineSearches.BackTracking: https://github.com/JuliaNLSolvers/LineSearches.jl/blob/ded667a80f47886c77d67e8890f6adb127679ab4/src/backtracking.jl#L64

    # Hard-coded backtrack until we find a finite function value
    iterfinite = 0
    while !isfinite(ϕx_1) && iterfinite < iterfinitemax
        iterfinite += 1
        α_1 = α_2
        α_2 = α_1/2

        ϕx_1 = ϕ(α_2)
    end

^ i.e., it will backtrack when it hits a NaN, whereas this PR makes the optimizer exit when it hits one, which is fundamentally different behavior.

Could you please roll this change back and make it optional? It has changed the algorithmic behavior of PySR and SymbolicRegression.jl.

mohamed82008 commented 11 months ago

Can you share an example that is broken now but used to work?

mohamed82008 commented 11 months ago

Gradient-based optimisers in Optim work as follows:

  1. Evaluate the objective, gradient and optionally Hessian at the current point x
  2. Find a search direction based on the gradient (and Hessian if available)
  3. Do a line search along the search direction and find the best solution on the line by objective value f(x)
  4. Move to the best solution on the line and go back to step 1

The check added here is in step 1: it tests whether the gradient/Hessian at the current point contains non-finite values. For the current point in step 1 to have a NaN gradient, it cannot have had a NaN objective, because it was the output of the line search in step 3, and none of the line search algorithms in step 3 will terminate at a point with a NaN objective if the initial point had a finite objective value; they can, however, terminate at a point with a NaN gradient and/or Hessian. So if the gradient/Hessian in step 1 has NaNs, the search direction will have NaNs, and the line search will fail because every step along a NaN search direction leads to a NaN x and hence a NaN objective value. The rest of the iterations would then all be spent failing line searches and doing unnecessary function evaluations.
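For concreteness, the loop just described looks schematically like this (an illustrative sketch with made-up names, not Optim's internals):

    # Schematic only; the real code manages state objects, convergence checks, counters, etc.
    function schematic_optimize(f, grad, x0; maxiter = 1000)
        x = copy(x0)
        for _ in 1:maxiter
            g = grad(x)                        # step 1: evaluate derivatives at x
            if !all(isfinite, g)               # the new check: bail out early instead of
                @warn "Non-finite gradient"    # spending the remaining iterations on
                break                          # line searches that cannot succeed
            end
            d = -g                             # step 2: search direction (plain gradient descent here)
            α = schematic_linesearch(f, x, d)  # step 3: pick a step length along d
            x .+= α .* d                       # step 4: move and repeat
        end
        return x
    end

    # Crude backtracking on the objective value, just for the sketch above.
    function schematic_linesearch(f, x, d; α = 1.0, shrink = 0.5, maxtries = 20)
        fx = f(x)
        for _ in 1:maxtries
            fnew = f(x .+ α .* d)
            isfinite(fnew) && fnew < fx && return α
            α *= shrink
        end
        return α
    end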

Given the above reasoning, I am curious: what exactly did this PR break?

MilesCranmer commented 11 months ago

Hm, I wonder if it was due to the special handling of NaNs in my package; I treat NaNs as a signal. I will have a look this weekend and see what the issue is. I have pinned the version to 1.7.6 in SymbolicRegression.jl for now.

In any case, the warnings would be nice to turn off; is that possible?

mohamed82008 commented 11 months ago

It should be possible to turn the warnings off with a flag, but that requires a new option in Optim, so the decision will be Patrick's. It's also possible to turn off the warnings from your end as a user.
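On the user side, one hedged option is Julia's standard Logging module, which silences everything below Error level inside the block (so it hides all warnings emitted there, not just Optim's):

    using Logging, Optim

    # Only messages at Error level or above get through while the block runs.
    result = with_logger(ConsoleLogger(stderr, Logging.Error)) do
        optimize(x -> sum(abs2, x), randn(3), BFGS())
    end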

MilesCranmer commented 10 months ago

Thanks @mohamed82008. I made a PR in #1049 to turn off the warning messages when `show_trace = false`, i.e., it avoids the need for a new option by reusing the verbosity option that is already present.
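Assuming #1049 lands as described, nothing new would be needed on the user side; the usual options API would apply, e.g.:

    using Optim

    # show_trace already defaults to false; it is spelled out here only for clarity.
    res = optimize(x -> sum(abs2, x), randn(3), BFGS(), Optim.Options(show_trace = false))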

pkofod commented 9 months ago

> Yeah I think this is a breaking change.

I think this is false. This check happens after the loop that backtracks out of non-finite values.

MilesCranmer commented 9 months ago

Thanks. Could you please see #1049? That is blocking me from updating to the latest Optim.