JuliaNLSolvers / LineSearches.jl

Line search methods for optimization and root-finding

Hager-Zhang converges incorrectly within Optim.ConjugateGradient due to flatness check #175

Open kbarros opened 5 months ago

kbarros commented 5 months ago

I found a case where the ConjugateGradient method of Optim.jl terminates early, at a non-stationary point, without raising an error. After some digging, the problem seems to be associated with LineSearches. I can't tell whether the actual bug is in LineSearches, or in the way that Optim calls the Hager-Zhang line search. Your help in diagnosing this would be greatly appreciated.

Below is a simplified example illustrating how Hager-Zhang fails to converge correctly, given the (effective) parameters that are provided by Optim.jl:


using LineSearches

ϕ(c) = 3.042968312396456 - 832.4270136930788*c - 132807.15591801773*c^2 + 7.915421661743959e6*c^3 - 1.570284840040962e8*c^4 + 1.4221708747294645e9*c^5 - 5.970065591205604e9*c^6 + 9.405512899903618e9*c^7
dϕ(c) = -832.4270136930788 - 265614.31183603546*c + 2.3746264985231876e7*c^2 - 6.281139360163848e8*c^3 + 7.110854373647323e9*c^4 - 3.582039354723362e10*c^5 + 6.5838590299325325e10*c^6

function ϕdϕ(c)
    println("ϕ($c)=$(ϕ(c)), dϕ($c)=$(dϕ(c))")
    (ϕ(c), dϕ(c))
end

c0 = 0.2
ϕ0, dϕ0 = ϕdϕ(0)
println("  ^ Is it valid to evaluate these away from c0=0.2?")

ls = HagerZhang()
res = ls(ϕ, dϕ, ϕdϕ, c0, ϕ0, dϕ0)

The printed output of this code is:

ϕ(0)=3.042968312396456, dϕ(0)=-832.4270136930788
  ^ Is it valid to evaluate these away from c0=0.2?
ϕ(0.2)=3.117411287254072, dϕ(0.2)=-505.33622492291033
ϕ(0.1)=-3.503584823341612, dϕ(0.1)=674.947830358913
ϕ(0.055223623837016025)=0.5244246783084083, dϕ(0.055223623837016025)=738.3388472434362

Note that the final value c=0.055... is not a zero point of the derivative dϕ.

The HZ line search terminates early because this "flatness" condition is hit: https://github.com/JuliaNLSolvers/LineSearches.jl/blob/master/src/hagerzhang.jl#L282-L285

However, it's not clear to me whether the underlying problem is the HZ flatness condition, or the way that the call to method.linesearch! is made in Optim: https://github.com/JuliaNLSolvers/Optim.jl/blob/master/src/utilities/perform_linesearch.jl#L41-L60

Is it OK that the values of phi_0, dphi_0 were calculated for alpha == 0 and not state.alpha == 0.2?

It therefore seems that there are two possible ways to fix the bug:

  1. The HZ flatness condition needs to be modified.
  2. In Optim, prior to calling method.linesearch!, the values of phi_0, dphi_0 should be recalculated for the newly guessed state.alpha.

Thanks in advance.

pkofod commented 5 months ago

See also https://github.com/JuliaNLSolvers/LineSearches.jl/pull/174

cc @mateuszbaran seems like it's another problem related to the flatness detection

kbarros commented 5 months ago

Thanks, this looks very similar, I will move the discussion there.

timholy commented 1 week ago

> Note that the final value c=0.055... is not a zero point of the derivative dϕ.

It's not expected to be. Essentially all modern optimizers perform approximate line search. See, e.g., the Wolfe conditions.

That said, returning α = 0 is not OK.
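
For reference, the Wolfe conditions mentioned above can be sketched in this thread's notation, where ϕ(c) is the objective along the search direction and ϕ'(0) < 0. The constants δ and σ are symbols chosen here for illustration (with 0 < δ < σ < 1), not values taken from the thread:

```latex
\begin{aligned}
\phi(c)  &\le \phi(0) + \delta\, c\, \phi'(0) && \text{(sufficient decrease)} \\
\phi'(c) &\ge \sigma\, \phi'(0)               && \text{(curvature)}
\end{aligned}
```

Neither inequality requires ϕ'(c) = 0, which is why a step accepted by an approximate line search need not be a stationary point of ϕ.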