AndiMD opened this issue 3 years ago
It seems that the finite-difference gradient is calculated with too large a step size, while ForwardDiff fails with ambiguity errors (in contrast to Zygote, which handles this with no issues).

Maybe we could automatically scale back the finite-difference step size if gradients become NaN or ±Inf, or allow an autodiff=:zygote mode?
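For illustration, here is a minimal sketch of the suspected failure mode. The step rule (a central difference whose step has an absolute floor near cbrt(eps())) and the decay-type model are assumptions for the example, not Optim's actual internals:

```julia
# Hypothetical sketch: a central-difference step with an absolute floor
# of roughly cbrt(eps()) dwarfs a nanoscale parameter, so the backward
# evaluation point goes negative and the model blows up.
p = 1e-9                                      # unscaled parameter, e.g. a time constant
h = max(cbrt(eps()) * abs(p), cbrt(eps()))    # ≈ 6.06e-6, ~6000x larger than p
f(q) = exp(-10 / q)                           # stand-in decay model evaluated at x = 10
g = (f(p + h) - f(p - h)) / (2h)              # -Inf: p - h < 0, so f(p - h) overflows
```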
What's the ForwardDiff error? Zygote support could be added; this would be over at NLSolversBase, where we handle the AD stuff.
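As a sketch of what such support would look like from the user's side: with the current API one can already pass an explicit Zygote gradient to optimize. This is an untested illustration, assuming the model1, xdata, and ydata definitions from the MWE further below:

```julia
using Optim, Zygote

# Workaround sketch until an autodiff=:zygote option exists:
# hand Optim a gradient computed by Zygote.
f(p) = sum((model1(xdata, p) .- ydata).^2)
g!(G, p) = copyto!(G, Zygote.gradient(f, p)[1])   # Zygote supplies the gradient
res = optimize(f, g!, [1e-9, 4e-10], BFGS(), Optim.Options(g_tol = 1e-18))
```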
Glad to hear that. The autodiff=:forward error is as follows (Julia 1.5.3, ForwardDiff v0.10.16):
```julia
res1 = optimize( p->sum((model1(xdata,p).-ydata).^2), [1E-9,4E-10], BFGS(), Optim.Options(g_tol = 1e-18),autodiff=:forward)
ERROR: MethodError: ForwardDiff.Dual{ForwardDiff.Tag{var"#3#4",Float64},Float64,2}(::Base.TwicePrecision{ForwardDiff.Dual{ForwardDiff.Tag{var"#3#4",Float64},Float64,2}}) is ambiguous. Candidates:
(::Type{T})(x::Base.TwicePrecision) where T<:Number in Base at twiceprecision.jl:243
(::Type{ForwardDiff.Dual{T,V,N}})(x) where {T, V, N} in ForwardDiff at /home/andi/.julia/packages/ForwardDiff/m7cm5/src/dual.jl:72
Possible fix, define
(::Type{ForwardDiff.Dual{T,V,N}})(::Base.TwicePrecision) where {T, V, N}
Stacktrace:
[1] broadcasted(::Base.Broadcast.DefaultArrayStyle{1}, ::typeof(/), ::StepRangeLen{ForwardDiff.Dual{ForwardDiff.Tag{var"#3#4",Float64},Float64,2},Base.TwicePrecision{ForwardDiff.Dual{ForwardDiff.Tag{var"#3#4",Float64},Float64,2}},Base.TwicePrecision{ForwardDiff.Dual{ForwardDiff.Tag{var"#3#4",Float64},Float64,2}}}, ::Float64) at ./broadcast.jl:1094
[2] broadcasted at ./broadcast.jl:1263 [inlined]
[3] model1(::StepRangeLen{Float64,Base.TwicePrecision{Float64},Base.TwicePrecision{Float64}}, ::Array{ForwardDiff.Dual{ForwardDiff.Tag{var"#3#4",Float64},Float64,2},1}) at ./REPL[2]:1
[4] (::var"#3#4")(::Array{ForwardDiff.Dual{ForwardDiff.Tag{var"#3#4",Float64},Float64,2},1}) at ./REPL[6]:1
[5] vector_mode_dual_eval at /home/andi/.julia/packages/ForwardDiff/m7cm5/src/apiutils.jl:37 [inlined]
[6] vector_mode_gradient!(::DiffResults.MutableDiffResult{1,Float64,Tuple{Array{Float64,1}}}, ::var"#3#4", ::Array{Float64,1}, ::ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#3#4",Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{var"#3#4",Float64},Float64,2},1}}) at /home/andi/.julia/packages/ForwardDiff/m7cm5/src/gradient.jl:113
[7] gradient! at /home/andi/.julia/packages/ForwardDiff/m7cm5/src/gradient.jl:37 [inlined]
[8] gradient! at /home/andi/.julia/packages/ForwardDiff/m7cm5/src/gradient.jl:35 [inlined]
[9] (::NLSolversBase.var"#14#18"{Float64,var"#3#4",ForwardDiff.GradientConfig{ForwardDiff.Tag{var"#3#4",Float64},Float64,2,Array{ForwardDiff.Dual{ForwardDiff.Tag{var"#3#4",Float64},Float64,2},1}}})(::Array{Float64,1}, ::Array{Float64,1}) at /home/andi/.julia/packages/NLSolversBase/QPnui/src/objective_types/oncedifferentiable.jl:70
[10] value_gradient!!(::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at /home/andi/.julia/packages/NLSolversBase/QPnui/src/interface.jl:82
[11] initial_state(::BFGS{LineSearches.InitialStatic{Float64},LineSearches.HagerZhang{Float64,Base.RefValue{Bool}},Nothing,Nothing,Flat}, ::Optim.Options{Float64,Nothing}, ::OnceDifferentiable{Float64,Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}) at /home/andi/.julia/packages/Optim/onG5j/src/multivariate/solvers/first_order/bfgs.jl:85
[12] optimize at /home/andi/.julia/packages/Optim/onG5j/src/multivariate/optimize/optimize.jl:35 [inlined]
[13] #optimize#87 at /home/andi/.julia/packages/Optim/onG5j/src/multivariate/optimize/interface.jl:142 [inlined]
[14] top-level scope at REPL[6]:1
```
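Judging by the stack trace, the ambiguity is triggered because xdata is a lazy range (a StepRangeLen backed by Base.TwicePrecision), not by Optim itself. An untested workaround sketch, again assuming the MWE definitions below, is to materialize the range before fitting:

```julia
# collect gives a plain Vector{Float64}, so broadcasting with Dual
# parameters never touches the TwicePrecision range arithmetic.
xvec = collect(xdata)
res1 = optimize(p -> sum((model1(xvec, p) .- ydata).^2),
                [1E-9, 4E-10], BFGS(), Optim.Options(g_tol = 1e-18),
                autodiff = :forward)
```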
The code from the problem worked without issue on Optim 1.2.4, ForwardDiff v0.10.16, Julia 1.5.3.
@longemen3000 thanks for testing! I clarified the bug report: the code runs and reports success, but no optimization is actually performed.
Possibly the same issue as here: https://github.com/JuliaNLSolvers/LsqFit.jl/issues/178
MWE: fit an exponential function (adapted from the LsqFit example):
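The original snippet did not survive; what follows is a plausible reconstruction assuming the LsqFit readme model with nanoscale parameters. model1, the data, and the starting point are guesses, not the reporter's exact code:

```julia
using Optim

# Hypothetical MWE: exponential decay with nanoscale amplitude and
# time constant, fit with the default finite-difference gradient.
model1(x, p) = p[1] .* exp.(-x ./ p[2])
xdata = range(0, stop = 2e-9, length = 20)    # lazy range, as in the stack trace
ydata = model1(xdata, [1e-9, 4e-10]) .+ 1e-11 .* randn(length(xdata))

# Per the report: this runs and claims success, but no real
# optimization is performed.
res1 = optimize(p -> sum((model1(xdata, p) .- ydata).^2),
                [2e-9, 8e-10], BFGS(), Optim.Options(g_tol = 1e-18))

# One way to see the silent failure the report describes:
Optim.minimizer(res1)   # per the report, ≈ the starting point
Optim.converged(res1)   # yet reported as converged
```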
EDIT: Clarification: the code runs without error, but no optimization is performed in the case of the unscaled parameters, probably because the finite-difference step size is far too large, leading to NaN or Inf.