Hi @ForceBru! A number of changes have recently been made to this package to make it more efficient, and the documentation hasn't caught up yet.
The backends are now split by operation:
- `gradient_backend = ForwardDiffADGradient`;
- `hprod_backend = ForwardDiffADHvprod`;
- `jprod_backend = ForwardDiffADJprod`;
- `jtprod_backend = ForwardDiffADJtprod`;
- `jacobian_backend = SparseForwardADJacobian`;
- `hessian_backend = ForwardDiffADHessian`;
- `ghjvprod_backend = ForwardDiffADGHjvprod`;
- `hprod_residual_backend = ForwardDiffADHvprod` for `ADNLSModel` and `EmptyADbackend` otherwise;
- `jprod_residual_backend = ForwardDiffADJprod` for `ADNLSModel` and `EmptyADbackend` otherwise;
- `jtprod_residual_backend = ForwardDiffADJtprod` for `ADNLSModel` and `EmptyADbackend` otherwise;
- `jacobian_residual_backend = SparseForwardADJacobian` for `ADNLSModel` and `EmptyADbackend` otherwise;
- `hessian_residual_backend = ForwardDiffADHessian` for `ADNLSModel` and `EmptyADbackend` otherwise.
All of these can be modified via keyword arguments.
To compute the gradient with Zygote you can do:
using Zygote
ADNLPModel(f, x0; gradient_backend = ADNLPModels.ZygoteADGradient)
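(Editorial aside, not from the thread: a minimal self-contained sketch of that usage. The objective f, the starting point x0 and the grad call are illustrative additions, and it assumes a version of ADNLPModels in which ZygoteADGradient is available.)
using ADNLPModels, NLPModels, Zygote

f(x) = sum(x .^ 2)   # illustrative objective
x0 = ones(3)
nlp = ADNLPModel(f, x0; gradient_backend = ADNLPModels.ZygoteADGradient)
g = grad(nlp, x0)    # gradient evaluated through the Zygote backend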
I found that runtests.jl defines ZygoteAD like this:
However, shouldn't this be accessible in the actual package as well?
As a reminder, I opened an issue about this: https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl/issues/135
It has been temporarily removed, as it didn't seem like a good idea to use Zygote for all the operations in ADNLPModels.jl.
Nice, I got it to work, thanks!
However, it seems to be using ForwardDiff sometimes. My objective function looks like this:
loss(params::AV{<:Real}) = begin   # AV: user-defined alias (presumably AbstractVector)
    print(params, '\n')            # prints the argument, which makes its element type visible
    # ...
end
The NLP is defined like this:
nlp = ADNLPModels.ADNLPModel(
    loss, par0, zero(par0), fill(Inf, length(par0)),
    par -> begin
        # Some constraints...
        m = reconstruct(par) # I'm trying to fit a Flux neural network using IPOPT, yeah...
        @. m.cell.λ - 1 / (1 + m.cell.γ)
    end, [-Inf], [0.0],
    gradient_backend=ADNLPModels.ZygoteADGradient,
    jacobian_backend=ADNLPModels.ZygoteADJacobian,
    jprod_backend=ADNLPModels.ZygoteADJprod,
    jtprod_backend=ADNLPModels.ZygoteADJtprod,
    hessian_backend=ADNLPModels.ZygoteADHessian
)
I then try to optimize it with NLPModelsIpopt.ipopt:
sol = mktemp() do output_file, _
    NLPModelsIpopt.ipopt(nlp; print_level=0, output_file)
end
Some of the output looks like this:
[0.1, 0.1, 1.0] -1187.7802021381242
[0.1, 0.1, 1.0] -1205.0964896879434
[0.1, 0.1, 1.0] -1205.0964896879434
ForwardDiff.Dual{ForwardDiff.Tag{ADNLPModels.var"#349#378"{ADNLPModels.var"#ℓ#319"{Float64, ADNLPModels.var"#ℓ#318#320"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#362"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#332#363"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}}}}, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}}, Float64}, Float64, 3}[Dual{ForwardDiff.Tag{ADNLPModels.var"#349#378"{ADNLPModels.var"#ℓ#319"{Float64, ADNLPModels.var"#ℓ#318#320"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#362"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#332#363"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}}}}, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}}, Float64}}(0.1,1.0,0.0,0.0), Dual{ForwardDiff.Tag{ADNLPModels.var"#349#378"{ADNLPModels.var"#ℓ#319"{Float64, ADNLPModels.var"#ℓ#318#320"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#362"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#332#363"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}}}}, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}}, Float64}}(0.1,0.0,1.0,0.0), Dual{ForwardDiff.Tag{ADNLPModels.var"#349#378"{ADNLPModels.var"#ℓ#319"{Float64, ADNLPModels.var"#ℓ#318#320"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#362"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#332#363"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}}}}, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}}, Float64}}(1.0,0.0,0.0,1.0)] -1205.0964896879434
[0.1139774587967983, 0.117847948067879, 1.019999799999998] -1205.0964896879434
[0.1139774587967983, 0.117847948067879, 1.019999799999998] -1205.0964896879434
ForwardDiff.Dual{ForwardDiff.Tag{ADNLPModels.var"#349#378"{ADNLPModels.var"#ℓ#319"{Float64, ADNLPModels.var"#ℓ#318#320"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#362"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#332#363"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}}}}, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}}, Float64}, Float64, 3}[Dual{ForwardDiff.Tag{ADNLPModels.var"#349#378"{ADNLPModels.var"#ℓ#319"{Float64, ADNLPModels.var"#ℓ#318#320"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#362"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#332#363"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}}}}, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}}, Float64}}(0.1139774587967983,1.0,0.0,0.0), Dual{ForwardDiff.Tag{ADNLPModels.var"#349#378"{ADNLPModels.var"#ℓ#319"{Float64, ADNLPModels.var"#ℓ#318#320"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#362"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#332#363"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}}}}, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}}, Float64}}(0.117847948067879,0.0,1.0,0.0), Dual{ForwardDiff.Tag{ADNLPModels.var"#349#378"{ADNLPModels.var"#ℓ#319"{Float64, ADNLPModels.var"#ℓ#318#320"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#362"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#332#363"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}}}}, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}}, Float64}}(1.019999799999998,0.0,0.0,1.0)] -1205.0964896879434
[0.13584624216475483, 0.20051838032340225, 1.3028396274794867] -1205.0964896879434
[0.13584624216475483, 0.20051838032340225, 1.3028396274794867] -1205.0964896879434
Where do the ForwardDiff.Dual come from? I specifically requested Zygote everywhere...
print(nlp) after optimization:
ADNLPModel - Model with automatic differentiation backend ADModelBackend{
ZygoteADGradient,
ForwardDiffADHvprod,
ZygoteADJprod,
ZygoteADJtprod,
ZygoteADJacobian,
ZygoteADHessian,
ForwardDiffADGHjvprod,
}
Problem name: Generic
All variables: ████████████████████ 3 All constraints: ████████████████████ 1
free: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 free: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
lower: ████████████████████ 3 lower: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
upper: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 upper: ████████████████████ 1
low/upp: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 low/upp: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
fixed: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 fixed: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
infeas: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 infeas: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
nnzh: ( 0.00% sparsity) 6 linear: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
nonlinear: ████████████████████ 1
nnzj: ( 0.00% sparsity) 3
Counters:
obj: ██████████████████⋅⋅ 9 grad: ████████████████████ 10 cons: ██████████████████⋅⋅ 9
cons_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 cons_nln: ██████████████████⋅⋅ 9 jcon: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jgrad: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jac: ████████████████████ 10 jac_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jac_nln: ████████████████████ 10 jprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jprod_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jprod_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jtprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jtprod_lin: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jtprod_nln: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 hess: ████████████████⋅⋅⋅⋅ 8 hprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
jhess: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0 jhprod: ⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅⋅ 0
So it still uses ForwardDiffADHvprod and ForwardDiffADGHjvprod, but 1) there doesn't seem to be a Zygote alternative to these; and 2) the "Counters" ending in prod are all zero, so ...Hvprod and ...GHjvprod probably weren't used, or were they?
Is it possible to fully switch to Zygote, in such a way that I never get ForwardDiff.Dual as arguments?
EDIT: I fixed a bug in the loss function, and now I'm getting this error:
MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{ADNLPModels.var"#349#378"{ADNLPModels.var"#ℓ#319"{Float64, ADNLPModels.var"#ℓ#318#320"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#362"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}, ADNLPModels.var"#c#332#363"{ADNLPModels.ADNLPModel{Float64, Vector{Float64}, Vector{Int64}}}}}, SubArray{Float64, 1, Vector{Float64}, Tuple{UnitRange{Int64}}, true}}}, Float64}, Float64, 3})
It makes sense because Flux uses plain Floats everywhere, and it works fine with Zygote. But here the ADNLPModel tries to use ForwardDiff, even though I set the options to use Zygote.
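(Editorial aside, not from the thread: as long as any backend falls back to ForwardDiff, the objective has to be generic in its element type, i.e. it must not hard-code Float64 in accumulators or buffers, so that ForwardDiff.Dual numbers can flow through it. A hedged sketch with a placeholder computation:)
loss(params::AbstractVector{T}) where {T<:Real} = begin
    acc = zero(T)      # zero(T) instead of 0.0, so Dual numbers propagate
    for p in params
        acc += p^2     # placeholder standing in for the real loss
    end
    acc
end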
I suspect it comes from the initialization of the backend. As you accurately noticed, the functions hprod and ghjvprod are not used, so you could pass:
hprod_backend = ADNLPModels.EmptyADbackend()
ghjvprod_backend = ADNLPModels.EmptyADbackend()
in the kwargs of your ADNLPModel.
An alternative explanation for the ForwardDiff tags may be the Hessian, cf.
https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl/blob/9f19ff71d55749c3ba24f7e14691d10326bdcf8a/src/zygote.jl#L78
Zygote doesn't seem to compute a Hessian on its own, but uses ForwardDiff over its gradient, cf. https://github.com/FluxML/Zygote.jl/blob/31811c3f909c82e20d0b4b0d39411750eec9d6ba/src/lib/grad.jl#L62
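(Editorial aside, not from the thread: the two Hessian entry points in Zygote differ exactly in this respect; the function f below is a made-up example, and hessian_reverse is reported further down in this thread to fail on some simple functions.)
using Zygote

f(x) = sum(x .^ 2) + x[1] * x[2]       # made-up smooth objective
Zygote.hessian(f, [1.0, 2.0])          # forward-over-reverse: ForwardDiff.Dual numbers flow through f
Zygote.hessian_reverse(f, [1.0, 2.0])  # reverse-over-reverse, all Zygote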
I did this:
nlp = ADNLPModels.ADNLPModel(
    loss_1arg, par0, zero(par0), fill(Inf, length(par0)),
    par -> begin
        m = reconstruct(par)
        @. m.cell.λ - 1 / (1 + m.cell.γ)
    end, [-Inf], [0.0],
    gradient_backend=ADNLPModels.ZygoteADGradient,
    jacobian_backend=ADNLPModels.ZygoteADJacobian,
    jprod_backend=ADNLPModels.ZygoteADJprod,
    jtprod_backend=ADNLPModels.ZygoteADJtprod,
    hessian_backend=ADNLPModels.ZygoteADHessian,
    hprod_backend = ADNLPModels.EmptyADbackend(),    # instances rather than types —
    ghjvprod_backend = ADNLPModels.EmptyADbackend()  # this is what triggers the MethodError below
)
But now it says:
MethodError: no method matching var"#ADModelBackend#1"(::Type{ADNLPModels.ZygoteADGradient}, ::ADNLPModels.EmptyADbackend, ::Type{ADNLPModels.ZygoteADJprod}, ::Type{ADNLPModels.ZygoteADJtprod}, ::Type{ADNLPModels.ZygoteADJacobian}, ::Type{ADNLPModels.ZygoteADHessian}, ::ADNLPModels.EmptyADbackend, ::Base.Pairs{Symbol, Vector{Float64}, Tuple{Symbol}, NamedTuple{(:x0,), Tuple{Vector{Float64}}}}, ::Type{ADNLPModels.ADModelBackend}, ::Int64, ::var"#loss_1arg#135"{typeof(loss_garch), Vector{Float64}, Optimisers.Restructure{Flux.Recur{GARCHCell{Float64, Float64}, Float64}, NamedTuple{(:cell, :state), Tuple{NamedTuple{(:λ, :γ, :vlt, :state0), Tuple{Int64, Int64, Int64, Tuple{}}}, Tuple{}}}}}, ::Int64, ::ADNLPModels.var"#c!#235"{var"#133#137"{Optimisers.Restructure{Flux.Recur{GARCHCell{Float64, Float64}, Float64}, NamedTuple{(:cell, :state), Tuple{NamedTuple{(:λ, :γ, :vlt, :state0), Tuple{Int64, Int64, Int64, Tuple{}}}, Tuple{}}}}}})
Closest candidates are:
var"#ADModelBackend#1"(::Type{GB}, ::Type{HvB}, ::Type{JvB}, ::Type{JtvB}, ::Type{JB}, ::Type{HB}, ::Type{GHJ}, ::Base.Pairs{Symbol, V, Tuple{Vararg{Symbol, N}}, NamedTuple{names, T}} where {V, N, names, T<:Tuple{Vararg{Any, N}}}, ::Type{ADNLPModels.ADModelBackend}, ::Integer, ::Any, ::Integer, ::Function) where {GB, HvB, JvB, JtvB, JB, HB, GHJ}
@ ADNLPModels ~/.julia/packages/ADNLPModels/5wWiv/src/ad.jl:67
Stacktrace:
[1] ADNLPModel!(f::Function, x0::Vector{Float64}, lvar::Vector{Float64}, uvar::Vector{Float64}, c!::ADNLPModels.var"#c!#235"{var"#133#137"{Optimisers.Restructure{Flux.Recur{GARCHCell{Float64, Float64}, Float64}, NamedTuple{(:cell, :state), Tuple{NamedTuple{(:λ, :γ, :vlt, :state0), Tuple{Int64, Int64, Int64, Tuple{}}}, Tuple{}}}}}}, lcon::Vector{Float64}, ucon::Vector{Float64}; y0::Vector{Float64}, name::String, minimize::Bool, kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:gradient_backend, :jacobian_backend, :jprod_backend, :jtprod_backend, :hessian_backend, :hprod_backend, :ghjvprod_backend), Tuple{DataType, DataType, DataType, DataType, DataType, ADNLPModels.EmptyADbackend, ADNLPModels.EmptyADbackend}}})
@ ADNLPModels ~/.julia/packages/ADNLPModels/5wWiv/src/nlp.jl:400
[2] ADNLPModels.ADNLPModel(f::Function, x0::Vector{Float64}, lvar::Vector{Float64}, uvar::Vector{Float64}, c::var"#133#137"{Optimisers.Restructure{Flux.Recur{GARCHCell{Float64, Float64}, Float64}, NamedTuple{(:cell, :state), Tuple{NamedTuple{(:λ, :γ, :vlt, :state0), Tuple{Int64, Int64, Int64, Tuple{}}}, Tuple{}}}}}, lcon::Vector{Float64}, ucon::Vector{Float64}; kwargs::Base.Pairs{Symbol, Any, NTuple{7, Symbol}, NamedTuple{(:gradient_backend, :jacobian_backend, :jprod_backend, :jtprod_backend, :hessian_backend, :hprod_backend, :ghjvprod_backend), Tuple{DataType, DataType, DataType, DataType, DataType, ADNLPModels.EmptyADbackend, ADNLPModels.EmptyADbackend}}})
@ ADNLPModels ~/.julia/packages/ADNLPModels/5wWiv/src/nlp.jl:378
[3] fit(model::Flux.Recur{GARCHCell{Float64, Float64}, Float64}, loss::typeof(loss_garch), returns::Vector{Float64})
@ Main ./In[45]:11
[4] top-level scope
@ In[46]:1
Zygote has hessian_reverse too, which is "implemented using reverse over reverse mode, all Zygote":
So why not use this instead?
Also, unless I'm missing something, there's no BFGSADHessian or similar backend that would compute an approximation of the Hessian:
julia> subtypes(ADNLPModels.ADBackend)
22-element Vector{Any}:
ADNLPModels.EmptyADbackend
ADNLPModels.ForwardDiffADGHjvprod
ADNLPModels.ForwardDiffADGradient
ADNLPModels.ForwardDiffADHessian
ADNLPModels.ForwardDiffADHvprod
ADNLPModels.ForwardDiffADJacobian
ADNLPModels.ForwardDiffADJtprod
ADNLPModels.GenericForwardDiffADGradient
ADNLPModels.GenericForwardDiffADHvprod
ADNLPModels.GenericForwardDiffADJprod
ADNLPModels.GenericReverseDiffADJprod
ADNLPModels.ImmutableADbackend
ADNLPModels.InPlaceADbackend
ADNLPModels.ReverseDiffADGradient
ADNLPModels.ReverseDiffADHessian
ADNLPModels.ReverseDiffADHvprod
ADNLPModels.ReverseDiffADJacobian
ADNLPModels.ReverseDiffADJtprod
ADNLPModels.SparseADHessian
ADNLPModels.SparseADJacobian
ADNLPModels.SparseForwardADJacobian
ADNLPModels.ZygoteADGradient
That'd probably be useful, especially since Zygote is apparently struggling with Hessians (they're either computed with ForwardDiff, which I can't do, or the computation is "usually much slower, and more likely to find errors").
I think this should work instead:
nlp = ADNLPModels.ADNLPModel(
    loss_1arg, par0, zero(par0), fill(Inf, length(par0)),
    par -> begin
        m = reconstruct(par)
        @. m.cell.λ - 1 / (1 + m.cell.γ)
    end, [-Inf], [0.0],
    gradient_backend=ADNLPModels.ZygoteADGradient,
    jacobian_backend=ADNLPModels.ZygoteADJacobian,
    jprod_backend=ADNLPModels.ZygoteADJprod,
    jtprod_backend=ADNLPModels.ZygoteADJtprod,
    hessian_backend=ADNLPModels.ZygoteADHessian,
    hprod_backend = ADNLPModels.EmptyADbackend,    # the types, not instances
    ghjvprod_backend = ADNLPModels.EmptyADbackend
)
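(Editorial aside, not from the thread: you can check which backends the constructed model actually holds before optimizing; the field name adbackend is assumed from the ADNLPModels source, and print(nlp) shows the same information in its header.)
nlp.adbackend   # ADModelBackend with one entry per operation
print(nlp)      # the header lists the backend chosen for each operation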
The advantage of splitting the backends is that you can easily test other backends. The following should work (not tested):
using ADNLPModels, Zygote  # assumed to be loaded already in this session

struct ZygoteADHessianReverse <: ADNLPModels.ImmutableADbackend
    nnzh::Int
end

function ZygoteADHessianReverse(
    nvar::Integer,
    f,
    ncon::Integer = 0,
    c::Function = (args...) -> [];
    kwargs...,
)
    @assert nvar > 0
    nnzh = div(nvar * (nvar + 1), 2)  # integer division, so the Int field is satisfied
    return ZygoteADHessianReverse(nnzh)
end

function ADNLPModels.hessian(b::ZygoteADHessianReverse, f, x)
    return Zygote.hessian_reverse(f, x)
end
and then hessian_backend = ZygoteADHessianReverse.
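(Editorial aside, not from the thread: a hedged usage sketch building on the definition above; the objective f and the hess call are illustrative additions.)
using NLPModels   # provides hess

f(x) = sum(x .^ 2) + x[1] * x[2]
nlp = ADNLPModels.ADNLPModel(f, ones(2); hessian_backend = ZygoteADHessianReverse)
H = hess(nlp, ones(2))   # should now be evaluated through Zygote.hessian_reverse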
There should be an option to ask Ipopt to use a quasi-Newton approximation of the Hessian matrix.
Oh right, I used IPOPT's Hessian approximation, NLPModelsIpopt.ipopt(nlp; print_level=0, output_file, hessian_approximation="limited-memory"), and it worked. As usual, Hessians are hard...
Thank you very much for your help!
@ForceBru Can we close this issue, then?
By the way, I tried using Zygote.hessian_reverse but it fails even for basic problems, for instance:
Zygote.hessian_reverse(x -> x[1] * x[2]^2 + x[1]^2 * x[2], [-1.2; 1.0])
Since you opened another issue for the docs, I guess this one can be closed now, sure.
It really bothers me that a package which calls itself "21st century AD" can't compute even basic Hessians reliably...
I'm trying to switch the AD backend to Zygote.jl, but I keep getting "ZygoteAD not defined", even though I followed the ADNLPModels.jl docs. The README says:
https://github.com/JuliaSmoothOptimizers/ADNLPModels.jl/blob/9f19ff71d55749c3ba24f7e14691d10326bdcf8a/README.md?plain=1#L5
The documentation provides this example:
So I write code as in the docs, but get the error:
Versions:
How to switch the AD backend to Zygote?