Open gdalle opened 1 month ago
@gdalle
I propose to start by replacing SparseDiffTools.jl
and Symbolics.jl
by SparseConnectivityTracer.jl
.
We just need to update sparse_sym.jl
and sparse_diff_tools.jl
.
Sounds good! We're currently in the middle of a big refactor for SparseConnectivityTracer.jl, so it will have to wait a few days
The refactor has been merged and SparseConnectivityTracer v0.4.0
has just been tagged, so it's ready to be broken by you folks! ๐
PRs adding @test_broken
tests are more than welcome.
@adrhill @gdalle
I started to interface SparseConnectivityTracer.jl
quickly tonight and I have a few feedbacks:
SparseConnectivityTracer.jacobian_pattern(c!, cx, x0)
more efficient than SparseConnectivityTracer.jacobian_pattern(c, x0)
?
Note that SparseConnectivityTracer.jacobian_pattern(c!, y, x)
is not documented.F
(default), L
or U
.โ(x) = f(x) + dot(c(x), y)
. I have an error because SparseConnectivityTracer
doesn't like dot
. I can hack it with the help of a sum
.f
and c
instead of directly โ
as input?c(x)
and the Hessian of augmented Lagrangian l(x) = f(x) + dot(c(x), y)
to have a speed-up?MWE:
using SparseConnectivityTracer
y = zeros(3)
x = rand(3)
function J(y, x)
y[1] = x[1]^2
y[2] = 2 * x[1] * x[2]^2
y[3] = sin(x[3])
return y
end
S = jacobian_pattern(J, y, x)
x = rand(3)
y = rand(2)
f(x) = 2 * x[1] * x[2]^2
c(x) = [x[3]^3; x[3]^2]
S = hessian_pattern(f, x)
โ(x) = f(x) + dot(c(x), y)
S = hessian_pattern(โ, x)
ERROR: MethodError: no method matching conj(::SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}})
Closest candidates are:
conj(::Missing)
@ Base missing.jl:101
conj(::Complex)
@ Base complex.jl:282
conj(::SymTridiagonal)
@ LinearAlgebra ~/julia/julia-1.10.3/share/julia/stdlib/v1.10/LinearAlgebra/src/tridiag.jl:160
...
Stacktrace:
[1] dot(x::SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}, y::Float64)
@ LinearAlgebra ~/julia/julia-1.10.3/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:876
[2] dot(x::Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}}, y::Vector{Float64})
@ LinearAlgebra ~/julia/julia-1.10.3/share/julia/stdlib/v1.10/LinearAlgebra/src/generic.jl:886
[3] โ(x::Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}})
@ Main ./REPL[41]:1
[4] trace_function(::Type{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{โฆ}}}}, f::typeof(โ), x::Vector{Float64})
@ SparseConnectivityTracer ~/.julia/packages/SparseConnectivityTracer/JvHcU/src/pattern.jl:32
[5] hessian_pattern(f::Function, x::Vector{Float64}, ::Type{BitSet}, ::Type{Set{Tuple{Int64, Int64}}})
@ SparseConnectivityTracer ~/.julia/packages/SparseConnectivityTracer/JvHcU/src/pattern.jl:326
[6] top-level scope
@ REPL[42]:1
Some type information was truncated. Use `show(err)` to see complete types.
If I also preallocate the storage of c
for the Hessian of the augmented Lagrangian, SCT.jl returns an error because the storage of c
is not correct.
function compute_hessian_sparsity(f, nvar, c!, ncon)
if ncon == 0
x0 = rand(nvar)
S = SparseConnectivityTracer.hessian_pattern(f, x0)
return S
else
x0 = rand(nvar)
y0 = rand(ncon)
cx = similar(y0)
โ(x) = f(x) + sum(c!(cx, x)[i] * y0[i] for i = 1:ncon)
S = SparseConnectivityTracer.hessian_pattern(โ, x0)
return S
end
end
Basic Jacobian derivative with backend=ADNLPModels.SparseADJacobian and T=Float64: Error During Test at /home/alexis/Bureau/git/ADNLPModels.jl/test/sparse_jacobian.jl:6
Got exception outside of a @test
TypeError: in typeassert, expected Float64, got a value of type SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}
Stacktrace:
[1] setindex!(A::Vector{Float64}, x::SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}, i1::Int64)
@ Base ./array.jl:1021
[2] (::var"#c!#21")(cx::Vector{Float64}, x::Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}})
@ Main ~/Bureau/git/ADNLPModels.jl/test/sparse_jacobian.jl:10
[3] (::ADNLPModels.var"#35#37"{Vector{SparseConnectivityTracer.HessianTracer{BitSet, Set{Tuple{Int64, Int64}}}}, var"#c!#21", Vector{Float64}, Vector{Float64}})(i::Int64)
Maybe the best solution is to not provide c
in-place?
Hey @gdalle @adrhill ! I love the idea and agree with @amontoison the sparse Jacobian/Hessian is a good starting point.
For the DifferentiationInterface.jl, the best would be if you could provide an example that does this:
T = Float64
n = 2
x0 = rand(T, n)
f(x) = sum(x) # or something
function c!(cx, x)
cx[1] = x[1]^2 + x[2]^2 + x[1] * x[2]
# a) compute gradient in-place
gx = similar(x0)
# ... โf(x)
# b) Hessian-vector product in-place
v = ones(2)
# ... โfยฒ(x)*v
# c) Jacobian-vector product
# โc!(cx, x) * v
# d) transpose Jacobian-vector product
# โc!(cx, x)แต * v
# e) Lagrangian Hessian-vector product
# ... (โfยฒ(x) + y_i โยฒcแตข!(cx, x))*v
Usually, we store everything you need to compute the derivatives in a backend that is initialized with the model. The assumption is that we will do the derivatives multiple times.
Here are some answers to your questions:
Is SparseConnectivityTracer.jacobian_pattern(c!, cx, x0) more efficient than SparseConnectivityTracer.jacobian_pattern(c, x0) ? Note that SparseConnectivityTracer.jacobian_pattern(c!, y, x) is not documented.
The rationale behind it is that f!(y, x)
is more efficient than y = f(x)
for array-to-array functions. Thus, some functions are implemented in-place, and we need to support them in Jacobian sparsity detection. However, the actual gain of efficiency within the sparsity detection is minimal, since lots of allocations occur anyway due to set manipulations.
Is it possible to add an option to only return one triangle of the sparsity pattern of the Hessian? You could add a keyword argument with possible values F (default), L or U. Can we exploit the symmetry of the hessian to determine it more efficiently?
At the moment we do not exploit the symmetry of the Hessian. Internally, the Hessian pattern is represented as a Set{Tuple{Int,Int}}
, and we make no efforts to only store one triangle. It's "only" a factor-2 efficiency gain, so it's clearly something we wish we did, but:
In optimization we want the Hessian of the augmented Lagrangian when we have constraints so it means that we want the sparsity structure of the Hessian of โ(x) = f(x) + dot(c(x), y). I have an error because SparseConnectivityTracer doesn't like dot. I can hack it with the help of a sum.
Essentially we need to add some array rules to SparseConnectivityTracer, and dot
will be one of the very first.
In optimization we want the Hessian of the augmented Lagrangian when we have constraints so it means that we want the sparsity structure of the Hessian of โ(x) = f(x) + dot(c(x), y). Can you do better if you we provide f and c instead of directly โ as input? Can we compute in the the sparsity structure of the Jacobian of c(x) and the Hessian of augmented Lagrangian l(x) = f(x) + dot(c(x), y) to have a speed-up?
For SparseConnectivityTracer you can always detect the sparsity pattern of f
and c
separately, then combine them. But of course it won't be done automatically by the package, so you need to assemble the parts manually.
For DifferentiationInterface, @timholy asked a similar question a week ago, and he had some suggestions:
Feel free to weigh in on those discussions.
If I also preallocate the storage of c for the Hessian of the augmented Lagrangian, SCT.jl returns an error because the storage of c is not correct. Maybe the best solution is to not provide c in-place?
Don't provide c
in-place, because SparseConnectivityTracer uses custom number types and operator overloading (like ForwardDiff).
@tmigot here's an example doing everything you asked. The mantra of DifferentiationInterface is "prepare once, reuse multiple times", so it fits well with an optimization point of view. Note that although the differentiation overwrites the storage you provide, we cannot guarantee that other allocations won't occur.
using DifferentiationInterface
import Enzyme, ForwardDiff, Zygote
forward_backend = AutoForwardDiff()
reverse_backend = AutoEnzyme(Enzyme.Reverse) # supports mutation
second_order_backend = SecondOrder(AutoForwardDiff(), AutoZygote())
T = Float64
n = 2 # variables
m = 1 # constraints
x0 = rand(T, n)
obj(x) = sum(x)
function cons!(c, x)
c[1] = sum(abs2, x) - one(eltype(x))
return nothing
end
# a) gradient
extras_a = prepare_gradient(obj, reverse_backend, x0)
g0 = similar(x0)
gradient!(obj, g0, reverse_backend, x0, extras_a)
# b) Hessian-vector product
extras_b = prepare_hvp(obj, second_order_backend, x0, v0)
p0 = similar(x0)
hvp!(obj, p0, second_order_backend, x0, v0, extras_b)
# c) Jacobian-vector product
c0 = ones(T, m)
extras_c = prepare_pushforward(cons!, c0, forward_backend, x0, v0)
p0 = similar(c0)
pushforward!(cons!, c0, p0, forward_backend, x0, v0, extras_c)
# d) transpose Jacobian-vector product
extras_d = prepare_pullback(cons!, c0, reverse_backend, x0, v0)
p0 = similar(x0)
pullback!(cons!, c0, p0, reverse_backend, x0, v0, extras_d)
# e) Lagrangian Hessian-vector product
# not sure what you mean by this one, can either be obtained by summing the HVPs of `f` and `ci` or by taking the HVP of the Lagrangian as a whole
I think, as above, the main point of discussion will be handling the Lagrangian aspect
@amontoison as of v0.5.6 (released this morning), DifferentiationInterface supports vector-mode pushforwards, pullbacks and HVPs, which makes Jacobians and Hessians much faster. It only works with ForwardDiff for now, but I want to add Enzyme soon. That was the last ingredient necessary for DI to be feature-complete, at least for one-argument functions. From now on, if something doesn't work well, it will likely be due to a missing or suboptimal implementation, but not to a hole in the design. I think we should start discussing its integration in ADNLPModels. The hardest remaining issue will be handling the Lagrangian, which takes 2 inputs ($x$ and $\lambda$), but I would very much benefit from hashing out possible solutions with you.
Hey there friends of JSO!
@adrhill and I have been hard at work on two autodiff toolkits which you might find useful:
We would be happy to discuss potential uses, and integration with your ecosystem. We're having similar talks with Optimization.jl and the SciML organization at large.
Ping @amontoison @abelsiqueira since we've already chatted a few times :)