JuliaDiff / SparseDiffTools.jl

Fast Jacobian computation through sparsity exploitation and matrix coloring
MIT License

Hessian Vector Product Operator #175

Closed: RS-Coop closed 3 months ago

RS-Coop commented 2 years ago

Using the existing hvp operator and the forward-over-back AD hvp function in this repository, I have made the following modified hvp operator.

using LinearAlgebra
using ForwardDiff: Dual, partials
using Zygote: gradient

mutable struct HvpOperator{F, T, I}
    f::F                                        # scalar-valued function of a vector
    x::Vector{T}                                # point at which the Hessian is evaluated
    dual_cache1::Vector{Dual{Nothing, T, 1}}    # input duals: value = x, partials = v
    dual_cache2::Vector{Dual{Nothing, T, 1}}    # buffer for the dual-valued gradient
    size::I
    nprod::UInt16                               # number of products computed so far
end

function HvpOperator(f, x::AbstractVector)
    cache1 = Dual.(x, similar(x))
    cache2 = Dual.(x, similar(x))
    return HvpOperator(f, x, cache1, cache2, size(x, 1), UInt16(0))
end

Base.eltype(op::HvpOperator{F, T, I}) where {F, T, I} = T
Base.size(op::HvpOperator) = (op.size, op.size)

function LinearAlgebra.mul!(result::AbstractVector, op::HvpOperator, v::AbstractVector)
    op.nprod += 1

    # forward-over-reverse: seed the duals with v, take a Zygote gradient,
    # then read the directional derivative off the partials
    g = (∇, x) -> ∇ .= gradient(op.f, x)[1]
    op.dual_cache1 .= Dual.(op.x, v)
    result .= partials.(g(op.dual_cache2, op.dual_cache1), 1)
end
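
For context, here is a minimal way to exercise the operator above; the quadratic test function and the dimensions are arbitrary choices for illustration, and it assumes the definitions and imports above have been evaluated.

f(x) = sum(abs2, x) / 2      # quadratic, so the Hessian is the identity

x = rand(10)
v = rand(10)
op = HvpOperator(f, x)

Hv = similar(x)
mul!(Hv, op, v)              # Hv ≈ v up to floating-point error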

I have a couple of questions related to this:

  1. It seems that it would be possible to further optimize this by dropping the field for x and instead just updating the partials field of dual_cache1. I can't seem to figure out the process for doing this; is it possible?
  2. Are we saving anything by using dual_cache2 when the Zygote gradient operation cannot be performed in-place?
  3. Can further specification of types improve performance? For instance, I know that the function type F could be further specified as taking a vector as input and returning a scalar.
  4. Slightly off topic from the other questions, is the symmetry of the Hessian automatically being exploited by the AD tools, or does that even make sense?

Thanks for the help!

ChrisRackauckas commented 2 years ago

It seems that it would be possible to further optimize this by dropping the field for x and instead just updating the partials field of dual_cache1. I can't seem to figure out the process for doing this; is it possible?

It's possible. See some of the tricks in the Jacobian-vector product code.
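
(For reference, one possible way to do this, which may or may not be the trick referred to above: the value half of the duals already stores x, so only the partials need to be refreshed on each product. The helper name reseed! below is made up for illustration.)

using ForwardDiff: Dual, value

function reseed!(cache::AbstractVector{<:Dual}, v::AbstractVector)
    # keep the stored values, overwrite only the seed direction
    cache .= Dual.(value.(cache), v)
    return cache
end

With something like this, mul! could call reseed!(op.dual_cache1, v) and the separate x field becomes redundant, as long as updates to x also go through the cache values.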

Are we saving anything by using dual_cache2 when the Zygote gradient operation cannot be performed in-place?

Nope. That's more for when we get around to doing this with ReverseDiff.jl or Enzyme.jl.

Can further specification of types improve performance? For instance, I know that the function type F could be further specified as taking a vector as input and returning a scalar.

Nope. But fixing the tag types would be good. It's not good to tag nothing.
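
(For illustration, a sketch of building the caches with a real ForwardDiff tag instead of Nothing; the HvpTag struct and tagged_caches helper are made-up names, not the package's implementation. A concrete tag lets ForwardDiff detect tag mismatches and guard against perturbation confusion.)

using ForwardDiff: Dual, Tag

struct HvpTag end

function tagged_caches(x::AbstractVector{T}) where {T}
    D = typeof(Tag(HvpTag(), T))        # concrete tag type instead of Nothing
    cache1 = Dual{D}.(x, similar(x))
    cache2 = Dual{D}.(x, similar(x))
    return cache1, cache2
end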

Slightly off topic from the other questions, is the symmetry of the Hessian automatically being exploited by the AD tools, or does that even make sense?

It is. There's a step at the end of a Hessian calculation which uses it.

RS-Coop commented 2 years ago

Thanks for the responses! I went looking for the tricks you mentioned in the JVP code, but I can't seem to find anything useful; could you point me in a more specific direction?

gdalle commented 3 months ago

DifferentiationInterface.jl has an hvp! function which works on many different backend combinations (not just ForwardDiff over Zygote) and allows partial caching, if you're interested.
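
(A rough sketch of what that looks like; the SecondOrder backend combination and the tuple-of-tangents call signature are assumptions based on recent DifferentiationInterface versions and may differ in others, so check the DI docs.)

using DifferentiationInterface
import ForwardDiff, Zygote

f(x) = sum(abs2, x) / 2
x, v = rand(10), rand(10)

backend = SecondOrder(AutoForwardDiff(), AutoZygote())   # forward over reverse
prep = prepare_hvp(f, backend, x, (v,))                  # reusable preparation ("partial caching")
Hv = similar(x)
hvp!(f, (Hv,), prep, backend, x, (v,))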

ChrisRackauckas commented 3 months ago

The hvp! function is not an operator.

Though this operator exists in the package and the issue just wasn't closed: https://github.com/JuliaDiff/SparseDiffTools.jl/blob/master/src/differentiation/jaches_products.jl#L292-L330
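
(For reference, a hedged sketch of using the exported HesVec operator; the keyword arguments for selecting the AD backend vary across SparseDiffTools versions, so check the linked source for the exact signature.)

using SparseDiffTools, LinearAlgebra

f(x) = sum(abs2, x) / 2
x, v = rand(10), rand(10)

H = HesVec(f, x)        # lazy Hessian-vector product operator
Hv = similar(x)
mul!(Hv, H, v)          # or Hv = H * v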

gdalle commented 3 months ago

The hvp! function is not an operator.

Which is why I called it a function. The associated operators will probably be found in AutoDiffOperators.jl; @oschulz seemed enthusiastic about making the switch to DI.

ChrisRackauckas commented 3 months ago

It also needs to satisfy the full SciMLOperator interface before it'll be useful in solvers, but yes that's the missing step.