gdalle / DifferentiationInterface.jl

An interface to various automatic differentiation backends in Julia.
https://gdalle.github.io/DifferentiationInterface.jl/DifferentiationInterface

Preparation for second order #86

Open gdalle opened 4 months ago

gdalle commented 4 months ago

Constructing the right extras becomes very tricky when different inner and outer backends must be called in various ways on closures.

gdalle commented 3 months ago

So here's where I'm at. The typical structure of a second-order operator is:

function second_order_operator(f, backend::SecondOrder, x)
    function inner_operator_closure(z)
        inner_extras = prepare_inner_operator(f, inner(backend), z)
        return inner_operator(f, inner(backend), z, inner_extras)
    end
    outer_extras = prepare_outer_operator(inner_operator_closure, outer(backend), x)
    return outer_operator(inner_operator_closure, outer(backend), x, outer_extras)
end
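
For concreteness, here is a hypothetical instantiation of this template (an illustration, not code from the package): taking the inner operator to be the gradient and the outer operator to be the Jacobian yields a Hessian, since the Hessian is the Jacobian of the gradient.

function hessian_from_template(f, backend::SecondOrder, x)
    function inner_gradient_closure(z)
        inner_extras = prepare_gradient(f, inner(backend), z)
        return gradient(f, inner(backend), z, inner_extras)
    end
    outer_extras = prepare_jacobian(inner_gradient_closure, outer(backend), x)
    return jacobian(inner_gradient_closure, outer(backend), x, outer_extras)
end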

It's hard to prepare the extras because the inner preparation happens on an input z that we never see directly: z only exists inside the outer operator, and its type can differ from that of x (for instance when the outer backend feeds the closure dual numbers).

My current suggested workflow for preparation (disregarding the tangent vector v for now):

  1. Define a function wrapper InputCopier which deepcopies and stores the first input it is called on, so that we can see what z looks like inside the outer operator (see the sketch after this list)
  2. Define the inner operator closure
  3. Wrap it in an InputCopier
  4. Call the outer operator on this, now we have the type of z
  5. Prepare the inner operator closure with z
  6. Prepare the outer operator on the prepared inner operator closure
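
A minimal sketch of what such an InputCopier could look like (a hypothetical helper, not part of the package):

mutable struct InputCopier{F}
    const f::F
    stored_input::Any  # deepcopy of the first input ever received
end

InputCopier(f) = InputCopier(f, nothing)

function (ic::InputCopier)(z)
    if isnothing(ic.stored_input)
        ic.stored_input = deepcopy(z)  # remember what z looks like
    end
    return ic.f(z)
end

After step 4, ic.stored_input holds a copy of the z seen inside the outer operator, which step 5 can feed to the inner preparation.
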
gdalle commented 3 months ago

Actually this will not work because

gdalle commented 3 months ago

Partially solved by #135, where the outer differentiation is prepared but not the inner one. I think this is close to optimal.

adrhill commented 3 months ago

I see how it is difficult for us to provide default fallbacks for the inner preparation.

How about allowing people to manually deal with the inner preparation by adding an inner_XYZ_extras field to the HVPExtras and defaulting to NoXYZExtras?
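
A hypothetical sketch of that suggestion, with illustrative names (the real extras types may differ):

struct HVPExtras{O,I}
    outer_extras::O
    inner_gradient_extras::I
end

# default: no inner preparation, as is currently the case
HVPExtras(outer_extras) = HVPExtras(outer_extras, NoGradientExtras())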

gdalle commented 3 months ago

Possibly, but that would be a very advanced use; my take is that plenty of things will fail when people first try out the HVP, so optimizing performance that way is not high-priority for me.

Besides, for reverse-mode backends which do not require preparation and work out of place (Zygote, Tracker), this is already optimal.

gdalle commented 1 month ago

In the end I think the easiest approach is to have a mutable extras object prepared on the first run, like so: https://discourse.julialang.org/t/second-order-autodiff-which-combinations-should-work/114892/12

adrhill commented 1 month ago

This sounds reasonable to me. To play the devil's advocate: on which backends are mutable extras doable and more performant than allocating new extras?

gdalle commented 1 month ago

I really can't think of any scenario where modifying a field of a mutable struct is more costly than essentially re-creating that field from scratch.

adrhill commented 1 month ago

Sure, but for which backends is it possible?

(And while it might not be more costly, it should be strictly less costly to warrant the increase in code complexity.)

gdalle commented 1 month ago

It is doable on all backends. It's not the extras object itself that you mutate, just a field of a wrapper. Here's an example:

using DifferentiationInterface: GradientExtras, gradient, prepare_gradient

mutable struct InnerGradientWrapper{F,B}
    const f::F
    const backend::B
    extras::Union{Nothing,GradientExtras}  # abstract field: type-unstable
end

InnerGradientWrapper(f, backend) = InnerGradientWrapper(f, backend, nothing)

function (igw::InnerGradientWrapper)(x::AbstractVector)
    if isnothing(igw.extras)
        # prepare lazily on the first call, once we know what x looks like
        igw.extras = prepare_gradient(igw.f, igw.backend, x)
    end
    return gradient(igw.f, igw.backend, x, igw.extras)
end
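
A usage sketch (assuming ForwardDiff is installed; AutoForwardDiff comes from ADTypes):

using ADTypes: AutoForwardDiff
import ForwardDiff

igw = InnerGradientWrapper(sum, AutoForwardDiff())
igw([1.0, 2.0])  # first call prepares and stores the extras
igw([3.0, 4.0])  # subsequent calls reuse them
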
gdalle commented 1 month ago

I'm just wondering how much the type instability will hurt us here.

gdalle commented 1 month ago

Tried it in #291, but the problem is that changing this extras object modifies the internal state of our gradient closure. As a result, outer preparation becomes invalid.

adrhill commented 1 month ago

> I'm just wondering how much the type instability will hurt us here.

How about the following?

mutable struct InnerGradientWrapper{F,B,E<:Union{Nothing,GradientExtras}}
    const f::F
    const backend::B
    extras::E  # field type is now a parameter instead of the abstract union
end

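One caveat with this parametrization: constructing it with extras = nothing infers E = Nothing, after which real extras can never be assigned to the field. A workable construction supplies E explicitly as a small union (hypothetical sketch):

E = Union{Nothing,typeof(prepare_gradient(f, backend, x))}  # assumes x is representative of z
igw = InnerGradientWrapper{typeof(f),typeof(backend),E}(f, backend, nothing)
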
adrhill commented 1 month ago

> Tried it in #291, but the problem is that changing this extras object modifies the internal state of our gradient closure. As a result, outer preparation becomes invalid.

Could you give an example? This is not clear to me from reading the diff in #291, and the PR contains no further comments.

gdalle commented 1 month ago

It's the same discussion that we had for SparseConnectivityTracer and in #252. The InnerGradientWrapper is a closure that changes its state between calls, so reusing preparation is invalid for the outer backend, which differentiates through it.
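
To illustrate the failure mode, a conceptual sketch (hypothetical backend names; the Hessian is computed as the outer Jacobian of the inner gradient):

igw = InnerGradientWrapper(f, inner_backend)            # extras === nothing
outer_extras = prepare_jacobian(igw, outer_backend, x)  # observes the "unprepared" branch
igw(x)                                                  # first call mutates igw.extras
jacobian(igw, outer_backend, x, outer_extras)           # outer_extras may now be stale

Outer preparation may have traced or taped the branch where igw.extras === nothing; after the first call the closure takes the other branch, so whatever the outer backend recorded no longer matches what the closure actually does.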