gdalle opened this issue 4 months ago
So here's where I'm at. The typical structure of a second-order operator is:
function second_order_operator(f, backend::SecondOrder, x)
    function inner_operator_closure(z)
        inner_extras = prepare_inner_operator(f, inner(backend), z)
        return inner_operator(f, inner(backend), z, inner_extras)
    end
    outer_extras = prepare_outer_operator(inner_operator_closure, outer(backend), x)
    return outer_operator(inner_operator_closure, outer(backend), x, outer_extras)
end
It's hard to prepare the extras because `z` is generated during the outer operator, so it may not have the same type as `x`. Typically, it might be a vector of `Dual`s instead of a vector of numbers. The inner operator may also take `v` in addition to the rest, so the preparation signature may need to look different. My current suggested workflow for preparation (disregarding the `v` thing for now):

1. Wrap the function in an `InputCopier`, which deepcopies and stores the first input it is called on, so that we can see what `z` looks like inside the outer operator.
2. Run the outer operator once through the `InputCopier` to capture `z`.
3. Prepare the inner operator on the captured `z`.

Actually this will not work because
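For concreteness, here is a hypothetical sketch of what such an `InputCopier` could look like (the name comes from the discussion above, but the implementation is mine, not part of the DifferentiationInterface API; `const` fields require Julia 1.8+):

```julia
# Callable wrapper that deepcopies and stores the first input it sees,
# then forwards every call to the wrapped function.
mutable struct InputCopier{F}
    const f::F
    input::Any  # holds a deepcopy of the first input, or nothing
end

InputCopier(f) = InputCopier(f, nothing)

function (ic::InputCopier)(z)
    if isnothing(ic.input)
        ic.input = deepcopy(z)  # capture what `z` looks like
    end
    return ic.f(z)
end
```

One would then run the outer operator once with the wrapper and call inner preparation on `ic.input`.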
Partially solved by #135, where the outer differentiation is prepared but not the inner one. I think it is close to optimal.
I see how it is difficult for us to provide default fallbacks for the inner preparation.
How about allowing people to manually deal with the inner preparation by adding an `inner_XYZ_extras` field to the `HVPExtras` and defaulting to `NoXYZExtras`?
Possibly, but that would be a very advanced use, and my take is that plenty of things will fail when people first try out the HVP, so optimizing performance that way is not a high priority for me.
Besides, for reverse-mode backends which do not require preparation and work out of place (Zygote, Tracker), this is already optimal.
In the end I think the easiest approach is to have a mutable extras prepared on the first run, like so: https://discourse.julialang.org/t/second-order-autodiff-which-combinations-should-work/114892/12
This sounds reasonable to me. To play the devil's advocate: on which backends are mutable extras doable and more performant than allocating new extras?
I really can't think of any scenario where modifying a field of a mutable struct is more costly than essentially re-creating that field from scratch.
Sure, but for which backends is it possible?
(And while it might not be more costly, it should be strictly less costly to warrant the increase in code complexity.)
It is doable on all backends. It's not the extras itself you mutate; it's just a field of a wrapper. Here's an example:
mutable struct InnerGradientWrapper{F,B}
    const f::F
    const backend::B
    extras::Union{Nothing,GradientExtras}  # type-unstable
end

function (igw::InnerGradientWrapper)(x::AbstractVector)
    if isnothing(igw.extras)
        igw.extras = prepare_gradient(igw.f, igw.backend, x)
    end
    return gradient(igw.f, igw.backend, x, igw.extras)
end
I'm just wondering how much the type instability will hurt us here.
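For intuition, here is a minimal self-contained analogue of the wrapper above (the `AbstractExtras`/`ConcreteExtras` types are made up for illustration, not the DifferentiationInterface API): the abstractly typed field means the compiler cannot infer the concrete type of `w.extras` at each access, though a type assertion acts as a function barrier and recovers inferability downstream of it.

```julia
abstract type AbstractExtras end

struct ConcreteExtras <: AbstractExtras
    cache::Vector{Float64}
end

# Wrapper with an abstractly typed field: accesses to `extras` are
# not inferable, which is the type instability under discussion.
mutable struct UnstableWrapper{F}
    const f::F
    extras::Union{Nothing,AbstractExtras}
end

function (w::UnstableWrapper)(x::AbstractVector)
    if isnothing(w.extras)
        w.extras = ConcreteExtras(float.(x))  # mock "preparation" on first call
    end
    extras = w.extras::ConcreteExtras  # assertion restores type stability below
    copyto!(extras.cache, x)
    return w.f(extras.cache)
end
```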
Tried it in #291, but the problem is that changing this `extras` object modifies the inner state of our gradient closure. As a result, outer preparation becomes invalid.
> I'm just wondering how much the type instability will hurt us here.
How about the following?
mutable struct InnerGradientWrapper{F,B,E<:Union{Nothing,GradientExtras}}
    const f::F
    const backend::B
    extras::E
end
> Tried it in #291, but the problem is that changing this `extras` object modifies the inner state of our gradient closure. As a result, outer preparation becomes invalid.
Could you give an example? This is not clear to me from reading the diff in #291 and the PR contains no further comments.
It's the same discussion we have had for SparseConnectivityTracer and in #252. The `InnerGradientWrapper` is a closure that changes its state between calls, so reusing preparation is invalid for the outer backend which differentiates through it.
Constructing the right extras becomes very tricky when different inner/outer backends must be called in various ways on closure functions.
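The failure mode can be illustrated without any AD machinery: a hypothetical callable whose behavior (here, even its output element type) changes after its first call. Anything recorded against the first-call behavior, as outer preparation is, no longer matches subsequent calls.

```julia
# Stateful callable: its observable behavior depends on how many
# times it has been called, so a one-time "preparation" that traces
# the first call describes a function that never runs again.
mutable struct StatefulClosure
    calls::Int
end

function (f::StatefulClosure)(x)
    f.calls += 1
    return f.calls == 1 ? Float32.(x) : Float64.(x)
end
```

This is exactly why the lazily-prepared wrapper breaks outer preparation: mutating `extras` on the first call changes the closure the outer backend already prepared against.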