gdalle opened this issue 4 months ago
So here's where I'm at. The typical structure of a second-order operator is:
function second_order_operator(f, backend::SecondOrder, x)
    function inner_operator_closure(z)
        inner_extras = prepare_inner_operator(f, inner(backend), z)
        return inner_operator(f, inner(backend), z, inner_extras)
    end
    outer_extras = prepare_outer_operator(inner_operator_closure, outer(backend), x)
    return outer_operator(inner_operator_closure, outer(backend), x, outer_extras)
end
It's hard to prepare the extras because `z` is generated during the outer operator, so it may not have the same type as `x`. Typically, it might be a vector of `Dual`s instead of a vector of numbers. The inner operator may also take `v` in addition to the rest, so the preparation signature may need to look different. My current suggested workflow for preparation (disregarding the `v` thing for now):

1. Wrap the function in an `InputCopier`, which deepcopies and stores the first input it is called on, so that we can see what `z` looks like inside the outer operator.
2. Run the outer operator once through the `InputCopier` to capture `z`.
3. Prepare the inner operator on the captured `z`.

Actually this will not work because
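For concreteness, here is a hypothetical sketch of what such an `InputCopier` could look like (the name comes from the discussion above, but the implementation is mine, not part of the DifferentiationInterface API; `const` fields require Julia 1.8+):

```julia
# Callable wrapper that deepcopies and stores the first input it sees,
# then forwards every call to the wrapped function.
mutable struct InputCopier{F}
    const f::F
    input::Any  # holds a deepcopy of the first input, or nothing
end

InputCopier(f) = InputCopier(f, nothing)

function (ic::InputCopier)(z)
    if isnothing(ic.input)
        ic.input = deepcopy(z)  # capture what `z` looks like
    end
    return ic.f(z)
end
```

One would then run the outer operator once with the wrapper and call inner preparation on `ic.input`.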
Partially solved by #135, where the outer differentiation is prepared but not the inner one. I think it is close to optimal.
I see how it is difficult for us to provide default fallbacks for the inner preparation.
How about allowing people to manually deal with the inner preparation by adding an `inner_XYZ_extras` field to the `HVPExtras` and defaulting to `NoXYZExtras`?
Possibly, but that would be a very advanced use, and my take is that plenty of things will fail when people first try out the HVP, so optimizing performance that way is not a high priority for me.
Besides, for reverse-mode backends which do not require preparation and work out of place (Zygote, Tracker), this is already optimal.
In the end I think the easiest approach is to have a mutable extras prepared on the first run, like so: https://discourse.julialang.org/t/second-order-autodiff-which-combinations-should-work/114892/12
This sounds reasonable to me. To play the devil's advocate: on which backends are mutable extras doable and more performant than allocating new extras?
I really can't think of any scenario where modifying a field of a mutable struct is more costly than essentially re-creating that field from scratch.
Sure, but for which backends is it possible?
(And while it might not be more costly, it should be strictly less costly to warrant the increase in code complexity.)
It is doable on all backends. It's not the extras itself you mutate; it's just a field of a wrapper. Here's an example:
mutable struct InnerGradientWrapper{F,B}
    const f::F
    const backend::B
    extras::Union{Nothing,GradientExtras}  # type-unstable
end

function (igw::InnerGradientWrapper)(x::AbstractVector)
    if isnothing(igw.extras)
        igw.extras = prepare_gradient(igw.f, igw.backend, x)
    end
    return gradient(igw.f, igw.backend, x, igw.extras)
end
I'm just wondering how much the type instability will hurt us here.
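For intuition, here is a minimal self-contained analogue of the wrapper above (the `AbstractExtras`/`ConcreteExtras` types are made up for illustration, not the DifferentiationInterface API): the abstractly typed field means the compiler cannot infer the concrete type of `w.extras` at each access, though a type assertion acts as a function barrier and recovers inferability downstream of it.

```julia
abstract type AbstractExtras end

struct ConcreteExtras <: AbstractExtras
    cache::Vector{Float64}
end

# Wrapper with an abstractly typed field: accesses to `extras` are
# not inferable, which is the type instability under discussion.
mutable struct UnstableWrapper{F}
    const f::F
    extras::Union{Nothing,AbstractExtras}
end

function (w::UnstableWrapper)(x::AbstractVector)
    if isnothing(w.extras)
        w.extras = ConcreteExtras(float.(x))  # mock "preparation" on first call
    end
    extras = w.extras::ConcreteExtras  # assertion restores type stability below
    copyto!(extras.cache, x)
    return w.f(extras.cache)
end
```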
Tried it in #291, but the problem is that changing this `extras` object modifies the inner state of our gradient closure. As a result, outer preparation becomes invalid.
> I'm just wondering how much the type instability will hurt us here.
How about the following?
mutable struct InnerGradientWrapper{F,B,E<:Union{Nothing,GradientExtras}}
    const f::F
    const backend::B
    extras::E
end
> Tried it in #291, but the problem is that changing this `extras` object modifies the inner state of our gradient closure. As a result, outer preparation becomes invalid.
Could you give an example? This is not clear to me from reading the diff in #291 and the PR contains no further comments.
It's the same discussion we have had for SparseConnectivityTracer and in #252. The `InnerGradientWrapper` is a closure that changes its state between calls, so reusing preparation is invalid for the outer backend which differentiates through it.
Constructing the right extras becomes very tricky when different inner/outer backends must be called in various ways on closure functions.
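The failure mode can be illustrated without any AD machinery: a hypothetical callable whose behavior (here, even its output element type) changes after its first call. Anything recorded against the first-call behavior, as outer preparation is, no longer matches subsequent calls.

```julia
# Stateful callable: its observable behavior depends on how many
# times it has been called, so a one-time "preparation" that traces
# the first call describes a function that never runs again.
mutable struct StatefulClosure
    calls::Int
end

function (f::StatefulClosure)(x)
    f.calls += 1
    return f.calls == 1 ? Float32.(x) : Float64.(x)
end
```

This is exactly why the lazily-prepared wrapper breaks outer preparation: mutating `extras` on the first call changes the closure the outer backend already prepared against.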