timholy closed this pull request 6 months ago.
Hi Tim, thanks for the contribution!
Given the recent emergence of DifferentiationInterface, this PR might be better suited there. In the long run, AbstractDifferentiation will import DifferentiationInterface and extend it for things like non-array input types or multiple arguments (neither of which you probably care about here).
However, I'm unsure whether this needs to be part of the API to begin with, because it's the first time I'm hearing about a "Hessian" for vector-valued functions. Can you get the same object by doing "Jacobian of Jacobian" and a reshaping? What problem does this PR solve on your end?
> Given the recent emergence of DifferentiationInterface, this PR might be better suited there.
Ah, I hadn't yet stumbled across it (I'm not reading Discourse, etc. much these days). If this package is semi-deprecated, would it make sense to link to DifferentiationInterface from the README?
> Can you get the same object by doing "Jacobian of Jacobian" and a reshaping?
That's exactly how this PR (tries to) compute it. It works "out of the box" for ForwardDiff, but that doesn't seem to be true of all backends. And

```julia
# One full Hessian sweep per output component; `SomeDifferentiator` is a
# stand-in for whichever AD backend you use.
Hs = [SomeDifferentiator.hessian(x -> c(x)[i], x) for i in 1:ncomponents]
```

is unlikely to be efficient for all vector-valued `c`, since each call throws away all but one of the components of `c(x)` (hopefully the compiler could optimize away the inefficiency, but of course that's not guaranteed).
> What problem does this PR solve on your end?
I'd argue there's nothing "weird" about wanting to perform a second-order expansion of a vector-valued function. We use quadratic models all the time for scalar-valued functions, and there doesn't seem to be any reason why in principle one might not find them useful for performing analysis on vector-valued functions.
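Concretely, the object in question is just the componentwise second-order Taylor model (standard notation, spelled out for clarity):

```math
c_i(x + \delta) \approx c_i(x) + \sum_j J_{ij}(x)\,\delta_j + \tfrac{1}{2} \sum_{j,k} H_{ijk}(x)\,\delta_j \delta_k
```

where `J` is the Jacobian and `H[i, j, k]` collects the second derivatives ∂²cᵢ/(∂xⱼ∂xₖ).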
If you know in advance that you'll want orders 0-2, computing them simultaneously seems in principle like it could save some computation, which is my interest here. However I have only rudimentary knowledge of AD, so I'm happy to accept guidance. Let me know whether you think this should be part of the DifferentiationInterface API.
> If this package is semi-deprecated, would it make sense to link to DifferentiationInterface from the README?
It's not semi-deprecated; it's in a transition phase, waiting for the formal announcement of DifferentiationInterface.jl. In the long run it will wrap DifferentiationInterface.jl's basic functionality and add some more features that are not supported by every AD backend (like multiple arguments), whereas DifferentiationInterface.jl aims to be a common denominator. That's what came out of our discussions with @mohamed82008 and @adrhill: integrate both projects instead of making them compete.
> It works "out of the box" for ForwardDiff, but that doesn't seem to be true of all backends.
In general, second-order differentiation doesn't work for every backend, and even less so for every combination of backends, so I'm not overly surprised.
> I'd argue there's nothing "weird" about wanting to perform a second-order expansion of a vector-valued function. [...] Let me know whether you think this should be part of the DifferentiationInterface API.
I agree that it makes sense in some settings; however, it seems like a fairly niche need. That is why I don't think it should be part of the official API for either AbstractDifferentiation or DifferentiationInterface. It would also bring a whole lot of testing requirements for what I assume is a minimal user base. Still, I have opened an issue (https://github.com/gdalle/DifferentiationInterface.jl/issues/206) to keep track of your request and for other people to weigh in.
> That's exactly how this PR (tries to) compute it.
The other reason why I don't think it should be part of the API is that it is indeed easy to obtain the object you want as the (reshaped) Jacobian of a Jacobian. And doing it through DifferentiationInterface will probably be more robust and better supported across backends, at least in the short to medium term. I can help you debug it if you want.
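For illustration, here is a minimal sketch of that pattern using nested ForwardDiff calls (the example function `c` is made up; the same nesting should work through DifferentiationInterface's `jacobian` with your backend of choice):

```julia
using ForwardDiff

c(x) = [x[1]^2 * x[2], sin(x[2])]   # made-up vector-valued example, m = 2 outputs
x = [1.0, 2.0]
m, n = length(c(x)), length(x)

# The outer `jacobian` differentiates the vectorization of the inner m×n
# Jacobian, producing an (m*n) × n matrix of second derivatives.
JJ = ForwardDiff.jacobian(y -> ForwardDiff.jacobian(c, y), x)

# Reshape so that H[i, j, k] = ∂²cᵢ/(∂xⱼ∂xₖ).
H = reshape(JJ, m, n, n)
```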
> If you know in advance that you'll want orders 0-2, computing them simultaneously seems in principle like it could save some computation, which is my interest here.
That is the tricky part indeed, and the most convincing argument for a unified API.
ForwardDiff and ReverseDiff can interact with DiffResults to compute orders 0-2 simultaneously for scalar-valued functions.
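For the scalar-valued case, that interaction looks something like this (a quick sketch; see the DiffResults docs for the exact API):

```julia
using ForwardDiff, DiffResults

f(x) = sum(abs2, x)            # made-up scalar-valued example
x = [1.0, 2.0, 3.0]

# Preallocate a container for value, gradient, and Hessian, then fill
# all three orders in a single call.
result = DiffResults.HessianResult(x)
result = ForwardDiff.hessian!(result, f, x)

DiffResults.value(result)      # order 0: f(x)
DiffResults.gradient(result)   # order 1: ∇f(x)
DiffResults.hessian(result)    # order 2: ∇²f(x)
```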
However, I don't know if any AD backend can currently do the same for vector-valued functions. The bulk of the computation will be spent in the second-order object anyway, so I don't think splitting `value_and_jacobian` from `vector_hessian` (or whatever you call it) will lead to much performance degradation. But I'll think about it some more and report back.
Sounds good. In any case, this seems premature and is probably best closed.
This PR aims to support returning derivatives of orders 0-2 for vector-valued functions.
Best reviewed with "Hide whitespace", since my editor deleted a bunch of trailing whitespace.
Tests pass for ForwardDiff, but not for most other backends. I don't have a deep understanding of the internals, so any help would be greatly appreciated.
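For concreteness, a naive reference version of the behavior this targets could look like the sketch below (the helper name is hypothetical and the nested-ForwardDiff body is only for illustration; the PR itself routes through the AbstractDifferentiation backends):

```julia
using ForwardDiff

# Hypothetical name for illustration only, not the PR's actual API.
function value_jacobian_and_hessian(c, x)
    v = c(x)                                          # order 0
    J = ForwardDiff.jacobian(c, x)                    # order 1
    JJ = ForwardDiff.jacobian(y -> ForwardDiff.jacobian(c, y), x)
    H = reshape(JJ, length(v), length(x), length(x))  # order 2
    return v, J, H
end
```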