Open bkmi opened 6 years ago
I believe your function f
is not actually scalar-valued as written; it's vector-valued (with the same shape as the input, since np.exp
acts element-wise). The hessian
function actually does the standard thing here, which is that it adds two trailing axes (or 'indices') to the type of the output of the function. In this case, since the output of the function has one axis (i.e. is a one-index object), the Hessian has two more axes to make three axes (i.e. it's a three-index object).
To be a bit more explicit: if the function f
has an input shape inshape = (in1, in2, ...)
and output shape outshape = (out1, out2, ...)
then jacobian(f)(x)
has shape outshape + inshape = (out1, out2, ..., in1, in2, ...)
and hessian(f)(x)
has shape outshape + inshape + inshape = (out1, out2, ..., in1, in2, ..., in1, in2, ...)
, which is consistent with just applying jacobian
twice (indeed, that's how hessian
is implemented). In the case of your example, we have inshape = (4,)
and outshape = (4,)
, so jacobian(f)(x)
has shape (4, 4)
and hessian(f)(x)
has shape (4, 4, 4)
. In the case of a scalar-valued function on vectors, we might have inshape = (4,)
and outshape = ()
, so that hessian(f)(x)
has shape (4, 4)
, which is a matrix as expected.
Ah, I understand. Your analysis looks correct to me. Thank you for clarifying that!
Dear Autograd, Nice tool! Learning it right now.
I am calculating the hessian of a scalar valued function. Given a function, f, and a vector input x of length n, hessian(f)(x) returns an array of shape (n, n, n). For example:
This is not the definition of the hessian matrix as given by wikipedia. If you want to obtain the matrix defined there, which is shape (n, n), you must do the following operation:
May I suggest updating the return of hessian to correspond with the definition in wikipedia? In the mean time I am using this wrapper
Thank you!