HIPS / autograd

Efficiently computes derivatives of NumPy code.
MIT License
6.95k stars 907 forks source link

Automatically reformat hessian(func) into matrix #363

Open bkmi opened 6 years ago

bkmi commented 6 years ago

Dear Autograd, Nice tool! Learning it right now.

I am calculating the hessian of a scalar valued function. Given a function, f, and a vector input x of length n, hessian(f)(x) returns an array of shape (n, n, n). For example:

    import autograd.numpy as np
    from autograd import hessian

    def f(x):
        y = np.exp(x)
        return y

    xx = np.ones(4)
    ddf = hessian(f)
    print(ddf(xx).shape)
    # (4,4,4)

This is not the definition of the hessian matrix as given by wikipedia. If you want to obtain the matrix defined there, which is shape (n, n), you must do the following operation:

    print(ddf(xx).diagonal(axis1=1, axis2=2))

May I suggest updating the return of hessian to correspond with the definition in wikipedia? In the mean time I am using this wrapper

    def mhessian(f):
        ddf = hessian(f)
        def hess(x):
            return ddf(x).diagonal(axis1=1, axis2=2)
        return hess

    ddf = mhessian(f)
    print(ddf(xx))

Thank you!

mattjj commented 6 years ago

I believe your function f is not actually scalar-valued as written; it's vector-valued (with the same shape as the input, since np.exp acts element-wise). The hessian function actually does the standard thing here, which is that it adds two trailing axes (or 'indices') to the type of the output of the function. In this case, since the output of the function has one axis (i.e. is a one-index object), the Hessian has two more axes to make three axes (i.e. it's a three-index object).

mattjj commented 6 years ago

To be a bit more explicit: if the function f has an input shape inshape = (in1, in2, ...) and output shape outshape = (out1, out2, ...) then jacobian(f)(x) has shape outshape + inshape = (out1, out2, ..., in1, in2, ...) and hessian(f)(x) has shape outshape + inshape + inshape = (out1, out2, ..., in1, in2, ..., in1, in2, ...), which is consistent with just applying jacobian twice (indeed, that's how hessian is implemented). In the case of your example, we have inshape = (4,) and outshape = (4,), so jacobian(f)(x) has shape (4, 4) and hessian(f)(x) has shape (4, 4, 4). In the case of a scalar-valued function on vectors, we might have inshape = (4,) and outshape = (), so that hessian(f)(x) has shape (4, 4), which is a matrix as expected.

bkmi commented 6 years ago

Ah, I understand. Your analysis looks correct to me. Thank you for clarifying that!