b45ch1 / algopy

AlgoPy is a Research Prototype for Algorithmic Differentiation in Python

wishlist: utpm hessian vector product #38

Closed. argriffing closed this issue 10 years ago.

argriffing commented 10 years ago

I can get the Jacobian-vector product and the Hessian matrix, but I don't see an efficient way to get the Hessian-vector product. My hope is that this could speed up trust-region conjugate gradient minimization that uses only Hessian-vector products, as in https://github.com/scipy/scipy/blob/master/scipy/optimize/_trustregion_ncg.py. I am currently extracting the whole Hessian and then multiplying it by the vector, but I would guess this is unnecessarily inefficient.
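For reference, the full-Hessian workaround I'm using looks roughly like the sketch below (with a toy objective standing in for the real one):

import numpy as np
import algopy

def f(x):
    # toy objective; any smooth f: R^N -> R built from algopy operations
    return algopy.dot(x, x)**2

x0 = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])

x = algopy.UTPM.init_hessian(x0)
H = algopy.UTPM.extract_hessian(x0.size, f(x))
Hv = H.dot(v)  # builds the full N x N Hessian just to form one matrix-vector product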

b45ch1 commented 10 years ago

Hello Alex,

A UTPM Hessian-vector product would require another polarization formula to cast the problem into univariate Taylor polynomials. I'm not 100% sure if it's going to work, but I'm optimistic. I'll have a look at it this week.

In the meantime: why don't you use a combined forward and reverse sweep to evaluate the Hessian-vector product?
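Roughly, such a combined sweep could look like the following untested sketch: a UTPM with two Taylor coefficients in the single direction v is pushed forward through the traced graph, then one reverse sweep pulls back a unit adjoint (shapes follow the (D, P, N) convention).

import numpy as np
import algopy

def f(x):
    # placeholder objective f: R^N -> R
    return algopy.dot(x, x)**2

x0 = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])
N = x0.size

# trace the computational graph once
cg = algopy.CGraph()
fx = algopy.Function(x0)
fy = f(fx)
cg.trace_off()
cg.independentFunctionList = [fx]
cg.dependentFunctionList = [fy]

# forward sweep: first-order Taylor polynomial in the single direction v
x = algopy.UTPM(np.zeros((2, 1, N)))
x.data[0, 0, :] = x0
x.data[1, 0, :] = v
cg.pushforward([x])

# reverse sweep: pull back a unit adjoint of the scalar output
ybar = algopy.UTPM(np.zeros((2, 1)))
ybar.data[0, 0] = 1.0
cg.pullback([ybar])

# the first-order coefficients of the adjoint of x contain H v
Hv = fx.xbar.data[1, 0, :]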

cheers, Sebastian

argriffing commented 10 years ago

A UTPM Hessian-vector product would require another polarization formula

Equation 5.36 in chapter 5 of Optimization Algorithms on Matrix Manifolds (P.-A. Absil, R. Mahony & R. Sepulchre, 2008) gives a polarization identity for the Hessian as a linear operator, but I'm not sure whether that's helpful. I don't know much about differential geometry.
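If it helps, the usual polarization identity for a symmetric bilinear form B (which may or may not be the exact form of their Eq. 5.36) is

B(u, v) = 1/4 * [ B(u + v, u + v) - B(u - v, u - v) ]

i.e., evaluating the quadratic form along a few combined directions recovers the mixed term.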

Why don't you use a combined forward and reverse sweep to evaluate the hessian-vector product?

I could do this. I guess the computation graph has a helper function for reverse mode that computes it. Honestly I've been avoiding the computation graph approaches in algopy because they seem more fragile and I don't understand them as well, even though I understand that reverse mode is necessary to get the full benefit of automatic differentiation.

b45ch1 commented 10 years ago

In the differential geometry literature, things tend to be explained in a more complicated way than necessary.

We want to compute H v, where H is the Hessian of a function f: R^N -> R.

What we need is the following formula:

s_1^T H s_2 = 1/2 * [ (s_1 + s_2)^T H (s_1 + s_2) - s_1^T H s_1 - s_2^T H s_2 ]

where H is the Hessian. I.e., we choose s_2 := v and s_1 := e_j for j = 1, ..., N, where e_j is the j-th Cartesian basis vector. According to the formula, we have to propagate the directions

s_1 + s_2
s_2
s_1

i.e., in total P = N + 1 + N = 2N+1 directions. We have to set up a UTPM instance x with x.data.shape = (2,P,N) and fill it as follows:

I = np.eye(N)
# (the zeroth coefficients x.data[0,:,:] hold the base point x0)
for p in range(N):
    x.data[1,p,:] = I[p]
for p in range(N):
    x.data[1,p+N,:] = v+I[p]
x.data[1, 2*N, :] = v

then compute y = f(x) in Taylor arithmetic

and from the result assemble the Hessian-vector product

Hv = np.zeros(N)
for n in range(N):
    Hv[n] = const * ( y.data[1,p] + y.data[1, p+N] + y.data[1, 2*N])

We could put these algorithms into UTPM.init_hessian_vector and UTPM.extract_hessian_vector to automate the process.

If you want, you can give it a try. Otherwise, I'll implement and test that approach in the coming days.

cheers, Sebastian

argriffing commented 10 years ago

Thanks for looking at this! I might try implementing it, after https://github.com/b45ch1/algopy/pull/36 or some other scipy version checking fix is committed to the algopy master branch.

Right now the master branch of algopy doesn't work for me because I get an error when I import algopy. This is because I'm using a development version of scipy, so the line scipy_version = [int(i) for i in scipy.version.version.split('.')] raises an exception when it tries to int-ify a string that looks like dev-df566b3.
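A possible workaround in the meantime (not necessarily what #36 does) would be to keep only the purely numeric components of the version string:

import scipy

# tolerate version strings like '0.14.0.dev-df566b3'
scipy_version = [int(i) for i in scipy.version.version.split('.') if i.isdigit()]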

argriffing commented 10 years ago

I've added PR https://github.com/b45ch1/algopy/pull/39 with tests but I'm not sure it is correct.

In your notes you have

Hv[n] = const * ( y.data[1,p] + y.data[1, p+N] + y.data[1, 2*N])

but I am using

Hv[n] = - y.data[2,n] + y.data[2, n+N] - y.data[2, 2*N]

There are some differences:

1) not using the const (or I guess const = 1)
2) subtracting two of the terms instead of adding
3) second order instead of first order
4) n instead of p

I guess that (2), (3), (4) are just typos in your notes. I think that (1) is correct because the 1/2 factor is built into the UTPM notation?

In the bigger picture, I guess the idea is that we can directly construct the hessian-vector product using second order Taylor arithmetic along 2n+1 directions instead of along the n*(n+1)/2 directions that are required for extracting the Hessian itself?
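For concreteness, here is a sketch of the scheme as I understand it (the helper name below is just for illustration, not the PR's actual API):

import numpy as np
import algopy

def hess_vec_via_utpm(f, x0, v):
    # Hessian-vector product H(x0) v from 2N+1 second-order Taylor directions
    N = x0.size
    P = 2*N + 1
    x = algopy.UTPM(np.zeros((3, P, N)))  # three Taylor coefficients per direction
    x.data[0, :, :] = x0                  # base point in every direction
    I = np.eye(N)
    x.data[1, :N, :] = I                  # directions e_1, ..., e_N
    x.data[1, N:2*N, :] = v + I           # directions v + e_1, ..., v + e_N
    x.data[1, 2*N, :] = v                 # direction v
    y = f(x)                              # propagate in Taylor arithmetic
    # e_n^T H v = (v+e_n)^T H (v+e_n)/2 - e_n^T H e_n/2 - v^T H v/2,
    # and the 1/2 factors are already built into the second Taylor coefficients
    return y.data[2, N:2*N] - y.data[2, :N] - y.data[2, 2*N]

def f(x):
    return algopy.dot(x, x)**2

x0 = np.array([1.0, 2.0, 3.0])
v = np.array([0.5, -1.0, 2.0])

# should agree with the full-Hessian route
H = algopy.UTPM.extract_hessian(x0.size, f(algopy.UTPM.init_hessian(x0)))
print(hess_vec_via_utpm(f, x0, v))
print(H.dot(v))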

b45ch1 commented 10 years ago

Sorry for the confusion; I was in a rush when I wrote the post. I hope you didn't lose too much time figuring it out. So yes, they are "typos": the code should correspond to the math formula I posted.

And yes, the advantage is 2n+1 propagated directions vs n*(n+1)/2. Using the reverse mode, one would get the Hessian-vector product at a cost independent of n (a constant number of sweeps).

argriffing commented 10 years ago

This is implemented in https://github.com/b45ch1/algopy/pull/39 which has been merged.