Hello Alex,
A UTPM Hessian-vector product would require another polarization formula to cast the problem into univariate Taylor polynomials. I'm not 100% sure if it's going to work, but I'm optimistic. I'll have a look at it this week.
In the meantime: why don't you use a combined forward and reverse sweep to evaluate the Hessian-vector product?
cheers, Sebastian
> A UTPM Hessian-vector product would require another polarization formula
Equation 5.36 in chapter 5 of Optimization Algorithms on Matrix Manifolds (P.-A. Absil, R. Mahony & R. Sepulchre, 2008) gives a polarization identity for the Hessian as a linear operator, but I'm not sure whether that's helpful. I don't know much about differential geometry.
> Why don't you use a combined forward and reverse sweep to evaluate the Hessian-vector product?
I could do this. I guess the computation graph has a helper function for reverse mode that computes it. Honestly, I've been avoiding the computation-graph approaches in algopy because they seem more fragile and I don't understand them as well, even though I understand that reverse mode is necessary to get the full benefit of automatic differentiation.
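If I understand the drivers correctly, the graph-based version would look something like the sketch below. I'm assuming here that CGraph exposes a hess_vec driver for the combined sweep; I haven't verified this.

```python
import numpy as np
import algopy

def f(x):
    return algopy.sum(x*x*x)

N = 4

# trace the function once on the computation graph
cg = algopy.CGraph()
x = algopy.Function(np.ones(N))
y = f(x)
cg.trace_off()
cg.independentFunctionList = [x]
cg.dependentFunctionList = [y]

# combined forward + reverse sweep for the Hessian-vector product
x0 = np.arange(1.0, N + 1)
v = np.linspace(-1.0, 1.0, N)
Hv = cg.hess_vec(x0, v)

# the Hessian of sum(x**3) is diag(6*x), so H v = 6*x0*v
print(np.allclose(Hv, 6.0*x0*v))
```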
In the differential geometry literature, things tend to be explained in a more complicated way than necessary.
We want to compute H v, where H is the Hessian of a function f: R^N -> R. What we need is the following polarization formula:

    s_1^T H s_2 = 1/2 * [ (s_1 + s_2)^T H (s_1 + s_2) - s_1^T H s_1 - s_2^T H s_2 ]

where H is the Hessian. I.e., we choose s_2 := v and s_1 = e_j, for j = 1, ..., N, where e_j is the j-th Cartesian basis vector.
According to the formula, we have to propagate the directions

- s_1 + s_2 = e_j + v (N directions)
- s_2 = v (1 direction)
- s_1 = e_j (N directions)

i.e., in total P = N + 1 + N = 2N + 1 directions.
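As a quick plain-numpy sanity check of the identity (with a random symmetric matrix standing in for the true Hessian):

```python
import numpy as np

N = 5
A = np.random.rand(N, N)
H = A + A.T                  # random symmetric stand-in for the Hessian
v = np.random.rand(N)
I = np.eye(N)

Hv = np.zeros(N)
for j in range(N):
    e = I[j]
    # e^T H v = 1/2 [ (e+v)^T H (e+v) - e^T H e - v^T H v ]
    Hv[j] = 0.5*((e + v).dot(H).dot(e + v) - e.dot(H).dot(e) - v.dot(H).dot(v))

print(np.allclose(Hv, H.dot(v)))    # expect True
```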
We have to set up a UTPM instance x with x.data.shape = (2,P,N) and fill it as follows:
```python
I = np.eye(N)
# directions s_1 = e_p go into slots 0, ..., N-1
for p in range(N):
    x.data[1, p, :] = I[p]
# directions s_1 + s_2 = v + e_p go into slots N, ..., 2N-1
for p in range(N):
    x.data[1, p + N, :] = v + I[p]
# direction s_2 = v goes into slot 2N
x.data[1, 2*N, :] = v
```
Then compute y = f(x) in Taylor arithmetic, and from the result assemble the Hessian-vector product:
```python
Hv = np.zeros(N)
for n in range(N):
    Hv[n] = const * ( y.data[1,p] + y.data[1, p+N] + y.data[1, 2*N])
```
We could put these algorithms into `UTPM.init_hessian_vector` and `UTPM.extract_hessian_vector` to automate the process; a rough sketch follows below.
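Something like the following, just packaging the recipe above (not a final API; the extraction here already uses the second-order coefficients and the signs from the polarization identity, which pins down the details that get sorted out later in this thread):

```python
import numpy as np

def init_hessian_vector(x0, v):
    # pack the base point and the 2N+1 directions e_p, v+e_p, v
    # into a degree-2 coefficient array; wrap with UTPM(data)
    N = x0.shape[0]
    P = 2*N + 1
    data = np.zeros((3, P, N))
    data[0, :, :] = x0
    I = np.eye(N)
    for p in range(N):
        data[1, p, :] = I[p]
        data[1, p + N, :] = v + I[p]
    data[1, 2*N, :] = v
    return data

def extract_hessian_vector(ydata):
    # assemble Hv from the second-order Taylor coefficients of y
    N = (ydata.shape[1] - 1) // 2
    Hv = np.zeros(N)
    for n in range(N):
        Hv[n] = -ydata[2, n] + ydata[2, n + N] - ydata[2, 2*N]
    return Hv
```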
If you want, you can give it a try. Otherwise, I'll implement and test that approach in the coming days.
cheers, Sebastian
Thanks for looking at this! I might try implementing it, after https://github.com/b45ch1/algopy/pull/36 or some other scipy version checking fix is committed to the algopy master branch.
Right now the master branch of algopy doesn't work for me, because I get an error when I import algopy. This is because I'm using a development version of scipy, so the line

```python
scipy_version = [int(i) for i in scipy.version.version.split('.')]
```

raises an exception when it tries to int-ify a string that looks like `dev-df566b3`.
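For what it's worth, a more tolerant parse could just stop at the first non-numeric component (a sketch; the actual fix in the PRs may differ):

```python
import re
import scipy

# take the numeric prefix of each dotted component and stop at the
# first one without a numeric prefix, e.g. 'dev-df566b3'
scipy_version = []
for part in scipy.version.version.split('.'):
    m = re.match(r'\d+', part)
    if m is None:
        break
    scipy_version.append(int(m.group()))
```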
I've added PR https://github.com/b45ch1/algopy/pull/39 with tests, but I'm not sure it is correct.
In your notes you have

```python
Hv[n] = const * ( y.data[1,p] + y.data[1, p+N] + y.data[1, 2*N])
```

but I am using

```python
Hv[n] = - y.data[2,n] + y.data[2, n+N] - y.data[2, 2*N]
```
There are some differences:

1. not using the const (or I guess const = 1)
2. subtracting two of the terms instead of adding
3. second order instead of first order
4. n instead of p

I guess that (2), (3), and (4) are just typos in your notes. I think that (1) is correct because the 1/2 factor is built into the UTPM notation?
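Working it out to convince myself: for a direction s, the degree-2 Taylor coefficient of y(t) = f(x + t*s) stored by UTPM is y.data[2] = 1/2 * s^T H s. Plugging these into the polarization identity,

    Hv[n] = 1/2 * [ (e_n + v)^T H (e_n + v) - e_n^T H e_n - v^T H v ]
          = y.data[2, n+N] - y.data[2, n] - y.data[2, 2*N]

so the 1/2 in the identity cancels against the 1/2 in the stored coefficients, and const = 1.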
In the bigger picture, I guess the idea is that we can directly construct the Hessian-vector product using second-order Taylor arithmetic along 2n+1 directions, instead of along the n*(n+1)/2 directions that are required for extracting the Hessian itself?
Sorry for the confusion, I was in a rush when I wrote the post. I hope you didn't lose too much time figuring it out. So yes, they are "typos": the code should correspond to the math formula I posted.
And yes, the advantage is O(2n+1) vs. O(n*(n+1)/2). Using the reverse mode, one would get O(1).
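Putting the corrected pieces together, a minimal end-to-end sketch with a toy test function (using the raw UTPM data layout from above; the helper names that ended up in the PR may differ):

```python
import numpy as np
import algopy
from algopy import UTPM

def f(x):
    # toy scalar function with a non-constant Hessian: H = diag(6*x)
    return algopy.sum(x*x*x)

N = 4
x0 = np.arange(1.0, N + 1)          # base point
v = np.linspace(-1.0, 1.0, N)       # direction

P = 2*N + 1
x = UTPM(np.zeros((3, P, N)))       # degree-2 Taylor, P directions
x.data[0, :, :] = x0                # same base point in every direction
I = np.eye(N)
for p in range(N):
    x.data[1, p, :] = I[p]          # s_1 = e_p
    x.data[1, p + N, :] = v + I[p]  # s_1 + s_2
x.data[1, 2*N, :] = v               # s_2 = v

y = f(x)

Hv = np.zeros(N)
for n in range(N):
    Hv[n] = -y.data[2, n] + y.data[2, n + N] - y.data[2, 2*N]

print(np.allclose(Hv, 6.0*x0*v))    # expect True
```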
This is implemented in https://github.com/b45ch1/algopy/pull/39, which has been merged.
I can get the Jacobian-vector product and the Hessian matrix, but I don't see an efficient way to get the Hessian-vector product. My hope is that this could speed up trust-region conjugate gradient minimization that uses only Hessian-vector products, as in https://github.com/scipy/scipy/blob/master/scipy/optimize/_trustregion_ncg.py. I am currently extracting the whole Hessian and then multiplying it by the vector, but I would guess that this is unnecessarily inefficient; a sketch of what I'm doing now is below.
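For context, this is the interface the optimizer consumes; the hessp below builds the full Hessian explicitly (the inefficient step the UTPM-based product would replace), and the test function is just a stand-in:

```python
import numpy as np
from scipy.optimize import minimize

def f(x):
    return np.sum(x**4) + np.dot(x, x)

def grad(x):
    return 4.0*x**3 + 2.0*x

def hessp(x, p):
    # what I'm doing now: build the whole Hessian, then multiply;
    # the UTPM Hessian-vector product would drop in here instead
    H = np.diag(12.0*x**2 + 2.0)
    return H.dot(p)

x0 = np.ones(5)
res = minimize(f, x0, jac=grad, hessp=hessp, method='trust-ncg')
print(res.x)    # expect values near 0
```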