There is no need for an R-op. We just need to differentiate d(v^T g)/dx, where g = dL/dx.
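A minimal sketch of that double-backprop idea in Gluon-style code, assuming `mxnet.autograd.grad` with `create_graph=True` and that every op in the objective has a second-order gradient registered; the toy objective `f` and the array values are just placeholders:

```python
import mxnet as mx
from mxnet import autograd, nd

# Toy scalar objective; stands in for any loss L(x).
def f(x):
    return (x * x).sum()

x = nd.array([1.0, 2.0, 3.0])
x.attach_grad()
v = nd.array([0.5, -1.0, 2.0])   # the vector in the Hessian-vector product

with autograd.record():
    L = f(x)
    # g = dL/dx, kept differentiable so it can be backpropagated again.
    # Only works for ops that actually support second-order gradients.
    g = autograd.grad(L, [x], create_graph=True)[0]
    s = (g * v).sum()            # scalar v^T g
s.backward()                     # d(v^T g)/dx = H v
print(x.grad)                    # the Hessian-vector product
```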
The problem here is that most ops don't support second-order derivatives. For a start, you can try adding FGradient to the FullyConnected layer.
Would it cost too much memory if we chose to manually compute the second-order derivative?
It is usually fine as long as the objective is a scalar. You don't really need the full Hessian, just functions of the gradient.
Is it possible to implement MinPy's autograd feature in MXNet?
@nicklhy We are working on it; it is still experimental and not released yet: https://github.com/dmlc/mxnet/blob/master/python/mxnet/contrib/autograd.py#L172
Is there any update? Is this being discussed in another thread?
I would like to implement Sobolev Training for Neural Networks (https://arxiv.org/abs/1706.04859), which uses a gradient penalty for knowledge distillation.
Would it work now with gluon?
Is this problem solved?
no
I think https://github.com/apache/incubator-mxnet/commit/37a651664854734b396a58598a0a501be5674b7e should have partially solved the problem.
MXNet currently only supports calculating the gradient of a loss constructed by chaining `mxnet.symbol`s. This is unfortunately not enough to implement some more advanced training methods, such as the newly proposed Improved Training of Wasserstein GANs. For that method, the loss function itself contains a reference to the gradient of the network with respect to the inputs.
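As an illustration only, here is a rough sketch of such a gradient-penalty term in Gluon-style code; `critic` and `x_hat` are placeholder names, the sketch relies on `autograd.grad` with `create_graph=True`, and it presumes the ops in the critic can be differentiated twice, which is exactly what is missing today:

```python
import mxnet as mx
from mxnet import autograd, nd

def gradient_penalty(critic, x_hat, lambda_gp=10.0):
    """WGAN-GP style term: lambda * (||d critic(x_hat) / d x_hat||_2 - 1)^2.

    The penalty is only trainable if every op in `critic` supports
    second-order gradients.
    """
    x_hat.attach_grad()
    with autograd.record():
        score = critic(x_hat).sum()
        # Gradient of the critic w.r.t. its input, kept in the graph.
        grads = autograd.grad(score, [x_hat], create_graph=True)[0]
        flat = grads.reshape((grads.shape[0], -1))
        grad_norm = nd.sqrt((flat * flat).sum(axis=1) + 1e-12)
        penalty = lambda_gp * ((grad_norm - 1.0) ** 2).mean()
    return penalty
```

The penalty would then be added to the usual critic loss before calling backward, which is precisely the step that requires the gradient itself to be differentiable.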
Furthermore, for some reinforcement learning methods it is important to compute a Hessian-vector product. The key to the automatic computation of the Hessian-vector product and the Hessian matrix is the R-operator (right product of the Jacobian matrix with a vector). We can refer to the paper "Fast Exact Multiplication by the Hessian" and Theano's documentation of `Rop` for more details.
For that, we can add `R_forward` and `R_backward` functions in MXNet to compute `R(output)` and `R(gradient)` with respect to some parameter w. The workflow would be as follows: like the traditional forward-backward computation of the gradient, we first call `net.forward()` and `net.backward()` to get the output/gradient, and then call `net.R_forward(param, v)` and `net.R_backward(param, v)` to get the Hessian-vector product. To compute the Hessian matrix, we can call the Hessian-vector product subroutine multiple times. We can refer to the Wikipedia article and Theano's implementation of the Hessian.
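To make that last step concrete, here is a small sketch of assembling the dense Hessian by probing a Hessian-vector product routine with basis vectors; `hvp` is a placeholder for whatever eventually provides the product (the proposed `net.R_backward(param, v)`, or a double backward), not an existing MXNet call:

```python
import numpy as np

def full_hessian(hvp, dim):
    """Build the dense Hessian column by column.

    `hvp(v)` is assumed to return H @ v for a length-`dim` vector v.
    """
    H = np.zeros((dim, dim))
    for i in range(dim):
        e_i = np.zeros(dim)
        e_i[i] = 1.0
        H[:, i] = hvp(e_i)   # the i-th column of the Hessian
    return H

# Toy check: f(x) = 0.5 * x^T A x has Hessian A.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
print(full_hessian(lambda v: A.dot(v), 2))
```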
@sxjscience