apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

General support of OPs for second-order gradient #10002

Open lightingghost opened 6 years ago

lightingghost commented 6 years ago

Since MXNet's autograd package appears to support higher-order gradients, I tried to implement WGAN-GP with MXNet, but I got the following error:

mxnet.base.MXNetError: [08:32:17] C:\projects\mxnet-distro-win\mxnet-build\nnvm\src\pass\gradient.cc:187: Operator _backward_Convolution is non-differentiable because it didn't register FGradient attribute.

It seems the convolution operator still does not support higher-order gradients?
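
For reference, here is a minimal sketch of the gradient-penalty step that triggers the error (the critic, data, and shapes below are placeholders for illustration, not my actual model):

```python
from mxnet import autograd, gluon, nd

# Placeholder critic containing a Conv2D layer (illustration only)
critic = gluon.nn.Sequential()
critic.add(gluon.nn.Conv2D(channels=8, kernel_size=3), gluon.nn.Dense(1))
critic.initialize()

real = nd.random.normal(shape=(4, 1, 28, 28))  # placeholder "real" batch
fake = nd.random.normal(shape=(4, 1, 28, 28))  # placeholder "fake" batch

eps = nd.random.uniform(shape=(4, 1, 1, 1))
interp = eps * real + (1 - eps) * fake
interp.attach_grad()

with autograd.record():
    score = critic(interp)
    # First-order gradient of the critic score w.r.t. the interpolated input,
    # recorded (create_graph=True) so the penalty term can be differentiated again.
    grads = autograd.grad(score, [interp], create_graph=True, retain_graph=True)[0]
    penalty = ((grads.reshape((4, -1)).norm(axis=1) - 1) ** 2).mean()

# Backprop through the penalty needs the second-order gradient of Convolution,
# which raises "Operator _backward_Convolution is non-differentiable ..."
penalty.backward()
```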

duhd1993 commented 5 years ago

Any news? I haven't seen any updates in the JIRA issue.

apeforest commented 5 years ago

@lonelykid We are actively working on the higher order gradient feature right now. We will update you once the PR is ready for review. Thanks for your patience.

apeforest commented 5 years ago

@lonelykid The PR is out for review: https://github.com/apache/incubator-mxnet/pull/14613. Your comments are appreciated.

gilbertfrancois commented 3 years ago

What is the current status of support for second-order derivatives in Gluon? I tried implementing the method from the paper Improved Training of Wasserstein GANs, but the training program returns an error when I add the gradient penalty to the loss function and run backpropagation. I noticed that, with MXNet 1.7, it works for Dense layers without activation, but layers such as Conv2D and many others still seem to be unsupported. I saw a similar question in #5982, but that was around 3 years ago.

Are there plans to add second-order derivative support for e.g. gluon.nn.Conv2D, gluon.nn.BatchNorm, gluon.nn.Activation, gluon.nn.LeakyReLU?
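
For context, this is roughly the check I used (placeholder network and shapes, not my actual training code): on MXNet 1.7 a double backward through a plain Dense layer runs, while swapping in a Conv2D block fails with the _backward_Convolution error above.

```python
from mxnet import autograd, gluon, nd

net = gluon.nn.Dense(1)   # works on 1.7; swap in gluon.nn.Conv2D(8, 3) to reproduce the error
net.initialize()

x = nd.random.normal(shape=(4, 16))  # use shape (4, 1, 28, 28) for the Conv2D case
x.attach_grad()

with autograd.record():
    y = net(x)
    # First-order gradient w.r.t. the input, kept in the graph for a second backward
    dydx = autograd.grad(y, [x], create_graph=True, retain_graph=True)[0]
    loss = (dydx ** 2).sum()

loss.backward()  # second-order gradient; currently supported only for a subset of operators
print(x.grad)
```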

Wallart commented 3 years ago

Same here on MXNet 1.8.0.rc2. I'm trying to implement cBiGAN, which is composed of residual blocks and follows the WGAN-GP training procedure.