Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
I want to implement the improved WGAN (WGAN-GP) with mxnet, but the gradient penalty is a real headache. It is a complex loss function that contains a gradient term: the gradient of the critic's output with respect to its input data. For simple loss functions we can easily compute the gradient of the loss with respect to the output, but how can we compute the gradient of the output with respect to the input data?
It seems that tensorflow has a gradient operator, just like the common convolution or fully connected operators. With such an operator things become simple: we just write the expression for the loss function, gradient term included. However, I have no idea how to do this in mxnet.
Maybe mxnet can also have such an operator in the future?
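To make the question concrete, here is a minimal sketch of the gradient-penalty math itself, independent of any framework's autodiff. The critic `critic` here is a hypothetical toy function whose input gradient is known in closed form; in a real WGAN-GP the critic is a network and a framework gradient operator (the kind asked about above) would supply `critic_input_grad`. The names `critic`, `critic_input_grad`, and `gradient_penalty` are illustrative, not part of any mxnet API.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy critic D(x) = sum((w * x)**2).  Its input gradient is
# known analytically: dD/dx = 2 * w**2 * x, which stands in for what an
# autodiff gradient operator would compute for a real network.
w = rng.normal(size=4)

def critic(x):
    return np.sum((w * x) ** 2)

def critic_input_grad(x):
    return 2.0 * (w ** 2) * x

def gradient_penalty(x_real, x_fake, lam=10.0):
    # WGAN-GP penalty: lam * (||grad_xhat D(xhat)||_2 - 1)**2, evaluated
    # at a random interpolate xhat between a real and a fake sample.
    eps = rng.uniform()
    x_hat = eps * x_real + (1.0 - eps) * x_fake
    g = critic_input_grad(x_hat)
    return lam * (np.linalg.norm(g) - 1.0) ** 2
```

The point of the sketch: the penalty only needs the value of the input gradient at the interpolate, and then the training step must differentiate *through* that gradient, which is why a first-class gradient operator (as in tensorflow's `tf.gradients`) makes the whole thing a plain loss expression.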