apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0
20.78k stars 6.79k forks

Reparameterization trick for Gamma distribution #18140

Open leandrolcampos opened 4 years ago

leandrolcampos commented 4 years ago

Description

I'd like to suggest implementing implicit reparameterization gradients, as described in the paper [1], for the Gamma distribution samplers ndarray.sample_gamma and symbol.sample_gamma.

This would allow the Gamma distribution, and others that depend on it such as the Beta, Dirichlet and Student's t distributions, to be used as easily as the Normal distribution in stochastic computation graphs.

Stochastic computation graphs are necessary for variational autoencoders (VAEs), automatic variational inference, Bayesian learning in neural networks, and principled regularization in deep networks.

The approach proposed in the paper [1] is the same one used by TensorFlow's tf.random.gamma, as we can see in [2].

Thanks for the opportunity to request this feature.

References

sxjscience commented 4 years ago

@xidulu @szhengac

xidulu commented 4 years ago

Hi @leandrolcampos

Currently, pathwise gradient is only implemented for mx.np.random.{normal, gumbel, logistic, weibull, exponential, pareto} in the backend.

We are planning to implement implicit reparameterization gradients for Gamma-related distributions in the C++ backend in the future. As you pointed out, this is extremely useful in scenarios like BBVI for LDA.

Another possible solution is to wrap the sampling op as a CustomOp, which lets you define the backward computation manually in Python: https://mxnet.apache.org/api/python/docs/tutorials/extend/customop.html
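To illustrate what the backward computation of such a custom op would have to produce, here is a minimal sketch in plain Python (standard library only, not MXNet code). It assumes a Gamma distribution with shape alpha and unit scale, and uses the implicit reparameterization identity dz/dalpha = -(dF/dalpha) / f(z; alpha), where F is the Gamma CDF and f its density. The function names (gamma_cdf, gamma_pdf, implicit_grad, sample_with_grad) are hypothetical, and the CDF derivative is approximated with a central finite difference rather than the more accurate scheme a production backend would likely use.

```python
import math
import random


def gamma_cdf(alpha, z, max_terms=500):
    """Regularized lower incomplete gamma P(alpha, z), i.e. the Gamma(alpha, 1) CDF.

    Uses the power series
        P(a, z) = z^a e^{-z} / Gamma(a + 1) * sum_{n>=0} z^n / ((a+1)...(a+n)),
    which is adequate for moderate z but converges slowly for very large z.
    """
    if z <= 0.0:
        return 0.0
    term = 1.0
    total = 1.0
    for n in range(1, max_terms):
        term *= z / (alpha + n)
        total += term
        if term < 1e-16 * total:
            break
    log_prefactor = alpha * math.log(z) - z - math.lgamma(alpha + 1.0)
    return math.exp(log_prefactor) * total


def gamma_pdf(alpha, z):
    """Density of Gamma(alpha, 1) at z > 0."""
    return math.exp((alpha - 1.0) * math.log(z) - z - math.lgamma(alpha))


def implicit_grad(alpha, z, eps=1e-5):
    """Approximate dz/dalpha for a sample z > 0 drawn from Gamma(alpha, 1).

    Holding the CDF value F(z; alpha) fixed and differentiating gives
        dz/dalpha = -(dF/dalpha) / f(z; alpha).
    dF/dalpha is estimated here by a central finite difference (an
    assumption of this sketch, chosen for brevity).
    """
    dcdf_dalpha = (gamma_cdf(alpha + eps, z) - gamma_cdf(alpha - eps, z)) / (2.0 * eps)
    return -dcdf_dalpha / gamma_pdf(alpha, z)


def sample_with_grad(alpha, rng=random):
    """Draw z ~ Gamma(alpha, 1) and return (z, dz/dalpha)."""
    z = rng.gammavariate(alpha, 1.0)  # second argument is the scale
    return z, implicit_grad(alpha, z)
```

In a custom op, the forward pass would return z and the backward pass would multiply the incoming gradient by the value from implicit_grad. Since E[z] = alpha for unit scale, the average of dz/dalpha over many samples should be close to 1, which is a convenient sanity check.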

leandrolcampos commented 4 years ago

Hi @xidulu,

Thanks for your suggestion. I'll follow it. For performance reasons, though, I also look forward to your C++ backend implementation of implicit reparameterization gradients for Gamma-related distributions.