apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

Writing a custom trainer in MXNet that tracks uncertainty in the weights #8210

Closed. aodhan-domhnaill closed this issue 7 years ago.

aodhan-domhnaill commented 7 years ago

I want to implement Bayes by Backprop (BbB) in MXNet, but two features of this algorithm cause me some confusion.

For one, when I look at the example optimizers, I see that they are handed the gradient already computed, so I cannot control which weights were used for that gradient computation. In BbB, I need to add a noise term to the weights before computing the gradient, so I would need to control the weights used in the gradient computation.

Second, I need to track both the mean and the standard deviation of each weight, so I would need to "attach" more state than just the gradient.

What are the best approaches for me to solve these problems?
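To make the first point concrete, here is roughly the computation I am after, written with the imperative NDArray API and autograd (just a sketch, not working training code; mu, rho, and eps are my own illustrative names, and the loss is a placeholder):

import mxnet as mx
from mxnet import autograd

# BbB reparameterisation: w = mu + softplus(rho) * eps, eps ~ N(0, 1),
# with eps redrawn for every batch. Gradients then flow to both mu and
# rho, which are the two quantities I need to track per weight.
mu = mx.nd.zeros((3, 2))          # posterior mean of the weights
rho = mx.nd.zeros((3, 2)) - 3.0   # unconstrained std parameter
mu.attach_grad()
rho.attach_grad()

x = mx.nd.random.normal(shape=(4, 2))   # a dummy batch

with autograd.record():
    eps = mx.nd.random.normal(shape=mu.shape)   # fresh noise for this batch
    sigma = mx.nd.log(1.0 + mx.nd.exp(rho))     # softplus keeps the std positive
    w = mu + sigma * eps                        # perturbed weights
    out = mx.nd.dot(x, w.T)
    loss = (out ** 2).mean()                    # placeholder loss
loss.backward()
# mu.grad and rho.grad now hold the gradients of both posterior parameters.

So the optimizer would need to update mu and rho, while the gradient itself has to be computed at the noisy w.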

aodhan-domhnaill commented 7 years ago

From doing a bit more research, it looks like implementing a new Gluon layer would be best. I was thinking of basing it on the densely connected layer (gluon.nn.Dense), but changing the weight definition to something like,

# Parameter objects don't support arithmetic directly, so register the
# mean and the std separately and combine them in the forward pass:
self.weight_mean = self.params.get('weight_mean', shape=(units, in_units),
                                   init=weight_initializer,
                                   allow_deferred_init=True)
self.weight_std = self.params.get('weight_std', shape=(units, in_units),
                                  init=weight_initializer,
                                  allow_deferred_init=True)
# in the forward pass: weight = weight_mean + weight_std * NOISE_TERM

And the same for the bias. But I don't know how to generate NOISE_TERM so that it is redrawn as a fresh sample from the unit Gaussian once per batch.

How do you do that?
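My best guess so far is to register the mean and std as separate parameters and sample the noise inside hybrid_forward, so that it is redrawn on every call, which is once per batch in an ordinary training loop. Something like this untested sketch (BayesianDense and all of the names in it are my own invention, not an existing MXNet API):

import mxnet as mx
from mxnet import gluon

class BayesianDense(gluon.HybridBlock):
    # Dense-like block holding a mean and a std parameter per weight;
    # the unit-Gaussian noise is resampled on every forward call.
    # (Bias omitted for brevity.)
    def __init__(self, units, in_units, **kwargs):
        super(BayesianDense, self).__init__(**kwargs)
        self._shape = (units, in_units)
        with self.name_scope():
            self.weight_mean = self.params.get('weight_mean', shape=self._shape)
            self.weight_std = self.params.get('weight_std', shape=self._shape,
                                              init=mx.init.Constant(0.01))

    def hybrid_forward(self, F, x, weight_mean, weight_std):
        # Drawing eps here, rather than in __init__, is what makes the
        # noise a fresh unit-Gaussian sample on each call.
        eps = F.random.normal(shape=self._shape)
        weight = weight_mean + weight_std * eps
        return F.dot(x, weight, transpose_b=True)

net = BayesianDense(units=16, in_units=8)
net.initialize(mx.init.Xavier())
out = net(mx.nd.random.normal(shape=(4, 8)))   # a new eps is drawn on this call

Is this the idiomatic way to do it? The KL term between the weight posterior and the prior would presumably still have to be added to the loss by hand; the sketch above only covers the sampling.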