aodhan-domhnaill · closed 7 years ago
From doing a bit more research, it looks like implementing a new Gluon layer would be best. I was thinking of basing it on the densely connected (Dense) layer, but changing the lines to something like:
```python
self.weight = (self.params.get('weight', shape=(units, in_units),
                               init=weight_initializer,
                               allow_deferred_init=True) +
               self.params.get('weight_std', shape=(units, in_units),
                               init=weight_initializer,
                               allow_deferred_init=True) * NOISE_TERM)
```
And the same for the bias. But I don't know how to implement the NOISE_TERM so that it is re-sampled from the unit Gaussian once per batch. How can I do that?
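One common way to get a fresh noise sample per batch is to draw the noise inside the layer's forward pass rather than storing it as a parameter. Below is a plain-NumPy sketch of that idea (not MXNet/Gluon API; the class and attribute names `NoisyDense`, `weight_mu`, `weight_rho` are illustrative, and the softplus parameterization of the standard deviation is an assumption borrowed from the usual Bayes-by-Backprop setup):

```python
import numpy as np

class NoisyDense:
    """Sketch of a dense layer whose weights are re-sampled every forward call."""

    def __init__(self, in_units, units, rng=None):
        self.rng = rng or np.random.default_rng(0)
        scale = 1.0 / np.sqrt(in_units)
        # Learnable mean of the weights
        self.weight_mu = self.rng.uniform(-scale, scale, (units, in_units))
        # Learnable pre-softplus standard deviation (kept negative so sigma starts small)
        self.weight_rho = np.full((units, in_units), -3.0)

    def forward(self, x):
        # softplus keeps the standard deviation positive
        sigma = np.log1p(np.exp(self.weight_rho))
        # Fresh unit-Gaussian noise on every call, i.e. once per batch
        eps = self.rng.standard_normal(self.weight_mu.shape)
        w = self.weight_mu + sigma * eps
        return x @ w.T

layer = NoisyDense(4, 3)
x = np.ones((2, 4))
y1 = layer.forward(x)
y2 = layer.forward(x)
# y1 and y2 differ because a new eps was drawn on each call
```

In a Gluon layer the same pattern would mean generating the noise inside the forward computation each time it runs, instead of adding it to the stored parameters at construction.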
I want to implement Bayes by Backpropagation (BbB) in MXNet, but two notable features of this algorithm cause me some confusion.
For one, when I look at example optimizers I note that they are handed the gradient already computed, so I cannot control which weights were used for that gradient computation. In BbB, I need to add a noise term to the weights before computing the gradient, so I would need to control the weights used to compute it.
Second, I need to track both the mean and the standard deviation of each weight, so I would need to "attach" more than just the gradient to a parameter.
What are the best approaches for me to solve these problems?
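Both problems are usually handled with the reparameterization trick that BbB is built on: write the sampled weight as w = mu + softplus(rho) * eps with eps ~ N(0, 1), so the gradient with respect to w (which backprop already gives you) converts by the chain rule into gradients for both tracked parameters. A hedged NumPy sketch of that conversion, with illustrative names and a stand-in gradient:

```python
import numpy as np

# Reparameterization: w = mu + softplus(rho) * eps, eps ~ N(0, 1).
# Given dL/dw from ordinary backprop, the chain rule gives
#   dL/dmu  = dL/dw
#   dL/drho = dL/dw * eps * sigmoid(rho)   # sigmoid(rho) = d softplus(rho)/drho
rng = np.random.default_rng(1)
mu = rng.standard_normal((3, 2))
rho = np.full((3, 2), -2.0)
eps = rng.standard_normal(mu.shape)

sigma = np.log1p(np.exp(rho))   # softplus keeps sigma positive
w = mu + sigma * eps            # the noisy weights used in the forward pass

grad_w = np.ones_like(w)        # stand-in for dL/dw produced by backprop
grad_mu = grad_w
grad_rho = grad_w * eps * (1.0 / (1.0 + np.exp(-rho)))
```

This is why only the gradient with respect to w needs to come from the framework: the extra bookkeeping for the mean and standard deviation is a deterministic transformation of that gradient, which a custom layer or optimizer can apply itself.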