Hi @CBird210, and thanks for bringing this up here. From looking at the code, it seems that for MC dropout the method get_weight_samples does not sample the weights but instead returns the raw weight values without turning any of them off. For Bayes by Backprop, the weights are in fact sampled. Any ideas on what's happening in MC dropout, @JavierAntoran?
Hi @CBird210 @stratisMarkou,
As @stratisMarkou said, for MC dropout we return the raw weight values. The MC dropout approximate posterior is a mixture of delta functions: one at each learned parameter value and one at 0. Sampling the weights would therefore randomly return some of the learned values and some zeros.
The get_weight_samples function was written to give insight into approximate inference behavior by letting us plot a histogram of weight values (see the top right plot in https://javierantoran.github.io/assets/poster_advml.pdf). For Bayes by Backprop we actually sample the weights, as this lets us represent the weight posterior variance in that histogram. For MC dropout, sampling would not tell us much about the range of the learned weights, since the dropout probabilities are fixed rather than learned. Perhaps get_weight_samples is a poor naming choice. I chose it because all of the other approximate inference methods have a function with that exact name, allowing for easy plug-in replacement of approximate inference methods in experiments.
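For intuition, here is a minimal sketch (not the repo's code; the tensor names, shapes and dropout probability are made up) of what a weight "sample" would mean under each approximate posterior:

```python
import torch

# Hypothetical stand-ins for one layer's parameters.
W_mu = torch.randn(256, 784)   # learned means (BBB) / raw weights (MC dropout)
W_rho = torch.randn(256, 784)  # learned rho, softplus-transformed to a std dev
p_drop = 0.5                   # fixed dropout probability

# Bayes by Backprop: a sample is mean + scaled noise, so a histogram of
# samples reflects the learned posterior variance.
eps = torch.randn_like(W_mu)
W_bbb_sample = W_mu + torch.log1p(torch.exp(W_rho)) * eps

# MC dropout: the approximate posterior is a mixture of deltas at the raw
# values and at 0, so a "sample" is just the raw weights with entries zeroed.
mask = torch.bernoulli(torch.full_like(W_mu, 1 - p_drop))
W_mcdo_sample = W_mu * mask    # uninformative about the range of learned weights
```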
@CBird210 if you call the all_sample_eval function, specifying the parameter "Nsamples", you will get a vector of Nsamples different predictions from the model.
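For example, something along these lines (a hypothetical snippet; check the exact signature of all_sample_eval in the model class you are using):

```python
# net is a trained model from this repo, test_loader an MNIST DataLoader.
x, y = next(iter(test_loader))  # one batch of MNIST images
out = net.all_sample_eval(x, y, Nsamples=100)
# out holds 100 Monte Carlo predictions for the batch; averaging them
# approximates the posterior predictive distribution.
```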
Hi,
Thank you so much for getting back to me so quickly!
I noticed that get_weight_samples also seems to give me the exact same numbers if I train the same network twice using Bayes By Backprop. This confuses me, as it uses the function sample_weights, which looks like it should give different answers each time. I’m sorry if this is a mistake on my end; could you help me with some clarification?
all_sample_eval looks like it is doing exactly what I needed. However, I noticed that when I use all_sample_eval and just specify Nsamples, the MC Dropout code gives me results over a group of 16 MNIST images while Bayes By Backprop gives me results over a group of 100. Do you have an idea of how I could get results from all_sample_eval for the two methods on the same group of data? (I’m trying to do a direct comparison of the posterior predictive distributions computed by the two methods.)
Also, when trying to draw parallels between the code and the source material, I’m having a little trouble with parts of the Bayes by Backprop paper. Could you maybe point me in the direction of where in the code steps 4-7 of their algorithm (in section 3.2) are taking place? Once again, I’m new to Python so apologies if this is really obvious.
Thanks again for your help!
Hi,
I noticed that get_weight_samples also seems to give me the exact same numbers if I train the same network twice using Bayes By Backprop.
This should not happen. You have probably fixed a random seed somewhere in your code, or you may be mistakenly loading the same saved model for both runs.
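One quick thing to check: lines like the following anywhere in your scripts will make the weight sampling deterministic across runs, which would explain identical get_weight_samples outputs:

```python
import numpy as np
import torch

# If these run before training/sampling, both runs draw identical weights.
torch.manual_seed(0)
np.random.seed(0)
```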
I noticed that when I use all_sample_eval and just specify Nsamples, the MC Dropout code gives me results over a group of 16 numbers in MNIST while Bayes By Backprop gives me results over a group of 100 numbers in MNIST.
Nsamples controls how many Monte Carlo samples are drawn when approximating the posterior predictive. To control which data are being evaluated, you need to ensure that your inputs (x, y) are the same for both methods. From your comment, it sounds like you are running different batch sizes.
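A sketch of how you could evaluate both models on the identical batch (bbb_net and mcdo_net stand for your two trained models, and the all_sample_eval call assumes the signature discussed above):

```python
import torch
from torchvision import datasets, transforms

# One fixed batch of 100 MNIST test images, shared by both models.
test_set = datasets.MNIST('./data', train=False, download=True,
                          transform=transforms.ToTensor())
test_loader = torch.utils.data.DataLoader(test_set, batch_size=100, shuffle=False)
x, y = next(iter(test_loader))

# Same (x, y) for both, so the posterior predictives are directly comparable.
out_bbb = bbb_net.all_sample_eval(x, y, Nsamples=100)
out_mcdo = mcdo_net.all_sample_eval(x, y, Nsamples=100)
```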
Could you maybe point me in the direction of where in the code steps 4-7 of their algorithm (in section 3.2) are taking place?
Sure. Note that step 4 is written in a bit of a strange way in the paper; for me, that step is clearer in equation 8. In our code, it occurs in lines 198-208:
mlpdw_cum = 0  # running sum of the data-fit terms (initialised before the loop)
Edkl_cum = 0   # running sum of the KL estimates

for i in range(samples):
    # sample weights and do a forward pass: tlqw = log q(w|theta), tlpw = log p(w)
    out, tlqw, tlpw = self.model(x, sample=True)
    # -log p(y|x, w): the data-dependent term of equation 8
    mlpdw_i = F.cross_entropy(out, y, reduction='sum')
    # (log q - log p) / M: the complexity term, re-weighted per minibatch
    Edkl_i = (tlqw - tlpw) / self.Nbatches
    mlpdw_cum = mlpdw_cum + mlpdw_i
    Edkl_cum = Edkl_cum + Edkl_i

# average over the Monte Carlo samples
mlpdw = mlpdw_cum / samples
Edkl = Edkl_cum / samples
loss = Edkl + mlpdw
Note that there is a sign discrepancy between their algorithm and our optimisation, as PyTorch minimises a loss as opposed to maximising a value function.
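Written out, the loop above minimises a minibatch estimate of the negative ELBO (with S = samples and M = self.Nbatches):

$$\mathcal{L}(\theta) = \frac{1}{S}\sum_{i=1}^{S}\left[\frac{\log q(\mathbf{w}_i \mid \theta) - \log p(\mathbf{w}_i)}{M} - \log p(\mathbf{y} \mid \mathbf{x}, \mathbf{w}_i)\right]$$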
Steps 5-7 occur through automatic differentiation with:
loss.backward()        # compute gradients of the loss w.r.t. the variational parameters (mu, rho)
self.optimizer.step()  # apply the gradient update
Hope this helps! Javier
Sorry for late reply! This was very useful - thank you!
Hi,
I’m trying to approximate the posterior predictive distribution that corresponds to the MC Dropout and Bayes By Backprop neural networks (which I see you say is possible in the MNIST classification section of the README). I’m new to Python, so I’m having a little trouble figuring out how exactly you do this / which part of the code carries it out.
I tried to go about it by playing around with your function get_weight_samples, but noticed that it gives me the same weights each time I train the same network. For example, when training the MC Dropout network, I assumed the output of get_weight_samples would change, as in theory different nodes are dropped each time during training. My confusion here makes me think I may have misinterpreted what this function is supposed to do.
Any clarification would be greatly appreciated! I’m sorry if this wasn’t the right place to post a question of this nature – new to GitHub and still learning the ropes.