KarenUllrich / Tutorial_BayesianCompressionForDL

A tutorial on "Bayesian Compression for Deep Learning" published at NIPS (2017).

KL divergence approx for Linear appears to be wrong #3

Closed gngdb closed 6 years ago

gngdb commented 6 years ago

The Linear layer uses this:

```python
KLD_element = -0.5 * self.weight_logvar + 0.5 * (self.weight_logvar.exp() + self.weight_mu.pow(2)) - 0.5
```

But the convolutional layers use this:

```python
KLD_element = -self.weight_logvar + 0.5 * (self.weight_logvar.exp().pow(2) + self.weight_mu.pow(2)) - 0.5
```

The second appears to match equation 8 of the paper, so is it just a mistake in the Linear layer?
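For reference, both lines can be read as closed forms of KL(N(mu, sigma^2) || N(0, 1)) = -log(sigma) + (sigma^2 + mu^2)/2 - 1/2; they give the same value only if `weight_logvar` stores log(sigma^2) in the first and log(sigma) in the second, so within one codebase using the same variable name, at most one can be right. A minimal sketch checking this numerically (function names here are illustrative, not from the repo):

```python
import torch

# KL( N(mu, sigma^2) || N(0, 1) ) under two parameterizations.
# Assumption (not from the thread): the two expressions coincide only
# when "logvar" holds log sigma^2 in the first and log sigma in the second.

def kl_from_logvar(mu, logvar):
    # logvar = log sigma^2 (the Linear layer's form)
    return -0.5 * logvar + 0.5 * (logvar.exp() + mu.pow(2)) - 0.5

def kl_from_logstd(mu, logstd):
    # logstd = log sigma (the convolutional layers' form)
    return -logstd + 0.5 * (logstd.exp().pow(2) + mu.pow(2)) - 0.5

mu, sigma = torch.randn(5), torch.rand(5) + 0.1
print(kl_from_logvar(mu, (sigma ** 2).log()))
print(kl_from_logstd(mu, sigma.log()))  # identical values
```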

gngdb commented 6 years ago

Ah, this appears to be an issue only in my fork (of a fork). I should have read the code before opening this issue, sorry!