IntelLabs / bayesian-torch

A library for Bayesian neural network layers and uncertainty estimation in Deep Learning extending the core of PyTorch

Inconsistent use of mean & sum when calculating KL divergence? #36

Open profPlum opened 5 months ago

profPlum commented 5 months ago

A mean is taken inside BaseVariationalLayer_.kl_div(), but later a sum is used both in get_kl_loss() and when reducing the KL loss of a layer's bias and weights (e.g. inside Conv2dReparameterization.kl_loss()). A minimal sketch of the mismatch is below.
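For concreteness, here is a minimal sketch of what I mean (paraphrased, not the library code verbatim), assuming the closed-form Gaussian KL used by BaseVariationalLayer_.kl_div and a get_kl_loss-style sum over layers; the tensor sizes are made up for illustration:

```python
import torch

def kl_div_gaussian(mu_q, sigma_q, mu_p, sigma_p):
    # Closed-form KL(q || p) between diagonal Gaussians, elementwise.
    kl = (torch.log(sigma_p) - torch.log(sigma_q)
          + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2 * sigma_p ** 2) - 0.5)
    # The reduction over the weight tensor is a *mean* (as in
    # BaseVariationalLayer_.kl_div), not a sum.
    return kl.mean()

# Hypothetical two-layer example: a small layer and a much larger one.
small = kl_div_gaussian(torch.zeros(10), torch.full((10,), 0.5),
                        torch.zeros(10), torch.ones(10))
large = kl_div_gaussian(torch.zeros(100_000), torch.full((100_000,), 0.5),
                        torch.zeros(100_000), torch.ones(100_000))

# get_kl_loss-style reduction: a *sum* of the per-layer (already averaged) terms.
total_kl = small + large
# Both layers contribute equally here regardless of parameter count, whereas a
# plain sum over all individual weights would weight the large layer 10,000x more.
print(total_kl)
```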

I'm wondering whether there is a mathematical justification for this. Why take the mean of the individual weight KL divergences, only to sum across layers later?
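For reference, and assuming the usual mean-field factorization $q(w)=\prod_i q(w_i)$ and $p(w)=\prod_i p(w_i)$, the KL term in the ELBO is a sum over individual weights:

$$\mathrm{KL}\big(q(w)\,\|\,p(w)\big) = \sum_i \mathrm{KL}\big(q(w_i)\,\|\,p(w_i)\big),$$

so averaging within a layer rescales that layer's contribution by $1/n_\ell$ (its parameter count), which makes the subsequent sum across layers weight layers differently than a plain sum over all weights would.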