AntixK / PyTorch-VAE

A Collection of Variational Autoencoders (VAE) in PyTorch.
Apache License 2.0

KL Weight #35

Closed AlejandroTL closed 3 years ago

AlejandroTL commented 3 years ago

Hi!

Maybe it's a silly question, but why do you use a KL weight term? I understand that it's the fraction of the total dataset that one batch represents. For instance, if there are 100 observations and the batch size is 10, the kl_weight should be 0.1, but why do you use it? I've seen some other implementations and haven't found it there. I'm sure there's a reason, but I can't see why you weight only the KL divergence and not the reconstruction loss.


Thank you so much! :)
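For context, the term being asked about enters the loss roughly as follows. This is a minimal sketch in the spirit of the repo's vanilla VAE loss, not its exact code; the names (`vae_loss`, `recons`, `target`) are illustrative:

```python
import torch
import torch.nn.functional as F

def vae_loss(recons, target, mu, log_var, kld_weight):
    # Reconstruction term: how well the decoder reproduces the input
    recons_loss = F.mse_loss(recons, target)

    # KL divergence between q(z|x) = N(mu, diag(sigma^2)) and the prior N(0, I),
    # summed over latent dimensions and averaged over the batch
    kld_loss = torch.mean(
        -0.5 * torch.sum(1 + log_var - mu ** 2 - log_var.exp(), dim=1),
        dim=0,
    )

    # Only the KL term is scaled; kld_weight is e.g. batch_size / dataset_size
    return recons_loss + kld_weight * kld_loss
```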

loodvn commented 3 years ago

From https://github.com/AntixK/PyTorch-VAE/issues/11 (closed): "It is just the bias correction term for accounting for the minibatch. When small batch-sizes are used, it can lead to a large variance in the KLD value. But it should work without that kld_weight term too."

I was also wondering when I first saw it. Maybe a little FAQ in the README would be helpful since there seem to be several issues referencing this.
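To make the quoted explanation concrete: the weight is just the minibatch fraction, supplied from the training loop. A toy computation, using the numbers from the original question (variable names are illustrative, not the repo's exact experiment code):

```python
# Illustrative only: kld_weight as the minibatch fraction described above.
dataset_size = 100   # total number of training samples
batch_size = 10      # minibatch size

kld_weight = batch_size / dataset_size  # 0.1

# The loss is then formed as
#   loss = recons_loss + kld_weight * kld_loss
# so the higher-variance KLD estimate from a small batch contributes
# proportionally less to each update.
```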

simonhessner commented 3 years ago

Related question (https://github.com/AntixK/PyTorch-VAE/issues/40): should the dimensions of the input and the latent vector play a role instead, or as well?