How do I choose the beta value properly?

This paper has an excellent overview of what the beta parameter is doing: https://arxiv.org/abs/1804.03599

To summarize, a larger beta results in a more disentangled latent representation but lower-fidelity reconstructions. A smaller beta imposes less pressure to disentangle, allowing for higher-fidelity reconstructions. At beta = 1, the B-VAE is equivalent to a plain VAE, so beta is usually set to a value greater than one.
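To make the trade-off concrete, here is a minimal sketch of the usual B-VAE objective for a single sample: a reconstruction term plus beta times the KL divergence between a diagonal Gaussian posterior and a standard normal prior. The function names, the squared-error reconstruction term, and the default beta of 4.0 are assumptions chosen for illustration, not taken from this repository.

```python
import math

def kl_to_standard_normal(mu, log_var):
    # KL(N(mu, diag(exp(log_var))) || N(0, I)), summed over latent dimensions.
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var))

def beta_vae_loss(x, x_recon, mu, log_var, beta=4.0):
    # Reconstruction term: squared error summed over input features (an assumed choice).
    recon = sum((a - b) ** 2 for a, b in zip(x, x_recon))
    # beta scales only the KL term; beta = 1 gives the plain VAE objective.
    return recon + beta * kl_to_standard_normal(mu, log_var)
```

Raising beta increases the penalty for the same KL value, which pushes the encoder toward the prior at the expense of reconstruction error.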

The right beta depends on the problem and your goals. A practical approach is to train with several beta values on your data and compare reconstruction quality against how disentangled the latents look. You can also create a custom training regimen that changes beta over time. This implementation assumes a constant beta, but you can rebuild the model with a different beta during training.
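As one possible training regimen, beta could be ramped linearly over the first few epochs and then held constant. The sketch below only computes the schedule; `beta_schedule` and its parameters are hypothetical names, the start and end values are arbitrary, and how the value is fed into the model (for example, by rebuilding it with the new beta) is left to your setup.

```python
def beta_schedule(epoch, warmup_epochs=10, beta_start=1.0, beta_end=4.0):
    # After the warm-up period (or if warm-up is disabled), hold beta at its final value.
    if warmup_epochs <= 0 or epoch >= warmup_epochs:
        return beta_end
    # Linear interpolation between beta_start and beta_end during warm-up.
    frac = max(epoch, 0) / warmup_epochs
    return beta_start + frac * (beta_end - beta_start)

# Example: inspect the beta value that would be used at a few epochs.
for epoch in (0, 5, 10, 20):
    print(epoch, beta_schedule(epoch))
```

For a simple sweep instead of a schedule, the same idea reduces to training once per candidate value (say 1, 2, 4, 8) and comparing the results.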

alecGraves / BVAE-tf
