casperkaae / LVAE

Code for "How to Train Deep Variational Autoencoders and Probabilistic Ladder Networks"

Should the KL loss and the reconstruction loss be of the same magnitude? #2

Open zyj008 opened 6 years ago

zyj008 commented 6 years ago

Hello! I have run into the KL collapse problem when training a VAE model. The KL loss looks like this:

[image: plot of the KL loss over training]

After reading your paper "Ladder Variational Autoencoders", I decided to use the warm-up method for the KL loss, but I still have some questions. In my task, the reconstruction loss and the KL loss differ greatly in scale: the reconstruction loss is about 10^-2 to 10^-1, while the KL loss is 10^-5 to 10^-6. The values of mu and log_var are also on the order of 10^-5 to 10^-6 (mu and log_var are computed by passing the encoder output through one FC layer). So my question is: will this large difference in scale between the two losses strongly affect training? Should I do something to increase the KL loss before applying warm-up?
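For reference, my understanding of warm-up from the paper is that the KL term is multiplied by a coefficient beta that ramps linearly from 0 to 1 over the first N training epochs. Here is a minimal sketch of what I am planning in PyTorch (the names `warmup_epochs`, `kl_gaussian`, and `vae_loss` are my own placeholders, not from your code):

```python
import torch

def kl_gaussian(mu, log_var):
    # Analytic KL(q(z|x) || N(0, I)) for a diagonal Gaussian posterior,
    # summed over the latent dimensions: one value per example.
    return -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp(), dim=1)

def vae_loss(recon_loss, mu, log_var, epoch, warmup_epochs=200):
    # beta ramps linearly from 0 to 1 over the first `warmup_epochs` epochs,
    # so the model learns to reconstruct before the KL term is fully weighted.
    beta = min(1.0, epoch / warmup_epochs)
    return recon_loss + beta * kl_gaussian(mu, log_var).mean()
```

One thing I noticed while writing this: if the KL is summed over the latent dimensions (as above) and the reconstruction term is also a per-example sum, the two terms should stay on comparable scales, whereas taking a per-element mean of the KL would make it look much smaller. Could that be related to the scale difference I am seeing?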

Thank you!