samadejacobs opened this issue 6 years ago (status: Open)
Thank you for the nice tutorial and supporting code. I made a plot (attached) of KL loss vs. iterations for your implementation and for the Keras one (blog, code). Could you please provide insight into why the KL loss in your implementation is going up?

It is not immediately evident from the plot alone, although one big difference is the number of latents; initialization can also play a role. Note that the KL loss going up simply means that more information is being encoded in the latent variables. As long as the reconstruction error goes down by more than the KL loss goes up, the total loss still decreases and the VAE is learning the distribution. In fact, this is often the behavior you want from a VAE, since you expect it to encode more in its latent variables over time.
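To make the "total loss can fall while KL rises" point concrete, here is a minimal sketch (not code from the tutorial; the function name and the iteration values are hypothetical) of the closed-form KL term for a diagonal-Gaussian encoder against a standard-normal prior, plus a check that a drop in reconstruction error can outweigh a rise in KL:

```python
import numpy as np

def gaussian_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), the usual VAE KL term.

    Closed form: 0.5 * sum( mu^2 + exp(log_var) - log_var - 1 ).
    """
    return 0.5 * np.sum(np.square(mu) + np.exp(log_var) - log_var - 1.0, axis=-1)

# Sanity check: an encoder output matching the prior carries zero KL cost.
assert abs(gaussian_kl(np.zeros(8), np.zeros(8))) < 1e-9

# Hypothetical loss values at two training iterations: KL goes up,
# but reconstruction error drops faster, so the total ELBO loss improves.
recon_t, kl_t = 120.0, 2.0
recon_t1, kl_t1 = 100.0, 5.0
assert (recon_t1 + kl_t1) < (recon_t + kl_t)
```

So when comparing the two implementations, it is worth plotting the reconstruction term and the total loss alongside the KL curve, since the KL curve alone does not tell you whether training is going well.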