casperkaae / parmesan

Variational and semi-supervised neural network toppings for Lasagne

Analytic KL term, misc example clean up #13

Closed · wuaalb closed this 9 years ago

wuaalb commented 9 years ago
casperkaae commented 9 years ago

Thanks a lot. Just a few comments:

1. Have you checked that it still converges with the new KL calculation?
2. In vae_vanilla.py: can you add a reference to Appendix B of Kingma & Welling (2014) so people can find where the exact KL calculation comes from?
3. I think kl_normal2_stdnormal should be moved to the vae_vanilla.py example, since it only applies there?
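
For reference, the closed-form term from that appendix, for a diagonal Gaussian posterior $q(z|x) = \mathcal{N}(\mu, \sigma^2 I)$ and a standard normal prior $p(z) = \mathcal{N}(0, I)$:

$$-D_{\mathrm{KL}}\big(q(z|x)\,\|\,p(z)\big) = \frac{1}{2}\sum_{j=1}^{J}\left(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\right)$$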

wuaalb commented 9 years ago

Thanks for reviewing.

On my machine with the default configuration I get:

With `analytic_kl_term = False`:

*Epoch: 999 Time: 2.70  LR: 0.00500 LL Train: -152.035  LL test: -152.866

With `analytic_kl_term = True`:

*Epoch: 999 Time: 2.59  LR: 0.00500 LL Train: -140.083  LL test: -141.687

I don't mind changing the other things you mentioned. However, I think it is more convenient to have helpers like this in the library itself, rather than having to repeat them in every script that uses this kind of model (it also keeps the examples smaller).

casperkaae commented 9 years ago

It's fine to keep the KL term in the library itself. However, can you add a docstring stating that it is the analytic KL divergence for a Gaussian posterior with a standard normal prior?
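
A minimal sketch of what such a documented helper could look like, assuming Theano tensors for the mean and log-variance (the exact signature in the library may differ):

```python
import theano.tensor as T

def kl_normal2_stdnormal(mean, log_var):
    """Analytic KL divergence between a diagonal Gaussian posterior
    q(z|x) = N(mean, exp(log_var)) and a standard normal prior
    p(z) = N(0, I).

    This is the closed-form solution from Appendix B of Kingma &
    Welling (2014), "Auto-Encoding Variational Bayes":

        KL(q || p) = -0.5 * sum(1 + log_var - mean**2 - exp(log_var))

    Returns the elementwise KL; sum over the latent dimension to get
    the per-sample KL term.
    """
    return -0.5 * (1 + log_var - mean**2 - T.exp(log_var))
```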

It's good that the KL term is better; the results are still not too good, though. I think I chose some non-optimal hyperparameters :). Can I get you to change them to hidden-layer size = 500, latent_size = 100, lr = 0.001 and batch_size = 128? I think these should work better. Then we can merge everything in one go.
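
Concretely, something like this in the example's defaults (variable names are only illustrative; use whatever the script already calls them):

```python
# Illustrative defaults for the vae_vanilla.py example; the actual
# variable/argument names in the script may differ.
nhidden = 500      # units in each hidden layer
latent_size = 100  # dimensionality of the latent z
lr = 0.001         # learning rate
batch_size = 128
```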

thanks for contributing!

wuaalb commented 9 years ago

OK, made those changes. I didn't wait for the full 1000 epochs yet, but those defaults do seem better.

casperkaae commented 9 years ago

Great, thanks a lot.

The LL should get up to around -100.

I'll merge now.