I think @casperkaae should review this. He is on holiday for ~2 weeks so we have to wait a little for his feedback.
@skaae Sure, no problem. Thanks!
Small update regarding this: with PR #17 the results are as follows:
`*Epoch: 999 Time: 14.13 LR: 0.00030 LL Train: -90.872 LL test: -91.336`
with real MNIST that is re-binarized and shuffled on each epoch (see the sketch after these results).
`*Epoch: 999 Time: 5.22 LR: 0.00030 LL Train: -89.528 LL test: -92.859`
with the fixed binarized MNIST (as in the initial post, but shuffled on each epoch).
For reference,
`*Epoch: 999 Time: 9.02 LR: 0.00100 LL Train: -88.558 LL test: -103.643`
with the old parameters (also on the fixed binarized MNIST).
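For anyone reproducing the "real MNIST" runs, here is a minimal numpy sketch of per-epoch re-binarization and shuffling, assuming the images are real-valued in [0, 1]; the names `x_train` and `binarize` are illustrative, not the repo's actual variables or API:

```python
import numpy as np

rng = np.random.RandomState(1234)

# Stand-in for the real-valued MNIST training images in [0, 1];
# hypothetical data, the actual script loads the real dataset.
x_train = rng.uniform(size=(60000, 784)).astype(np.float32)

def binarize(x, rng):
    """Draw a fresh binary sample, treating each pixel intensity
    in [0, 1] as an independent Bernoulli probability."""
    return (rng.uniform(size=x.shape) < x).astype(np.float32)

for epoch in range(3):
    x_bin = binarize(x_train, rng)               # re-binarize each epoch
    x_bin = x_bin[rng.permutation(len(x_bin))]   # shuffle each epoch
    # ... run the training updates for this epoch on x_bin ...
```

The "fixed binarized" runs would instead call `binarize` once before training and only reshuffle each epoch.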
thanks
With the previous parameters the test ELBO would start going up after approx. 60 epochs. Decreased the learning rate and the number of hidden units in the deterministic layers of the encoder/decoder.

Set `analytic_kl_term=True` by default, as it seems to improve results and is what the Kingma et al. paper does in its examples. Changed the non-linearity to `softplus` for the deterministic hidden layers; also tried `tanh` and `very_leaky_rectify`, but `softplus` seemed to perform best.

Results with these settings:
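As an aside, for anyone unfamiliar with the option: `analytic_kl_term=True` refers to computing the KL divergence of the ELBO in closed form rather than estimating it by Monte Carlo sampling. A minimal numpy sketch for a diagonal-Gaussian posterior against a standard-normal prior, the standard setting in Kingma et al.; the function name and argument shapes are illustrative, not the repo's API:

```python
import numpy as np

def analytic_kl(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ) per data point, summed over
    latent dimensions:
        KL = -0.5 * sum(1 + log sigma^2 - mu^2 - sigma^2)
    """
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

# Example: a batch of 2 data points with a 3-dimensional latent space.
mu = np.array([[0.0, 0.5, -1.0], [0.2, 0.0, 0.3]])
log_var = np.array([[0.0, -0.5, 0.1], [0.0, 0.0, -0.2]])
print(analytic_kl(mu, log_var))  # zero only when mu = 0 and log_var = 0
```

Because this removes the sampling noise from the KL part of the objective, the gradient estimate has lower variance, which is one plausible reason it improves results here.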