casperkaae / parmesan

Variational and semi-supervised neural network toppings for Lasagne

Better default hyper-parameters for vae_vanilla example #16

Closed wuaalb closed 8 years ago

wuaalb commented 8 years ago

With the previous parameters, the test ELBO would start going up after approximately 60 epochs. I decreased the learning rate and the number of hidden units in the deterministic layers of the encoder/decoder.

Set analytic_kl_term=True by default, as it seems to improve results and is what Kingma et al. do in their paper's examples.
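For context, the analytic KL term is the closed-form KL divergence between a diagonal-Gaussian posterior q(z|x) = N(mu, diag(sigma^2)) and a standard-normal prior, used in place of a Monte Carlo estimate. A minimal NumPy sketch (function names and the test values are my own, not from the repo) checking the closed form against a sampled estimate:

```python
import numpy as np

def kl_diag_gaussian(mu, log_var):
    """Analytic KL( N(mu, exp(log_var)) || N(0, I) ), summed over latent dims."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)

# Toy posterior parameters: batch of 2, 2 latent dimensions.
mu = np.array([[0.5, -1.0], [0.0, 0.3]])
log_var = np.array([[0.0, -0.5], [0.2, -1.0]])
analytic = kl_diag_gaussian(mu, log_var)

# Monte Carlo estimate of the same quantity: E_q[log q(z) - log p(z)],
# with z drawn via the reparameterization z = mu + sigma * eps.
rng = np.random.default_rng(0)
n = 200_000
eps = rng.normal(size=(n,) + mu.shape)
z = mu + np.exp(0.5 * log_var) * eps
log_q = -0.5 * (np.log(2 * np.pi) + log_var + eps**2).sum(axis=-1)
log_p = -0.5 * (np.log(2 * np.pi) + z**2).sum(axis=-1)
mc = (log_q - log_p).mean(axis=0)

print(analytic)  # exact, zero-variance values
print(mc)        # noisy estimate that converges to the same values
```

The analytic form removes the sampling noise that the Monte Carlo estimate injects into the training gradient, which is one plausible reason it improves results here.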

Changed the non-linearity of the deterministic hidden layers to softplus. I also tried tanh and very_leaky_rectify, but softplus seemed to perform best.
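For reference, a quick NumPy sketch of the three non-linearities being compared; softplus is a smooth, everywhere-differentiable approximation of the rectifier, while tanh saturates at both ends. The leakiness of 1/3 below follows Lasagne's definition of very_leaky_rectify; treat it as an assumption if your version differs.

```python
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x)).
    return np.logaddexp(0.0, x)

def very_leaky_rectify(x, leakiness=1.0 / 3.0):
    # Leaky rectifier with a large negative-side slope (Lasagne uses 1/3).
    return np.where(x > 0, x, leakiness * x)

x = np.linspace(-5, 5, 11)
for name, f in [("softplus", softplus), ("tanh", np.tanh),
                ("very_leaky_rectify", very_leaky_rectify)]:
    print(f"{name:>20}", np.round(f(x), 3))
```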

Results with these settings:

*Epoch: 999 Time: 9.03 LR: 0.00030 LL Train: -90.331 LL test: -93.592
skaae commented 8 years ago

I think @casperkaae should review this. He is on holiday for ~2 weeks so we have to wait a little for his feedback.

wuaalb commented 8 years ago

@skaae Sure, no problem. Thanks!

A small update regarding this: with PR #17 the results are as follows

*Epoch: 999 Time: 14.13 LR: 0.00030 LL Train: -90.872   LL test: -91.336

with real-valued MNIST that is re-binarized and shuffled on each epoch, and

*Epoch: 999 Time: 5.22  LR: 0.00030 LL Train: -89.528   LL test: -92.859

with fixed binarized MNIST (as in the initial post, but shuffled on each epoch).

For reference,

*Epoch: 999 Time: 9.02  LR: 0.00100 LL Train: -88.558   LL test: -103.643

with the old parameters (also fixed binarized MNIST).
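The two binarization schemes above differ in when the Bernoulli sampling happens: the fixed variant binarizes the grayscale images once up front, while the dynamic variant draws a fresh Bernoulli sample of each pixel every epoch, which acts as a mild form of data augmentation. A minimal sketch (a random array stands in for the MNIST intensities; helper names are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
x_gray = rng.random((5, 784))  # stand-in for MNIST pixel intensities in [0, 1]

# Fixed binarization: sample once and reuse the same binary data every epoch.
x_fixed = (rng.random(x_gray.shape) < x_gray).astype(np.float32)

def rebinarize(x, rng):
    """Dynamic binarization: fresh Bernoulli(x) sample of each pixel."""
    return (rng.random(x.shape) < x).astype(np.float32)

for epoch in range(3):
    x_epoch = rebinarize(x_gray, rng)
    # Shuffle examples each epoch, as in the results above.
    perm = rng.permutation(len(x_epoch))
    x_epoch = x_epoch[perm]
    # ... train on x_epoch ...
```

The gap between train and test LL is noticeably smaller with dynamic binarization in the numbers above, consistent with it acting as a regularizer.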

casperkaae commented 8 years ago

thanks