poolio closed this issue 9 years ago
Hi Ben,
Thanks for reporting this. It seems that the default value for dataset was changed from binarized to real in https://github.com/casperkaae/parmesan/commit/5638eea86e2fcf0aa5c094c07808ee3aa68b4d19. real is supposed to Bernoulli-sample the dataset after each epoch, but it doesn't. I think that is a bug (https://github.com/casperkaae/parmesan/commit/657dd395d9a93f3cfd60c9d83416d5e04f470fd3#diff-6044ceb81b54b92878c23ddb97475be3 seems to have introduced it).
I think you can improve the results by changing dataset to 'binarized', which will resample the MNIST dataset after each epoch. Alternatively, you can sample the dataset once by calling bernoullisample before training. I think resampling after each epoch gains you a few nats.
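For reference, here is a minimal NumPy sketch of the two options being discussed, assuming bernoullisample simply treats the real-valued MNIST pixel intensities in [0, 1] as Bernoulli probabilities (the actual helper in the example code may differ in details):

```python
import numpy as np

def bernoullisample(x, rng=np.random):
    """Draw a binary sample per pixel, treating each real-valued
    intensity in [0, 1] as a Bernoulli probability."""
    return rng.binomial(1, x, size=x.shape).astype('float32')

# Stand-in for the real-valued MNIST training images.
x_train = np.random.rand(5, 784).astype('float32')

# Option 1: binarize once before training (a fixed binary dataset).
x_fixed = bernoullisample(x_train)

# Option 2: resample after each epoch (what dataset='binarized'
# is meant to do), so the model sees fresh binarizations.
for epoch in range(3):
    x_epoch = bernoullisample(x_train)
    # ... run the training updates for this epoch on x_epoch ...
```

Resampling each epoch acts as a mild regularizer, which is presumably why it is worth the "few nats" mentioned above.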
Casper, who wrote the example, is on holiday, but he'll fix the code when he returns later this week. Until then I hope changing dataset to binarized improves the results.
@casperkaae: The comment on line https://github.com/casperkaae/parmesan/blob/master/examples/iw_vae.py#L107 seems to be wrong? Also, using real does not resample the dataset? Is this also a problem in the other example files?
@poolio
Thanks for reporting this. As @skaae points out, a bug was introduced in the sampling procedure. Secondly, I also used learning-rate annealing to squeeze out the last few percent; however, the LL_5000 should be at around −86 to −85 after around 1000 epochs without annealing.
It should be fixed by #19, though I have not tested the performance yet. It would be much appreciated if you have time to run the example again and report the performance. I'll also test it as soon as possible.
I'll just run some tests to get it working and will report back when they are done.
I've updated the example code now in PR #20
After 650 epochs the LL_5000 is -86.93621, which is pretty close to the results on the front page.
python iw_vae.py -nonlin_enc rectify -nonlin_dec very_leaky_rectify -batch_size 250 -eval_epoch 50
output:
Epoch=650 Time=3.58 LR=0.00100 E_qsamples=1 IVAEsamples=1 TRAIN: Cost=-91.62148 logq(z|x)=-114.92097 logp(z)=-141.96342 logp(x|z)=-64.57903 EVAL L1:Cost=-91.03194 logq(z|x)=-114.77290 logp(z)=-141.64285 logp(x|z)=-64.16198 EVAL-L5000: Cost=-86.93621 logq(z|x)=-114.76830 logp(z)=-141.62617 logp(x|z)=-64.23051
Running the example command from the README:
python examples/iw_vae.py -eq_samples 1 -iw_samples 1 -lr 0.001 -nhidden 500 -nlatent 100 -nonlin_dec very_leaky_rectify -nonlin_enc rectify
yields substantially worse performance than the plot. Can anyone replicate those results using the current codebase? Or has something changed that would cause such a large performance drop?