After some training, the default iw_vae_normflow.py gets:
*Epoch=1000 Time=7.87 LR=0.00100 eq_samples=1 iw_samples=1 nflows=5
TRAIN: Cost=-91.19209 logqK(zK|x)=-116.52648 = [logq0(z0|x)=-116.33376 - sum logdet J=0.19272] logp(zK)=-141.95990 logp(x|zK)=-65.75868
EVAL-L1: Cost=-90.62387 logqK(zK|x)=-116.30897 = [logq0(z0|x)=-116.11150 - sum logdet J=0.19747] logp(zK)=-141.43175 logp(x|zK)=-65.50109
EVAL-L5000: Cost=-86.20597 logqK(zK|x)=-116.27862 = [logq0(z0|x)=-116.08018 - sum logdet J=0.19843] logp(zK)=-141.38884 logp(x|zK)=-65.48415
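(For reference, and assuming Cost here is the single-sample bound, the logged terms relate as Cost = logp(x|zK) + logp(zK) - logqK(zK|x), with logqK(zK|x) = logq0(z0|x) - sum logdet J. E.g. for the TRAIN line: -65.75868 - 141.95990 + 116.52648 ≈ -91.192, and -116.33376 - 0.19272 = -116.52648. The EVAL-L5000 cost is presumably the importance-weighted bound over 5000 samples, so the identity doesn't apply term-by-term there.)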
Not sure how this compares to iw_vae.py.
Thanks. I'll try to review your changes tomorrow.
Wrt the performance, I made some preliminary tests that showed a small performance gain using normalizing flows. The performance you report seems to be in the right ballpark. However, I think we need to increase the number of flow transformations (maybe to 40 or 80 as in the paper) before we see a substantial performance gain.
As baseline performance, I get around LL_5000 ≈ -85 after 10000 epochs using a VAE with
Looks good to me @wuaalb
Just two suggestions / questions: 1) The performance you report was run with the changes you made to the code, right? 2) Can you change the default hyperparams of iw_vae_normflow.py to the same values as used in iw_vae.py?
Thanks for reviewing.
1) Yes, those results are after applying these changes, and with the default hyper-parameters (but I didn't wait for the full 10k epochs). Before making the changes, the EVAL-L5000 was ever-decreasing (I ran it once with non-default settings and it was at -65 at epoch 800 when I stopped it). 2) I think this commit doesn't change the default hyper-parameters; they were already identical?
It would be interesting to see whether using many more transformations would help. I'm just worried it will take a long time to train, and that maybe the current default hyper-parameters are quite different from the paper's (much higher learning rate, batch normalization, rectifier non-linearities, etc.; although what the paper uses exactly for the MNIST results isn't totally clear to me).
Did you by chance happen to do the experiment in section 6.1 (figure 3) of the paper? I think it would be a nice way to ensure normalizing flows are implemented correctly, but I'm not completely sure how it is done..
@wuaalb
Ah yes, the hyper-parameters were already set to the same values :). I've looked at the experiment in section 6.1, but I couldn't quite figure out how they did it. I'll try to write an email to the authors and ask how they did it.
I think we should just add a warning at the beginning of the example that it is work in progress / not tested completely, and then just merge the changes. I'll hopefully be able to run some comparison tests on the implementation before too long.
I'll run the default iw_vae.py for ~1000 epochs now, just to have an idea (my guess is results will be almost identical).
Anyways, I think merging now is a good idea. At the very least it should be a step in the right direction.
FWIW, I had a half-assed go at the section 6.1 experiment once; I think I generated samples from q0(z) = N(z; mu, sigma^2 I), with mu and sigma^2 set by hand, and then minimized the KL divergence between qK(z) and the (unnormalized) true distribution q_true(z) = e^{-U(z)} (roughly the setup sketched below).
I think the results looked something like this
I'm pretty sure it is more likely that I did something wrong than that there's a problem with NormalizingPlanarFlowLayer, etc.
I'll wait with the merge until you report back with the performance.
I have opened a new issue (#22) wrt reproducing the results in sec. 6.1 of the normalizing flows paper, where we can discuss that further.
Results from iw_vae.py (default settings) after 1000 epochs:
*Epoch=1000 Time=6.02 LR=0.00100 eq_samples=1 iw_samples=1
TRAIN: Cost=-91.28976 logq(z|x)=-116.71328 logp(z)=-141.99548 logp(x|z)=-66.00757
EVAL-L1: Cost=-90.70776 logq(z|x)=-116.72040 logp(z)=-141.51874 logp(x|z)=-65.90942
EVAL-L5000: Cost=-86.43427 logq(z|x)=-116.69676 logp(z)=-141.50031 logp(x|z)=-65.92197
So as expected, very slightly worse compared to using normalizing flows (with length 5).
Great - at least that indicates that the norm-flow code is working as expected.
I'll merge this now
Thanks
Make the NormalizingPlanarFlowLayer layer output the logdet-Jacobian instead of psi_u, so all logic specific to planar-type flows is contained in the layer and other types of flows can be used more easily.
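To illustrate the design point, here is a hypothetical Python sketch (made-up names, not the actual Parmesan/Lasagne layer API): if every flow layer returns both its transformed variable and its own log|det J|, the surrounding model code only has to accumulate the log-dets and never needs flow-specific quantities like psi_u.

```python
# Hypothetical sketch of the interface idea, not the actual Parmesan layer code.
import numpy as np

class ScalingFlow:
    """Toy flow f(z) = exp(s) * z; stands in for planar/radial/any other flow."""
    def __init__(self, dim, rng):
        self.s = 0.1 * rng.randn(dim)              # learnable log-scales

    def __call__(self, z):
        # flow-specific math (here trivially simple) stays inside the layer,
        # which exposes only (f(z), log|det df/dz|) to the rest of the model
        logdet = np.full(z.shape[0], self.s.sum())
        return z * np.exp(self.s), logdet

def apply_flows(z0, flows):
    """Generic driver: works for any flow type that returns (z, logdet)."""
    z, sum_logdet = z0, np.zeros(z0.shape[0])
    for flow in flows:
        z, logdet = flow(z)
        sum_logdet += logdet
    # log qK(zK|x) = log q0(z0|x) - sum_logdet, as in the training log above
    return z, sum_logdet

rng = np.random.RandomState(0)
zK, sum_logdet = apply_flows(rng.randn(100, 2), [ScalingFlow(2, rng) for _ in range(5)])
```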