After some training, the default iw_vae_normflow.py gets:
*Epoch=1000 Time=7.87 LR=0.00100 eq_samples=1 iw_samples=1 nflows=5
TRAIN: Cost=-91.19209 logqK(zK|x)=-116.52648 = [logq0(z0|x)=-116.33376 - sum logdet J=0.19272] logp(zK)=-141.95990 logp(x|zK)=-65.75868
EVAL-L1: Cost=-90.62387 logqK(zK|x)=-116.30897 = [logq0(z0|x)=-116.11150 - sum logdet J=0.19747] logp(zK)=-141.43175 logp(x|zK)=-65.50109
EVAL-L5000: Cost=-86.20597 logqK(zK|x)=-116.27862 = [logq0(z0|x)=-116.08018 - sum logdet J=0.19843] logp(zK)=-141.38884 logp(x|zK)=-65.48415
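(For reference, and assuming Cost here is the single-sample bound, the logged terms relate as Cost = logp(x|zK) + logp(zK) - logqK(zK|x), with logqK(zK|x) = logq0(z0|x) - sum logdet J. E.g. for the TRAIN line: -65.75868 - 141.95990 + 116.52648 ≈ -91.192, and -116.33376 - 0.19272 = -116.52648. The EVAL-L5000 cost is presumably the importance-weighted bound over 5000 samples, so the identity doesn't apply term-by-term there.)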
Not sure how this compares to iw_vae.py.
Thanks. I'll try to review your changes tomorrow.
Wrt the performance, I made some preliminary tests that showed a small performance gain using normalizing flows. The performance you report seems to be in the right ballpark. However, I think we need to increase the number of flow transformations (maybe to 40 or 80 as in the paper) before we see a substantial performance gain.
As baseline performance, I get around LL_5000 ≈ -85 after 10000 epochs using a VAE with
Looks good to me @wuaalb
Just two suggestions / questions: 1) The performance you report was run with the changes you made to the code, right? 2) Can you change the default hyperparams of iw_vae_normflow.py to the same values as used in iw_vae.py?
Thanks for reviewing.
1) Yes, those results are after applying these changes, and with the default hyper-parameters (but I didn't wait for the full 10k epochs). Before making the changes, the EVAL-L5000 was ever-decreasing (I ran it once with non-default settings and it was at -65 at epoch 800 when I stopped it). 2) I think this commit doesn't change the default hyper-parameters; they were already identical?
It would be interesting to see whether using many more transformations would help. I'm just worried it will take a long time to train, and that maybe the current default hyper-parameters are quite different from the paper's (much higher learning rate, batch normalization, rectifier non-linearities, etc.; although what the paper uses exactly for the MNIST results isn't totally clear to me).
Did you by chance happen to do the experiment in section 6.1 (figure 3) of the paper? I think it would be a nice way to ensure normalizing flows are implemented correctly, but I'm not completely sure how it is done..
@wuaalb
Ah yes, the hyper-parameters were already set to the same values :). I've looked at the experiment in section 6.1, but I couldn't quite figure out how they did it. I'll try to write an email to the authors and ask how they did it.
I think we should just add a warning at the beginning of the example that it is work in progress / not tested completely, and then just merge the changes. I'll hopefully be able to run some comparison tests on the implementation before too long.
I'll run the default iw_vae.py for ~1000 epochs now, just to have an idea (my guess is results will be almost identical).
Anyways, I think merging now is a good idea. At the very least it should be a step in the right direction.
FWIW, I had a half-assed go at the section 6.1 experiment once; I think I generated samples from q0(z) = N(z; mu, sigma^2 I), with mu and sigma^2 set by hand, and then minimized the KL divergence between qK(z) and the (unnormalized) true distribution q_true(z) = e^{-U(z)} (roughly the setup sketched below).
I think the results looked something like this
I'm pretty sure it is more likely that I did something wrong than that there's a problem with NormalizingPlanarFlowLayer, etc.
I'll wait with the merge until you report back with the performance.
I have opened a new issue (#22) wrt reproducing the results in sec. 6.1 of the normalizing flows paper, where we can discuss that further.
Results from iw_vae.py (default settings) after 1000 epochs:
*Epoch=1000 Time=6.02 LR=0.00100 eq_samples=1 iw_samples=1
TRAIN: Cost=-91.28976 logq(z|x)=-116.71328 logp(z)=-141.99548 logp(x|z)=-66.00757
EVAL-L1: Cost=-90.70776 logq(z|x)=-116.72040 logp(z)=-141.51874 logp(x|z)=-65.90942
EVAL-L5000: Cost=-86.43427 logq(z|x)=-116.69676 logp(z)=-141.50031 logp(x|z)=-65.92197
So as expected, very slightly worse compared to using normalizing flows (with length 5).
Great - at least that indicates that the norm-flow code is working as expected.
I'll merge this now
Thanks
Make the NormalizingPlanarFlowLayer layer output the logdet-Jacobian instead of psi_u, so all logic specific to planar-type flows is contained in the layer and other types of flows can be used more easily.
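To illustrate the design point, here is a hypothetical Python sketch (made-up names, not the actual Parmesan/Lasagne layer API): if every flow layer returns both its transformed variable and its own log|det J|, the surrounding model code only has to accumulate the log-dets and never needs flow-specific quantities like psi_u.

```python
# Hypothetical sketch of the interface idea, not the actual Parmesan layer code.
import numpy as np

class ScalingFlow:
    """Toy flow f(z) = exp(s) * z; stands in for planar/radial/any other flow."""
    def __init__(self, dim, rng):
        self.s = 0.1 * rng.randn(dim)              # learnable log-scales

    def __call__(self, z):
        # flow-specific math (here trivially simple) stays inside the layer,
        # which exposes only (f(z), log|det df/dz|) to the rest of the model
        logdet = np.full(z.shape[0], self.s.sum())
        return z * np.exp(self.s), logdet

def apply_flows(z0, flows):
    """Generic driver: works for any flow type that returns (z, logdet)."""
    z, sum_logdet = z0, np.zeros(z0.shape[0])
    for flow in flows:
        z, logdet = flow(z)
        sum_logdet += logdet
    # log qK(zK|x) = log q0(z0|x) - sum_logdet, as in the training log above
    return z, sum_logdet

rng = np.random.RandomState(0)
zK, sum_logdet = apply_flows(rng.randn(100, 2), [ScalingFlow(2, rng) for _ in range(5)])
```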