2017-fall-DL-training-program / VAE-GAN-and-VAE-GAN

An assignment to learn how to implement three different kinds of generative models

The optimizer used in VAE-GAN? #21

Closed. jessejchuang closed this issue 6 years ago

jessejchuang commented 6 years ago

Hi TA,

HW3-1 VAE uses RMSprop. HW3-2 DCGAN uses Adam. The HW3-3 handout doesn't specify the optimizer. Which optimizer should we use for VAE-GAN?

BTW, I currently use RMSprop, as in the paper. The Dis outputs for x, xp, and xtilde converge to 0.33 within a few epochs, but then they stay at 0.33 and never change. Shouldn't all of them be 0.5? The L2 norm (MSE) is small, around 0.0003~0.0006. The generated pictures are still not clear after about 20 epochs. Could you guess why?

a514514772 commented 6 years ago

Hi @jessejchuang ,

Please use RMSprop.

For the second problem, an output of 0.33 means the discriminator can't tell real from fake at all, so I guess the discriminator may not be updated correctly?
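(A possible sanity check, sketched below under the assumption that the discriminator loss sums one real term and two fake terms, as in the VAE-GAN paper: if the real and generated distributions match, the optimal discriminator outputs 1/3 on every input rather than 1/2, so a plateau at 0.33 is exactly what a fully confused discriminator would produce.)

```python
import math

# Sketch, assuming the usual VAE-GAN discriminator loss with one real
# and two fake terms:
#   L_D = -log D(x) - log(1 - D(x_p)) - log(1 - D(x_tilde))
# If p_real = p_fake everywhere, the pointwise optimum is
#   D* = p_real / (p_real + 2 * p_fake) = 1/3.
d_star = 1.0 / 3.0
loss_dis = -math.log(d_star) - 2.0 * math.log(1.0 - d_star)
print(round(d_star, 2), round(loss_dis, 2))  # 0.33 1.91
```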

Thanks

jessejchuang commented 6 years ago

Hi TA,

The code and hyperparameters follow the handout, except that weights_init and KL_loss are reused from the DCGAN/VAE homeworks. Any comments on these two?

I can see the discriminator updating, but its outputs drop to 0.33 as early as epoch 2, as the log below shows. Since the L2 norm becomes small, I think Dis starts to be confused between real and fake x. However, it happens too quickly, at epoch 2, and I have no idea why it stays at 0.33.

BTW, the paper uses 3e-4 as the learning rate. The handout uses 3e-4 for Enc/Dec and 3e-5 for Dis. I tried using 3e-4 for Dis as well, but it couldn't converge, so I finally abandoned that change.
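For reference, a minimal sketch of how the three optimizers might be set up with those rates; the placeholder modules below are mine, not the handout's:

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder modules; the real Enc/Dec/Dis come from the handout.
encoder = nn.Linear(784, 128)
decoder = nn.Linear(128, 784)
discriminator = nn.Linear(784, 1)

# One RMSprop optimizer per sub-network, with the handout's rates:
# 3e-4 for the encoder/decoder, 3e-5 for the discriminator.
opt_enc = optim.RMSprop(encoder.parameters(), lr=3e-4)
opt_dec = optim.RMSprop(decoder.parameters(), lr=3e-4)
opt_dis = optim.RMSprop(discriminator.parameters(), lr=3e-5)
```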

[1] Reuse HW2 DCGAN's weights_init (it lacks init of the fc layers):
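The original screenshot is not available; a minimal sketch of a typical DCGAN-style weights_init (the actual HW2 code may differ), which, as noted, skips fc layers:

```python
import torch.nn as nn

def weights_init(m):
    # Standard DCGAN-style init: N(0, 0.02) for conv weights,
    # N(1, 0.02) for batch-norm scale, zeros for batch-norm bias.
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        nn.init.normal_(m.weight, 0.0, 0.02)
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight, 1.0, 0.02)
        nn.init.constant_(m.bias, 0.0)
    # fc (nn.Linear) layers are not touched, matching the remark above.
```

Calling model.apply(weights_init) applies it recursively to every submodule.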

[2] Reuse HW1 VAE's KL loss (it only differs in the normalization: dividing by torch.numel, i.e. batch * hidden_code_size):
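Again the screenshot is gone; a minimal sketch of the usual closed-form KL term with the normalization described above (mu and logvar are my names for the encoder outputs):

```python
import torch

def kl_loss(mu, logvar):
    # Closed-form KL( N(mu, sigma^2) || N(0, I) ), summed over all
    # elements, then normalized by torch.numel(mu), which equals
    # batch_size * hidden_code_size for a (batch, hidden) tensor.
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return kld / torch.numel(mu)
```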

[3] Log

[0/50][0/547] Loss_Dis:2.06(x:0.50, xp:0.49, x~:0.49) Loss_Dec:-0.77(xp:0.40, x~:0.40) Loss_Enc:1.9545(KL:0.1268, l2norm:0.0518)
[0/50][1/547] Loss_Dis:6.97(x:0.93, xp:0.97, x~:0.96) Loss_Dec:0.42(xp:0.02, x~:0.02) Loss_Enc:44.1637(KL:2.9380, l2norm:0.0930)
[0/50][2/547] Loss_Dis:10.26(x:0.60, xp:0.99, x~:0.99) Loss_Dec:1.82(xp:0.00, x~:0.00) Loss_Enc:25.8804(KL:1.7010, l2norm:0.3647)
[0/50][3/547] Loss_Dis:4.28(x:0.91, xp:0.90, x~:0.84) Loss_Dec:1.43(xp:0.00, x~:0.01) Loss_Enc:334.5617(KL:22.2849, l2norm:0.2878)
[0/50][4/547] Loss_Dis:2.01(x:0.78, xp:0.60, x~:0.56) Loss_Dec:2.01(xp:0.00, x~:0.02) Loss_Enc:4.7125(KL:0.2871, l2norm:0.4058)
[0/50][5/547] Loss_Dis:0.99(x:0.45, xp:0.00, x~:0.10) Loss_Dec:0.34(xp:0.01, x~:0.11) Loss_Enc:2.3156(KL:0.1478, l2norm:0.0979)
[0/50][6/547] Loss_Dis:0.69(x:0.62, xp:0.02, x~:0.12) Loss_Dec:0.22(xp:0.02, x~:0.12) Loss_Enc:1.0959(KL:0.0679, l2norm:0.0767)
[0/50][7/547] Loss_Dis:3.07(x:0.70, xp:0.77, x~:0.68) Loss_Dec:1.38(xp:0.00, x~:0.13) Loss_Enc:1.2154(KL:0.0604, l2norm:0.3088)
[0/50][8/547] Loss_Dis:1.12(x:0.45, xp:0.00, x~:0.10) Loss_Dec:0.39(xp:0.00, x~:0.11) Loss_Enc:0.8845(KL:0.0520, l2norm:0.1047)
[0/50][9/547] Loss_Dis:8.67(x:0.52, xp:0.99, x~:0.81) Loss_Dec:0.21(xp:0.01, x~:0.22) Loss_Enc:0.9489(KL:0.0565, l2norm:0.1007)
[0/50][10/547] Loss_Dis:0.66(x:0.82, xp:0.14, x~:0.24) Loss_Dec:0.41(xp:0.02, x~:0.11) Loss_Enc:0.5703(KL:0.0304, l2norm:0.1150)
[0/50][11/547] Loss_Dis:2.70(x:0.81, xp:0.73, x~:0.67) Loss_Dec:1.00(xp:0.00, x~:0.12) Loss_Enc:0.7979(KL:0.0378, l2norm:0.2307)
[0/50][12/547] Loss_Dis:0.61(x:0.83, xp:0.05, x~:0.25) Loss_Dec:0.10(xp:0.03, x~:0.23) Loss_Enc:1.0062(KL:0.0613, l2norm:0.0862)
[0/50][13/547] Loss_Dis:2.67(x:0.88, xp:0.71, x~:0.72) Loss_Dec:1.61(xp:0.00, x~:0.05) Loss_Enc:0.6564(KL:0.0215, l2norm:0.3343)
[0/50][14/547] Loss_Dis:0.53(x:0.92, xp:0.02, x~:0.30) Loss_Dec:0.19(xp:0.01, x~:0.28) Loss_Enc:1.9868(KL:0.1244, l2norm:0.1211)
[0/50][15/547] Loss_Dis:1.28(x:0.93, xp:0.32, x~:0.52) Loss_Dec:0.49(xp:0.01, x~:0.22) Loss_Enc:0.8689(KL:0.0470, l2norm:0.1637)
...
[0/50][546/547] Loss_Dis:2.05(x:0.45, xp:0.46, x~:0.46) Loss_Dec:-1.11(xp:0.43, x~:0.43) Loss_Enc:0.1581(KL:0.0102, l2norm:0.0053)
...
[1/50][546/547] Loss_Dis:1.93(x:0.37, xp:0.37, x~:0.37) Loss_Dec:-0.87(xp:0.35, x~:0.36) Loss_Enc:0.0924(KL:0.0060, l2norm:0.0019)
...
[2/50][546/547] Loss_Dis:1.91(x:0.33, xp:0.33, x~:0.33) Loss_Dec:-0.79(xp:0.33, x~:0.33) Loss_Enc:0.0635(KL:0.0042, l2norm:0.0010)
...
[3/50][546/547] Loss_Dis:1.92(x:0.39, xp:0.36, x~:0.40) Loss_Dec:-0.91(xp:0.35, x~:0.39) Loss_Enc:0.1040(KL:0.0067, l2norm:0.0042)
...
[4/50][546/547] Loss_Dis:1.94(x:0.32, xp:0.33, x~:0.33) Loss_Dec:-0.79(xp:0.33, x~:0.33) Loss_Enc:0.0327(KL:0.0021, l2norm:0.0014)
...
[5/50][546/547] Loss_Dis:1.92(x:0.33, xp:0.34, x~:0.34) Loss_Dec:-0.81(xp:0.33, x~:0.33) Loss_Enc:0.0437(KL:0.0029, l2norm:0.0007)
...
[6/50][546/547] Loss_Dis:1.92(x:0.33, xp:0.34, x~:0.34) Loss_Dec:-0.80(xp:0.33, x~:0.33) Loss_Enc:0.0042(KL:0.0003, l2norm:0.0004)
...
[7/50][546/547] Loss_Dis:1.91(x:0.33, xp:0.33, x~:0.33) Loss_Dec:-0.81(xp:0.33, x~:0.33) Loss_Enc:0.0058(KL:0.0004, l2norm:0.0002)
...
[8/50][546/547] Loss_Dis:1.91(x:0.33, xp:0.33, x~:0.33) Loss_Dec:-0.81(xp:0.33, x~:0.33) Loss_Enc:0.0009(KL:0.0000, l2norm:0.0003)
...
[9/50][546/547] Loss_Dis:1.91(x:0.33, xp:0.33, x~:0.33) Loss_Dec:-0.81(xp:0.33, x~:0.33) Loss_Enc:0.0007(KL:0.0000, l2norm:0.0004)
...
[10/50][546/547] Loss_Dis:1.91(x:0.33, xp:0.33, x~:0.33) Loss_Dec:-0.81(xp:0.33, x~:0.33) Loss_Enc:0.0012(KL:0.0000, l2norm:0.0005)
a514514772 commented 6 years ago

Please refer to this thread:

https://github.com/2017-fall-DL-training-program/VAE-GAN-and-VAE-GAN/issues/23#issuecomment-350238559

jessejchuang commented 6 years ago

It's fine. Let's discuss in that thread.