daib13 / TwoStageVAE


Default setting for reproducing the result in your paper #9

Open mmderakhshani opened 4 years ago

mmderakhshani commented 4 years ago

Hi. I really appreciate your work. I have implemented your paper in PyTorch and am now trying to reproduce the results in Table 1. Could you tell me the default settings you used to train the TwoStageVAE on CelebA and CIFAR-10? The paper refers to another paper for the hyperparameter settings, but that paper covers only some of them, so I think the description is incomplete.

mmderakhshani commented 4 years ago

@daib13 Hi Bin, I would really appreciate it if you could tell me the exact commands you used to generate the Table 1 results.

daib13 commented 4 years ago

Hi @mmderakhshani, sorry for the late reply. For CelebA, we just use the default settings in the repository; you can run python demo.py --dataset celeba with this code. For CIFAR-10, we use 1000 epochs for the first stage and 2000 epochs for the second stage. Note, however, that the FID for CIFAR-10 behaves strangely: saving the real images as JPEG files and then reading them back gives a different FID than reading from the original files. Also, in my experience, using PyTorch produces a different FID than using TensorFlow.
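
The JPEG effect mentioned above can be illustrated with a minimal sketch. It assumes Pillow and NumPy are installed, uses a random array as a stand-in for a real CIFAR-10 image, and picks a JPEG quality of 95 arbitrarily; none of these choices come from the repository. The helper name `jpeg_round_trip` is hypothetical.

```python
import io

import numpy as np
from PIL import Image


def jpeg_round_trip(image, quality=95):
    """Encode a uint8 RGB array as JPEG in memory and decode it again."""
    buffer = io.BytesIO()
    Image.fromarray(image).save(buffer, format="JPEG", quality=quality)
    buffer.seek(0)
    return np.array(Image.open(buffer).convert("RGB"))


# Stand-in for one 32x32 RGB image (assumption: not real CIFAR-10 data).
rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
reloaded = jpeg_round_trip(original)

# JPEG is lossy, so the decoded pixels differ from the ones that were saved.
diff = np.abs(original.astype(np.int16) - reloaded.astype(np.int16))
print("mean abs pixel change:", diff.mean())
print("max abs pixel change:", diff.max())
```

Because the reference statistics are computed from pixel values, any such change in the real images shifts the features they are compared against, which is one plausible reason the two FID numbers disagree.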

mmderakhshani commented 4 years ago

@daib13 Thanks for your reply. Regarding Table 2, did you calculate those scores with the Resnet architecture, or, as in Table 1, with the InfoGAN architecture?

daib13 commented 4 years ago

@mmderakhshani Table 2 uses the WAE network, which is defined in https://github.com/daib13/TwoStageVAE/blob/871862382746d619e7dc7527f3d02f924e1b0af5/network/two_stage_vae_model.py#L194

We follow the training protocol of the WAE paper exactly. You can reproduce the result using the command:

python demo.py --dataset celeba --epochs 70 --lr-epochs 30 --epochs2 70 --lr-epochs2 30 --network-structure Wae

To calculate the FID score, we use the standard Inception features for both Table 1 and Table 2, which is consistent with most previous work. The model is defined in https://github.com/openai/improved-gan/blob/master/inception_score/model.py. You can check how we calculate the FID score in https://github.com/daib13/TwoStageVAE/blob/871862382746d619e7dc7527f3d02f924e1b0af5/fid_score.py#L213
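
For readers who want to see what the score measures, here is a minimal NumPy sketch of the Fréchet distance between two Gaussians fitted to feature vectors. It is not the code in fid_score.py: the function names are hypothetical, the Inception feature extraction is omitted, and random 8-dimensional vectors stand in for real features. The trace of the matrix square root is computed from the eigenvalues of the covariance product, which is one of several ways to evaluate that term.

```python
import numpy as np


def feature_statistics(features):
    """Return the mean vector and covariance matrix of (N, D) features."""
    mu = features.mean(axis=0)
    sigma = np.cov(features, rowvar=False)
    return mu, sigma


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(S1) + Tr(S2) - 2 Tr((S1 S2)^(1/2))."""
    diff = mu1 - mu2
    # For PSD covariances the eigenvalues of S1 @ S2 are real and
    # non-negative up to numerical error, so Tr((S1 S2)^(1/2)) is the
    # sum of their square roots.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * trace_sqrt)


# Stand-in features (assumption: real FID uses Inception activations).
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = rng.normal(loc=0.5, size=(500, 8))

fid_same = frechet_distance(*feature_statistics(real), *feature_statistics(real))
fid_diff = frechet_distance(*feature_statistics(real), *feature_statistics(fake))
print("identical sets:", fid_same)
print("shifted set:", fid_diff)
```

The distance depends only on the means and covariances of the two feature sets, so anything that changes the features of the real images (image format, preprocessing, or the Inception weights in a given framework) changes the score even if the generated samples stay the same.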

saehoonkim commented 4 years ago

Hello, I really enjoyed reading your article. It contains many interesting observations, both theoretical and empirical. However, I'm struggling to reproduce the FID scores on CIFAR-10. After running the command below, I obtained the following FID scores on CIFAR-10: 86.1316 (reconstruction), 105.9603 (first stage), and 101.6009 (second stage). After saving the NumPy arrays in JPEG format and reloading them to calculate the FID scores, the numbers are 77.3854 (reconstruction), 89.8473 (first stage), and 89.1814 (second stage).

python demo.py --dataset cifar10 --epochs 1000 --lr-epochs 300 --epochs2 2000 --lr-epochs2 600 \
    --network-structure Resnet --num-scale 4 --base-dim 32 --latent-dim 64 --gpu 0 --exp-name [EXP-NAME]

I believe the above configuration is exactly the same as the one in Appendix D, but even after saving and reloading the images, the numbers are higher than those reported in Table 1.

I also found a slight difference between Figure 16 in the arXiv version and the implementation: the implementation uses global average pooling instead of a flatten layer, which seems to be a minor difference.
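
To make the difference concrete, here is a small NumPy sketch of the two operations. The feature-map shape (batch of 2, 4x4 spatial, 64 channels) is a made-up example, not the shape used in the repository.

```python
import numpy as np

# Hypothetical encoder output in (batch, height, width, channels) layout.
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(2, 4, 4, 64))

# Flatten layer: keeps every spatial position as a separate input.
flattened = feature_map.reshape(feature_map.shape[0], -1)

# Global average pooling: averages over height and width per channel.
pooled = feature_map.mean(axis=(1, 2))

print("flatten:", flattened.shape)
print("global average pooling:", pooled.shape)
```

With these example shapes, flattening yields 1024 values per image while pooling yields 64, so the layer that follows has a different input width and parameter count in the two variants.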

Thanks in advance!