Rayhane-mamah / Efficient-VDVAE

Official Pytorch and JAX implementation of "Efficient-VDVAE: Less is more"
https://arxiv.org/abs/2203.13751
MIT License

pytorch generation 1024 celeba_hq #12

Closed miaoYuanyuan closed 1 year ago

miaoYuanyuan commented 1 year ago

I checked the training log: the train-time reconstruction images are very good, but why are the images I generate so poor?

Rayhane-mamah commented 1 year ago

Hello @miaoYuanyuan, and thank you for your interest in our work :)

To make sure I understand your question correctly:

Reconstructed samples from the posterior $q_{\phi}(z|x)$ should always be quasi-perfect for a VAE (source).

That is totally normal for this model (check page 24 of our paper). We believe the issue comes down to the fact that, at the higher resolution of 1024x1024, the model is not deep/expressive enough to capture the data distribution well. As a result, sampling from the prior distribution $p_{\theta}(z)$ during inference doesn't give good samples, even though reconstructions remain sharp.
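To make the distinction concrete, here is a schematic, numpy-only sketch of the two inference modes being discussed (this is not the repository's API; the toy `encode`/`decode` linear maps are purely illustrative stand-ins for a trained encoder and decoder). Reconstruction conditions the latent on the input via $q_{\phi}(z|x)$, so the decoder is evaluated near latents it was trained on; generation draws $z$ from the prior $p_{\theta}(z)$, which may land in regions the decoder handles poorly if the prior and aggregate posterior mismatch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "decoder" p_theta(x|z): a fixed linear map from a 2-d latent to 4-d data.
# These weights stand in for a trained network; purely illustrative.
W = rng.normal(size=(4, 2))

def decode(z):
    return W @ z

def encode(x):
    # Toy "encoder" q_phi(z|x): the least-squares pseudo-inverse gives a
    # posterior mean; a fixed small std stands in for the posterior scale.
    mu = np.linalg.pinv(W) @ x
    return mu, 0.05

# A data point that actually lies near the decoder's manifold.
z_true = rng.normal(size=2)
x = decode(z_true) + 0.01 * rng.normal(size=4)

# Reconstruction: sample z from the posterior q(z|x), then decode.
# Because z is conditioned on x, the output stays close to x.
mu, std = encode(x)
z_post = mu + std * rng.normal(size=2)
x_recon = decode(z_post)

# Generation: sample z from the prior p(z) = N(0, I), then decode.
# Quality now depends entirely on how well the prior covers the
# latents the decoder was trained on -- nothing ties z_prior to x.
z_prior = rng.normal(size=2)
x_gen = decode(z_prior)

recon_err = np.linalg.norm(x - x_recon)
```

In this toy setting the reconstruction error is small by construction, while the generated sample is unrelated to `x`; in a deep VAE the analogous failure mode is a prior that doesn't match the aggregate posterior at high resolution.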

For this model (Efficient-VDVAE), if you want decent samples generated by random sampling from the prior, we suggest using the lower-resolution datasets (for example CelebAHQ 256).

Hope this answers the question! :) Rayhane.

miaoYuanyuan commented 1 year ago

Thank you for your reply! @Rayhane-mamah I generated at the 256x256 resolution, and the results are good.