NVlabs / NVAE

The Official PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder" (NeurIPS 2020 spotlight paper)
https://arxiv.org/abs/2007.03898

Are there pretrained models available? #7

Closed morgankohler closed 3 years ago

yaxingwang commented 3 years ago

It is great work. Do the authors plan to release a pre-trained model? Providing one would motivate researchers to explore the properties of NVAE, as some papers have done by exploiting pre-trained StyleGAN (StyleGANv2) models.

loc-trinh commented 3 years ago

Thank you for the great work! I agree with the comment above. Pretrained weights would be very helpful for those without 8 GPUs who want to test the generation/interpolation capabilities of NVAE, especially on CelebA 256.

arash-vahdat commented 3 years ago

Thanks for all the support. We are working on this and should have an update in a couple of weeks.

Lukelluke commented 3 years ago

I have a pretrained model trained for 30+ epochs on CelebA 64, on a single GPU with batch_size = 2 and the other hyperparameters the same as Dr. arash-vahdat described.

If you need it, I can upload it. Please tell me which cloud storage service works for you. How about Baidu Cloud Disk?


https://pan.baidu.com/s/1cXSwN8I2lEqOspv02PyGlA
KEY:93bz

loc-trinh commented 3 years ago

Thank you for the offer! I've also managed to train a CelebA64 version for 50 epochs. What BPD did you get? Mine flattens around 2.28 bpd elbo, and I would be happy to share that too. My main problem is GPU trouble when training CelebA 256x256 XD, the net is huge.

Lukelluke commented 3 years ago

"Mine flattens around 2.28 bpd elbo"

Hello @loc-trinh,

Sorry, I couldn't figure out what BPD elbo is. Could you please explain briefly? I searched and guess it's a term from the image field. Sorry, my major is in the voice field.

As for CelebA 256*256, I tried to train it, and within 5 epochs I got a surprising result. For this you may want to wait for Dr. @arash-vahdat's help :) I'm looking forward to that too.

loc-trinh commented 3 years ago

Hi @Lukelluke, BPD elbo is the value of the evidence lower bound (ELBO) in variational inference divided by the number of 'dimensions' (the number of pixels per image in this case). You then divide it by log(2), I believe, to convert it from nats to bits (base 2).
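
The conversion described above can be sketched in a few lines of Python. This is only an illustration, not code from the NVAE repository: the function name `bits_per_dim` is hypothetical, the input is assumed to be the negative ELBO of one image measured in nats, and color channels are assumed to count as dimensions (so a 64x64 RGB image has 3 * 64 * 64 of them).

```python
import math


def bits_per_dim(neg_elbo_nats: float, num_dims: int) -> float:
    """Convert a per-image negative ELBO in nats to bits per dimension.

    Hypothetical helper for illustration; not part of the NVAE code base.
    """
    if num_dims <= 0:
        raise ValueError("num_dims must be positive")
    # Dividing by ln(2) converts nats to bits; dividing by num_dims
    # normalizes by the size of the image.
    return neg_elbo_nats / (num_dims * math.log(2))


# Assumption: a 64x64 RGB image, with each color channel counted as a dimension.
num_dims = 3 * 64 * 64

# Build a negative ELBO (in nats) that corresponds to 2.28 bits per dimension,
# then check that the conversion recovers that value.
neg_elbo_nats = 2.28 * num_dims * math.log(2)
print(round(bits_per_dim(neg_elbo_nats, num_dims), 2))
```

Lower values are better, since the quantity is an upper bound on the average number of bits needed to encode each dimension of an image.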

arash-vahdat commented 3 years ago

We just updated the README with a link to pretrained checkpoints.