NVlabs / NVAE

The Official PyTorch Implementation of "NVAE: A Deep Hierarchical Variational Autoencoder" (NeurIPS 2020 spotlight paper)
https://arxiv.org/abs/2007.03898

CelebA 64 #5

Closed AlexZhurkevich closed 3 years ago

AlexZhurkevich commented 3 years ago

First of all, I would like to say thank you for the great VAE implementation. Looking forward to the CelebA 256 training instructions! I tried preprocessing CelebA 64 but got an error when executing create_celeba64_lmdb.py. The way you obtain the dataset (`dset.celeba.CelebA`) downloads a corrupted img_align_celeba.zip; I was able to fix the problem by replacing `dset.celeba.CelebA` with `dset.CelebA`. According to the official API docs, this is the correct way to do it (https://pytorch.org/docs/stable/torchvision/datasets.html#celeba).
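For reference, a minimal sketch of the replacement (the `root` path below is just a placeholder for whatever `--img_path` points to, and `download=True` is how I re-fetched the archive):

```python
import torchvision.datasets as dset

# Use the top-level CelebA class instead of the dset.celeba.CelebA submodule path;
# it runs torchvision's integrity checks on the downloaded img_align_celeba.zip.
data_dir = './data/celeba_org'  # placeholder for $DATA_DIR/celeba_org
train_data = dset.CelebA(root=data_dir, split='train', target_type='attr',
                         download=True)
```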

arash-vahdat commented 3 years ago

Did the error happen when you were running the actual training script or when you were running create_celeba64_lmdb.py?

Several users reported issues with celeba_64 download as well. They noticed the issue during training though: https://github.com/NVlabs/NVAE/issues/2

AlexZhurkevich commented 3 years ago

Yes, the error happened when I ran the script: `python create_celeba64_lmdb.py --split train --img_path $DATA_DIR/celeba_org --lmdb_path $DATA_DIR/celeba64_lmdb`. I simply did not find `.celeba.CelebA` in the PyTorch API, hence the replacement. I've checked out issue #2; I don't get any NaNs, and training is going as expected for me. For training I used your default command and did not change anything, since I have 8 V100 32 GB GPUs, so there was no need to reduce the batch size.

Arash, thanks for the great job you've already done. I would like to ask you to consider explaining on the main page how the different train.py parameters affect each other, the training, and/or the results. For example, in my case I would like to run NVAE on a 512x512 custom dataset, and I suspect just scaling the resolution is not enough. I fully understand that a lot of these are hyperparameters, so it's up to us to decide and test, but hints for at least some of the obvious ones would be nice. This is one of the reasons I am waiting for the CelebA HQ 256 training parameters; it would be very beneficial to see the differences from the CelebA 64 training. Thanks!
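In case it helps anyone with the data-preparation side of a custom dataset, here is a rough sketch of how one could pack a folder of images into a 512x512 LMDB in the spirit of create_celeba64_lmdb.py. The key/value layout and the `build_lmdb` helper are my own assumptions, not necessarily the repo's format, so check lmdb_datasets.py in the repo before relying on it:

```python
import io
import os

import lmdb
from PIL import Image

def build_lmdb(img_dir, lmdb_path, size=512):
    """Hypothetical helper: resize every image in img_dir and store it in an LMDB."""
    paths = sorted(
        os.path.join(img_dir, f) for f in os.listdir(img_dir)
        if f.lower().endswith(('.png', '.jpg', '.jpeg'))
    )
    # map_size is an upper bound on the database size (~50 GB here).
    env = lmdb.open(lmdb_path, map_size=50 * 1024 ** 3)
    with env.begin(write=True) as txn:
        for i, p in enumerate(paths):
            img = Image.open(p).convert('RGB').resize((size, size), Image.LANCZOS)
            buf = io.BytesIO()
            img.save(buf, format='PNG')
            # Keys are simple string indices; values are PNG-encoded bytes (an assumption).
            txn.put(str(i).encode(), buf.getvalue())
        txn.put(b'length', str(len(paths)).encode())
    env.close()

if __name__ == '__main__':
    build_lmdb('./data/custom_org', './data/custom512_lmdb')  # placeholder paths
```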

Lukelluke commented 3 years ago

> I would like to ask you to consider explaining on the main page how different train.py parameters affect each other and/or training and/or results. [...] This is one of the reasons I am waiting for the CelebA HQ 256 training parameters; it would be very beneficial to see the differences with CelebA 64 training.

Good Question!

We are wondering about this too. That said, I tried the CelebA HQ 256x256 dataset with almost the same hyperparameters and got good results within 5 epochs. I was also confused about why Dr. @arash-vahdat didn't explain the suggested hyperparameters. :) lol

Hoping Dr. @arash-vahdat will have some free time to chat with us about this issue. :) 👍