XuezheMax / wolf

Invertible Generative Flows
Apache License 2.0

Number of Epochs #4

Open mehussein opened 2 years ago

mehussein commented 2 years ago

Hi,

I am trying to run the cifar10 example in the README file. The command line arguments there specify 15000 as the number of epochs. How important is it to train the model for that many epochs? In other words, what is the minimum number of epochs to train for and still get reasonable results? Based on the speed I am seeing so far, it would take my system (with a single GPU) at least 7 weeks to finish 15000 epochs.

Thanks!

XuezheMax commented 2 years ago

I trained for 15000 epochs to guarantee full convergence. As I recall, 5000 epochs can already yield competitive performance.
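For a rough sense of what these epoch counts mean in wall-clock time, the figures quoted in this thread (about 7 weeks for 15000 epochs on a single GPU) can be rescaled linearly. This is only a back-of-the-envelope sketch: the helper name `scale_training_time` is made up for illustration, and it assumes every epoch takes the same time, which real runs only approximate.

```python
def scale_training_time(total_weeks, total_epochs, target_epochs):
    """Linearly rescale a wall-clock estimate to a different epoch count.

    Assumes a constant time per epoch (hypothetical simplification).
    """
    weeks_per_epoch = total_weeks / total_epochs
    return weeks_per_epoch * target_epochs


# Figures from the question above: ~7 weeks for 15000 epochs on one GPU.
minutes_per_epoch = 7 * 7 * 24 * 60 / 15000
weeks_for_5000 = scale_training_time(7, 15000, 5000)

print(f"about {minutes_per_epoch:.1f} minutes per epoch")
print(f"about {weeks_for_5000:.1f} weeks for 5000 epochs")
```

Under these assumptions, 5000 epochs would take roughly a third of the original estimate on the same hardware.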

mehussein commented 2 years ago

Thanks! And, how many GPUs do you recommend?

Also, what do you mean by convergence here? Qualitatively, I can see that the reconstruction quality is good even at the very beginning of training. However, the sample realism is not good even after hundreds of iterations. They start as smooth images with no structure, then they start to have cifar10-like structures, but when you zoom in, they do not look like real objects. Is that expected to improve after thousands of iterations?

Final question: would it be possible to release the training configurations (epochs, batch size, etc.) for the other datasets, please?

Thanks!

XuezheMax commented 2 years ago

Hi, sorry for the late response. As I recall, I trained the CIFAR-10 flow on 2 GPUs (not strong ones, probably TITAN). I think 4 GPUs are enough. By convergence, I mean that the BPD (bits per dimension) score on the validation set stops decreasing. It usually takes at least 5000 epochs to get a reasonable BPD score. The quality of the generated images will also improve with more training. But I should say that flow-based models are not as good as GAN models at generating realistic images.
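To make the convergence criterion concrete, here is a minimal sketch of the usual conversion from a negative log-likelihood in nats to bits per dimension, plus a simple plateau check on the validation BPD history. It is not taken from this repository: the function names, the `patience` and `min_delta` parameters, and their default values are assumptions chosen for illustration, and the repository's own BPD computation may differ in details.

```python
import math


def bits_per_dim(nll_nats, num_dims):
    """Convert a per-image negative log-likelihood (in nats) to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


def has_plateaued(bpd_history, patience=3, min_delta=1e-3):
    """Return True if the last `patience` validation scores did not improve
    on the best earlier score by at least `min_delta` (hypothetical defaults)."""
    if len(bpd_history) <= patience:
        return False  # not enough evaluations to judge
    best_before = min(bpd_history[:-patience])
    recent_best = min(bpd_history[-patience:])
    return recent_best > best_before - min_delta


# A CIFAR-10 image has 3 * 32 * 32 = 3072 dimensions.
cifar10_dims = 3 * 32 * 32

# Example validation curves (made-up numbers, for illustration only).
still_improving = [3.50, 3.40, 3.30, 3.20, 3.10, 3.00]
flat = [3.50, 3.40, 3.35, 3.35, 3.35, 3.35]

print(has_plateaued(still_improving))
print(has_plateaued(flat))
```

A check like this only tracks likelihood. As noted above, sample quality can keep changing visibly while the BPD curve moves slowly, so it is worth inspecting samples separately.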

For configurations on other datasets, I will try to find them and share them with you. But since this work was done two years ago when I was at CMU, it might take some time to find them.

craymichael commented 9 months ago

Hello, do you recall the training time for ImageNet and how many GPUs were used? The number of epochs would be helpful, too. It seems that each epoch takes around 1 GPU-day or so, so 15,000 would be a big ask.