akanimax / pro_gan_pytorch

Unofficial PyTorch implementation of the paper titled "Progressive Growing of GANs for Improved Quality, Stability, and Variation"
MIT License
536 stars 100 forks

Which size GPU for full resolution training? #29

Closed robbiebarrat closed 2 years ago

robbiebarrat commented 5 years ago

Hi - sorry if this is a broad question, but I've looked around the repo and can't find the answer anywhere.

Assuming I only have one Titan Xp GPU (12GB), can I train this model at full resolution? I've been trying to modify the official TensorFlow implementation of Progressive Growing of GANs to fit in 12GB, but I have to seriously cut back on the number of filters / the batch size to make it work.

Thanks so much.

akanimax commented 5 years ago

Hi @robbiebarrat,

Yes, I believe you can train the full model given your GPU. I have been able to train the whole model (same number of channels as the original paper and 1024 x 1024 resolution) on my GTX 1070 (8GB) GPU.

The way to optimize this is to run the code at each depth before starting the actual training and find the largest batch size that fits at each one. In my experience, you should be able to fit a batch size of maybe 2 at the highest resolution, but you'll have to try. Once you have these numbers, you can set the schedule for the progressive growing scheme.
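The per-depth probing described above can be sketched as a small helper that halves a candidate batch size until one training step succeeds. Note this is just an illustration, not code from this repo: `try_step` is a hypothetical stand-in for one forward/backward pass at the given depth, which with PyTorch would return `False` on a CUDA out-of-memory error.

```python
# Sketch: find the largest batch size that fits at a given depth.
# `try_step(bs)` is a hypothetical callback that attempts one training
# step with batch size `bs` and reports whether it fit in GPU memory
# (in real PyTorch code it would catch torch.cuda.OutOfMemoryError).

def max_batch_size(try_step, start=32):
    """Halve the candidate batch size until one training step succeeds."""
    bs = start
    while bs >= 1:
        if try_step(bs):
            return bs
        bs //= 2
    return 0  # nothing fits at this depth

# Example: pretend the GPU only fits batches of 6 or fewer at 1024x1024.
print(max_batch_size(lambda bs: bs <= 6))  # -> 4
```

Run this once per depth before the real training to build the batch-size schedule.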

Hope this helps.

Please feel free to let me know if you have any more questions.

Cheers :beers:! @akanimax

robbiebarrat commented 5 years ago

@akanimax this helps a lot!! And seriously, thank you for such a fast reply.

I have been struggling with the official TensorFlow implementation because even though I have modified it to generate 512x1024 images instead of 1024x1024, it still won't fit on my GPU. I think NVIDIA made it on purpose so it only fits on the super expensive 16GB GPUs ;)

One last question - how long can I expect the training to take? How long did it take you to train the CelebA model in the examples?

Cheers!

akanimax commented 5 years ago

Well, I don't think NVIDIA would do something like that :smile:. I believe it's because they have some lower-level optimization involved there. TBH, it was a little difficult for me to use the official pro_gan code too. The StyleGAN code is amazing btw (very easy to reason about and use).

As for training time, well, I can't tell... in fact, no one can, especially with new datasets. You would probably spend a long time finding the right schedule for the progressive growing, and even then, I feel you should expect weeks of training.
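As a rough illustration of what such a per-depth schedule might look like, the sketch below builds one as a plain dict. All numbers here are made-up placeholders, not settings from this repo or the paper; the only fixed relationship is that depth `d` corresponds to resolution `4 * 2**d`, so depth 8 is 1024 x 1024.

```python
# Hypothetical progressive-growing schedule: one entry per depth,
# with batch size shrinking as the resolution grows. The batch sizes
# and epoch counts are illustrative guesses, not measured values.

schedule = {
    d: {
        "resolution": 4 * 2 ** d,          # depth 0 -> 4x4, depth 8 -> 1024x1024
        "batch_size": max(2, 128 >> d),    # e.g. 128 at 4x4 down to 2 at 1024x1024
        "epochs": 20,                      # placeholder per-depth epoch count
    }
    for d in range(9)
}

print(schedule[8]["resolution"], schedule[8]["batch_size"])  # -> 1024 2
```

The batch sizes you actually plug in would come from probing each depth on your own GPU first.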

Btw, if it's relevant for you, you could give MSG-GAN a try: https://github.com/akanimax/BMSG-GAN. You can read the paper (https://arxiv.org/abs/1903.06048) for more information. With it you'd probably still have to reduce the batch size to something like 2 or 4 for your GPU, but MSG-GAN is a lot more hassle-free than ProGAN. :smiley:!

Cheers :beers:! @akanimax

ss32 commented 4 years ago

> I have been able to train the whole model (same number of channels as the original paper and 1024 x 1024 resolution) on my GTX 1070 (8GB) gpu.

> In terms of training time. Well, I can't tell... in fact no one can. Especially with new datasets. You would probably spend a long time finding the perfect schedule for the progressive growing and even with the training, I feel you should expect weeks of training.

How long did it take you to train with the original dataset? I've been working with custom datasets on the order of 125k images and I've found that even a week of training isn't enough; I can't get good results even at lower resolutions.

benx13 commented 1 year ago

> Hi @robbiebarrat,
>
> Yes, I believe you can train the full model given your GPU. I have been able to train the whole model (same number of channels as the original paper and 1024 x 1024 resolution) on my GTX 1070 (8GB) gpu.
>
> The way you could optimize this is by starting the code at each depth and setting the most optimal batch size for each; prior to starting the actual training. By my experience, you should be able to fit a Batch size of may-be 2 for the highest resolution. But, you'll have to try. Once you have this, then you could proceed with setting the schedule for the progressive growing scheme.
>
> Hope this helps.
>
> Please feel free to let me know if you have any more questions.
>
> Cheers 🍻! @akanimax

Hi, thanks for the package. Can you provide more details on how you were able to train at full resolution on a 1070? I have a 1070 not plugged to a display and can't seem to surpass 512x512 with batch=1, num_channels=1, running Ubuntu 22.04 and torch 11.10.0.