@anvoynov If I may add to @ChengBinJin's question, could you also provide some insight into the number of GPUs used for the BigGAN and ProgGAN experiments? In the paper you state that "all the experiments were performed on the NVIDIA Tesla v100 card", but how many of them in parallel? It seems that ProgGAN, especially, needs a lot of VRAM (my quick calculation gives about 40 GB for batch_size=10).
@chi0tzp @anvoynov I tried to use a V100 with 16 GB for config-f of StyleGANv2 at 1024 resolution, and it always runs out of memory. Therefore, I plan to try 512x512 instead of 1024x1024 and will leave feedback after trying it.
We launched it on a single Tesla V100 with 32 GB and batch size 16 (10 for ProgGAN). To reduce memory usage, consider passing shift_predictor_size = 256 to force the generated images to be downscaled before shift prediction.
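Roughly, the downscaling amounts to something like the following (a simplified sketch, not the exact code from the repo; the function and argument names here are illustrative):

```python
import torch.nn.functional as F

def predict_shift(shift_predictor, img_orig, img_shifted, shift_predictor_size=256):
    # Downscale the generated images before feeding them to the shift
    # predictor, so its activations are computed at 256x256 instead of
    # the full generator resolution (e.g. 1024x1024).
    if shift_predictor_size is not None and img_orig.shape[-1] > shift_predictor_size:
        img_orig = F.interpolate(img_orig, size=shift_predictor_size,
                                 mode='bilinear', align_corners=False)
        img_shifted = F.interpolate(img_shifted, size=shift_predictor_size,
                                    mode='bilinear', align_corners=False)
    return shift_predictor(img_orig, img_shifted)
```

The generator still runs at full resolution, so this mainly reduces the shift predictor's activation memory.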
@anvoynov Thanks for letting us know. Personally, I don't have access to a 32 GB V100, only a pair of 16 GB V100s. I hope this doesn't affect the experiment much (at least in terms of VRAM usage).
Just to let you know (mainly @ChengBinJin), the maximum batch size that I could use for training the ProgGAN model with two 16 GB V100 cards is 6. In the code, the only model that is parallelized across multiple GPUs is the generator G; I don't know what would happen if I parallelized the other models as well (basically the shift predictor). Apparently, this is not an issue when you have a single card with 32 GB of VRAM instead of two cards with 16 GB each. I suggest that this issue can be closed now (@ChengBinJin's and @anvoynov's call), but I will get back and let you know if I decide to parallelize the shift predictor as well.
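If anyone wants to try it, this is roughly what I have in mind for wrapping the shift predictor too (the modules below are placeholders, not the actual classes from the repo):

```python
import torch
from torch import nn

# Placeholder modules standing in for the actual generator and shift
# predictor; only the DataParallel wrapping is the point here.
G = nn.Sequential(nn.Linear(512, 3 * 128 * 128))
shift_predictor = nn.Sequential(nn.Flatten(), nn.Linear(2 * 3 * 128 * 128, 2))

if torch.cuda.device_count() > 1:
    # Split each forward pass along the batch dimension across both cards.
    G = nn.DataParallel(G, device_ids=[0, 1])
    shift_predictor = nn.DataParallel(shift_predictor, device_ids=[0, 1])

G = G.cuda()
shift_predictor = shift_predictor.cuda()
```

Note that nn.DataParallel splits the batch across the GPUs, so with batch size 6 each card would see 3 samples; this lowers per-card activation memory but not the memory taken by the model weights, which are replicated on both cards.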
@anvoynov I want to try training StyleGANv2 on the W space, but it always runs out of memory even though I tried different parameters. From your README, it seems that you successfully ran StyleGANv2 config-f at 1024 resolution.