arunraman closed this issue 3 years ago.
Are you using Docker? There is a fix here: https://github.com/NVlabs/stylegan2-ada/pull/51. If not, I have a branch using TensorFlow 2 compatibility mode (it includes other branches that have been cherry-picked): https://github.com/johndpope/stylegan2-ada/tree/digressions
There are also some options to tinker with in the default GPU-based configs: configs were added to maximize GPU usage on 11 GB, 24 GB, and 48 GB cards (use the 11 GB config for 16 GB cards).
Closing this as it's obsolete. I will try the PyTorch port when it comes out.
I trained stylegan2-ada on an NVIDIA A100 with the following driver and CUDA versions:

`NVIDIA-SMI 450.80.02  Driver Version: 450.80.02  CUDA Version: 11.0`
Even though I was able to train the model, it was very slow. I was only getting 45-60 secs/kimg and averaging around 170-200 secs/tick, whereas with stylegan2 I was far more efficient: 16.6 secs/kimg and 68.4 secs/tick. I am trying to understand what's causing the slowdown here.
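For reference, here is the slowdown implied by the numbers quoted above (simple arithmetic on the figures in this comment, nothing measured beyond them):

```python
# Compare the observed stylegan2-ada throughput (45-60 secs/kimg on the A100)
# against the earlier stylegan2 throughput (16.6 secs/kimg).
ada_kimg = (45, 60)   # secs/kimg with stylegan2-ada
sg2_kimg = 16.6       # secs/kimg with stylegan2

slowdown = tuple(round(t / sg2_kimg, 1) for t in ada_kimg)
print(slowdown)  # (2.7, 3.6) -- roughly a 3x slowdown
```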
Also, on the A100 with stylegan2-ada, even though I changed the batch size from 64 to 128 by hardcoding it in train.py on this line:

`args.minibatch_size = 128`
I was not able to use the entire 40 GB of memory. GPU memory utilization gets capped at 18 GB, and I cannot push it further even though I have another 12 GB free. What parameter other than batch size should I change to fix this?
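One possible explanation, sketched below under the assumption that the TF implementation splits the global minibatch into fixed per-GPU chunks and makes up the difference with gradient accumulation (the function name and exact variable names here are illustrative, not taken from the stylegan2-ada source): if only `minibatch_size` is raised while the per-GPU chunk size stays fixed, each forward/backward pass still holds the same number of samples in memory, so memory usage does not grow.

```python
# Hypothetical sketch: why raising the global minibatch size alone may not
# raise per-GPU memory use. Memory is driven by the number of samples resident
# on the GPU per pass (the per-GPU chunk), while a larger global minibatch is
# absorbed by running more sequential accumulation rounds.

def accumulation_rounds(minibatch_size: int, minibatch_gpu: int, num_gpus: int) -> int:
    """Sequential passes needed to process one global minibatch."""
    per_step = minibatch_gpu * num_gpus  # samples held in GPU memory at once
    assert minibatch_size % per_step == 0, "global batch must divide evenly"
    return minibatch_size // per_step

# Doubling the global batch from 64 to 128 with a fixed per-GPU chunk of 16
# on a single A100 just doubles the accumulation rounds; per-pass memory
# (and hence the ~18 GB cap observed above) is unchanged.
print(accumulation_rounds(64, 16, 1))   # 4
print(accumulation_rounds(128, 16, 1))  # 8
```

If this is what is happening, the per-GPU minibatch parameter (or the GPU-size-specific config mentioned earlier in this thread) is the knob to raise, not the global batch size alone.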