autonomousvision / stylegan-xl

[SIGGRAPH'22] StyleGAN-XL: Scaling StyleGAN to Large Diverse Datasets
MIT License

Unbalanced GPU memory usage #94

Open Michaelsqj opened 1 year ago

Michaelsqj commented 1 year ago

Hi! I found that GPU memory consumption is highly unbalanced between GPU 0 and the other GPUs. Here's the command I used to train on ImageNet at resolution 128.

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
python train.py \
    --outdir=/storage/guangrun/qijia_3d_model/stylegan-xl/finetune128/ \
    --cfg=stylegan3-t \
    --data=/datasets/guangrun/qijia_3d_model/imagenet/stylegan_xl/imagenet_sub_seg128.zip \
    --gpus=8 \
    --batch=32 \
    --mirror=1 \
    --snap 10 \
    --batch-gpu 4 \
    --kimg 10000 \
    --cond True \
    --superres \
    --up_factor 2 \
    --head_layers 7 \
    --path_stem /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet64.pkl \
    --resume /scratch/local/ssd/guangrun/qijia_3d_model/stylegan_xl/imagenet128.pkl

As you can see, GPU 0 consumes much less memory than the other GPUs. May I ask what causes this imbalance, and what the normal memory consumption is when training at resolution 128 with the settings above?

image
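
For anyone who wants to reproduce the numbers in the screenshot as text, here is a minimal sketch that reads per-GPU memory from `nvidia-smi` and reports how uneven it is. The helper names (`parse_gpu_memory`, `imbalance_ratio`, `query_gpu_memory`) are hypothetical and not part of the stylegan-xl repo; the only external dependency assumed is the `nvidia-smi` CLI with its CSV query flags.

```python
import subprocess

# nvidia-smi query that prints one "index, used, total" row per GPU, in MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows (MiB) into a list of dicts."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append({"gpu": index, "used_mib": used, "total_mib": total})
    return rows


def imbalance_ratio(rows):
    """Ratio of the largest to the smallest per-GPU memory usage."""
    used = [row["used_mib"] for row in rows]
    # Guard against division by zero when a GPU is completely idle.
    return max(used) / max(min(used), 1)


def query_gpu_memory():
    """Run nvidia-smi once and return the parsed per-GPU rows."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` while training is running and passing the result to `imbalance_ratio` gives a single number to compare between runs, which is easier to share in an issue than a screenshot.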

Michaelsqj commented 1 year ago

However, when I set batch-gpu=8, gpus=8, batch=64, the GPU memory consumption went down. This is strange; does anyone have a clue about what might cause it?

image
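
To make the comparison between the two settings concrete, here is a small sketch of the batch arithmetic. It assumes the flags follow the StyleGAN3 convention that this repo builds on, where `--batch` is the total batch size across all GPUs and `--batch-gpu` limits how many samples each GPU processes per forward pass. The function `batch_plan` is a hypothetical helper for illustration, not code from the repo.

```python
def batch_plan(batch, gpus, batch_gpu):
    """Split a total batch into per-GPU work (assumed StyleGAN3 convention)."""
    if batch % (gpus * batch_gpu) != 0:
        raise ValueError("--batch must be a multiple of --gpus times --batch-gpu")
    # Samples each GPU is responsible for in one optimizer step.
    per_gpu_per_step = batch // gpus
    # Number of forward/backward passes accumulated before the step.
    accumulation_rounds = per_gpu_per_step // batch_gpu
    return {
        "per_gpu_per_step": per_gpu_per_step,
        "per_forward_pass": batch_gpu,
        "accumulation_rounds": accumulation_rounds,
    }


# First run: --batch=32 --gpus=8 --batch-gpu 4
first = batch_plan(batch=32, gpus=8, batch_gpu=4)
# Second run: batch=64, gpus=8, batch-gpu=8
second = batch_plan(batch=64, gpus=8, batch_gpu=8)

print(first)
print(second)
```

Under that assumption, both runs use a single accumulation round, and the second run pushes 8 samples per GPU through each forward pass instead of 4. Activation memory would normally be expected to grow with the per-pass batch, which is why the lower reading in the second run is surprising.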